* [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware
@ 2013-01-15 11:43 Terje Bergstrom
  2013-01-15 11:43 ` [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver Terje Bergstrom
                   ` (8 more replies)
  0 siblings, 9 replies; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:43 UTC
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

This set of patches adds support for Tegra20 and Tegra30 host1x and
2D. It is based on linux-next-20130114. The set was regenerated with
git format-patch -M.

The fifth version merges the DRM and host1x drivers into one driver.
This allowed moving include/linux/host1x.h back into the driver and
removed the need for a dummy platform device. This version also uses
the code from the tegradrm driver almost as is, so there are far fewer
actual code changes.

This patch set does not have the host1x allocator, but it uses CMA
helpers for memory management.

host1x is the driver that controls host1x hardware. It supports
host1x command channels, synchronization, and memory management. It
is split into a logical driver under drivers/gpu/host1x and a
physical driver under drivers/gpu/host1x/hw. The physical driver is
compiled with the hardware headers of the particular host1x version.
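
As a rough illustration of the logical/physical split described above,
here is a minimal user-space sketch. It is not code from this patch
set: every name in it (sketch_host, sketch_syncpt_ops, sketch_v1_init
and so on) is hypothetical, and a plain array stands in for the
memory-mapped registers. It only shows the general shape of a
version-specific layer filling in an ops table that the
version-independent layer calls through.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NB_PTS 4	/* illustrative count, not a hardware value */

struct sketch_host;

/* Operations supplied by a hardware-specific ("physical") layer. */
struct sketch_syncpt_ops {
	uint32_t (*load)(struct sketch_host *host, unsigned int id);
	void (*cpu_incr)(struct sketch_host *host, unsigned int id);
};

struct sketch_host {
	uint32_t regs[SKETCH_NB_PTS];	/* stands in for mapped registers */
	struct sketch_syncpt_ops ops;	/* filled in by the per-version init */
};

/* "Physical" layer for one imagined hardware version. */
static uint32_t v1_load(struct sketch_host *host, unsigned int id)
{
	return host->regs[id];
}

static void v1_cpu_incr(struct sketch_host *host, unsigned int id)
{
	host->regs[id]++;
}

/* Per-version init: the only place that knows which ops to install. */
static int sketch_v1_init(struct sketch_host *host)
{
	host->ops.load = v1_load;
	host->ops.cpu_incr = v1_cpu_incr;
	return 0;
}

/* "Logical" layer: version independent, only calls through the ops. */
static uint32_t sketch_syncpt_incr_and_read(struct sketch_host *host,
					    unsigned int id)
{
	assert(id < SKETCH_NB_PTS);
	host->ops.cpu_incr(host, id);
	return host->ops.load(host, id);
}
```

A caller would run sketch_v1_init() once and afterwards use only the
logical helpers, so supporting another hardware version in this sketch
would mean adding a second init function rather than touching the
callers.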

The hardware units are described (briefly) in the Tegra2 TRM. The wiki
page https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction
also contains a short description of the functionality.

The patch set merges tegradrm into host1x and adds a 2D driver, which
uses host1x channels and sync points. The patch set also adds a user
space API to tegradrm for accessing host1x and 2D.
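
For readers unfamiliar with sync points, the following user-space
sketch may help. It is an illustration under stated assumptions, not
the driver's implementation: the names are hypothetical, a plain
integer stands in for the hardware counter, and the wrap-safe
comparison in sketch_is_expired() is an assumed technique that this
series does not show. The idea is that software keeps two shadow
values per sync point: "max", the value the counter will reach once
all queued work has completed, and "min", the last value read back
from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_syncpt {
	uint32_t hw_value;	/* stands in for the hardware counter */
	uint32_t min;		/* last value read back from "hardware" */
	uint32_t max;		/* value reached once queued work is done */
};

/* Reserve 'incrs' future increments; returns the value to wait for. */
static uint32_t sketch_incr_max(struct sketch_syncpt *sp, uint32_t incrs)
{
	sp->max += incrs;	/* unsigned arithmetic wraps by design */
	return sp->max;
}

/* Refresh the shadow copy from the (simulated) hardware counter. */
static uint32_t sketch_load_min(struct sketch_syncpt *sp)
{
	sp->min = sp->hw_value;
	return sp->min;
}

/*
 * Assumed wrap-safe test for "has the counter reached thresh?".
 * The signed difference stays correct across a 32-bit wrap as long
 * as the two values are less than 2^31 apart.
 */
static bool sketch_is_expired(const struct sketch_syncpt *sp, uint32_t thresh)
{
	return (int32_t)(sp->min - thresh) >= 0;
}

/* True when no submitted work is outstanding. */
static bool sketch_min_eq_max(const struct sketch_syncpt *sp)
{
	return sp->min == sp->max;
}
```

In this sketch a submitter would call sketch_incr_max() when queueing
work and later poll sketch_load_min() followed by sketch_is_expired()
with the returned threshold; sketch_min_eq_max() then indicates an
idle sync point.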


Terje Bergstrom (8):
  gpu: host1x: Add host1x driver
  gpu: host1x: Add syncpoint wait and interrupts
  gpu: host1x: Add channel support
  gpu: host1x: Add debug support
  drm: tegra: Move drm to live under host1x
  gpu: host1x: Remove second host1x driver
  ARM: tegra: Add board data and 2D clocks
  drm: tegra: Add gr2d device

 arch/arm/mach-tegra/board-dt-tegra20.c         |    1 +
 arch/arm/mach-tegra/board-dt-tegra30.c         |    1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c      |    2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c      |    1 +
 drivers/gpu/Makefile                           |    1 +
 drivers/gpu/drm/Kconfig                        |    2 -
 drivers/gpu/drm/Makefile                       |    1 -
 drivers/gpu/drm/tegra/Makefile                 |    7 -
 drivers/gpu/drm/tegra/drm.c                    |  115 -----
 drivers/gpu/host1x/Kconfig                     |   32 ++
 drivers/gpu/host1x/Makefile                    |   22 +
 drivers/gpu/host1x/cdma.c                      |  473 ++++++++++++++++++
 drivers/gpu/host1x/cdma.h                      |  107 +++++
 drivers/gpu/host1x/channel.c                   |  140 ++++++
 drivers/gpu/host1x/channel.h                   |   58 +++
 drivers/gpu/host1x/cma.c                       |  116 +++++
 drivers/gpu/host1x/cma.h                       |   43 ++
 drivers/gpu/host1x/debug.c                     |  215 +++++++++
 drivers/gpu/host1x/debug.h                     |   50 ++
 drivers/gpu/host1x/dev.c                       |  251 ++++++++++
 drivers/gpu/host1x/dev.h                       |  170 +++++++
 drivers/gpu/{drm/tegra => host1x/drm}/Kconfig  |    2 +-
 drivers/gpu/{drm/tegra => host1x/drm}/dc.c     |    7 +-
 drivers/gpu/{drm/tegra => host1x/drm}/dc.h     |    0
 drivers/gpu/host1x/drm/drm.c                   |  548 +++++++++++++++++++++
 drivers/gpu/{drm/tegra => host1x/drm}/drm.h    |   37 +-
 drivers/gpu/{drm/tegra => host1x/drm}/fb.c     |    0
 drivers/gpu/host1x/drm/gr2d.c                  |  325 +++++++++++++
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c   |    7 +-
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h   |    0
 drivers/gpu/{drm/tegra => host1x/drm}/host1x.c |    0
 drivers/gpu/{drm/tegra => host1x/drm}/output.c |    0
 drivers/gpu/{drm/tegra => host1x/drm}/rgb.c    |    0
 drivers/gpu/host1x/host1x.h                    |   29 ++
 drivers/gpu/host1x/host1x_client.h             |   34 ++
 drivers/gpu/host1x/hw/Makefile                 |    6 +
 drivers/gpu/host1x/hw/cdma_hw.c                |  478 ++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h                |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c             |  148 ++++++
 drivers/gpu/host1x/hw/debug_hw.c               |  400 ++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c               |   45 ++
 drivers/gpu/host1x/hw/host1x01.h               |   25 +
 drivers/gpu/host1x/hw/host1x01_hardware.h      |  150 ++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h    |  120 +++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h       |  241 ++++++++++
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h     |  168 +++++++
 drivers/gpu/host1x/hw/intr_hw.c                |  178 +++++++
 drivers/gpu/host1x/hw/syncpt_hw.c              |  157 ++++++
 drivers/gpu/host1x/intr.c                      |  383 +++++++++++++++
 drivers/gpu/host1x/intr.h                      |  109 +++++
 drivers/gpu/host1x/job.c                       |  612 ++++++++++++++++++++++++
 drivers/gpu/host1x/job.h                       |  164 +++++++
 drivers/gpu/host1x/memmgr.c                    |  173 +++++++
 drivers/gpu/host1x/memmgr.h                    |   72 +++
 drivers/gpu/host1x/syncpt.c                    |  399 +++++++++++++++
 drivers/gpu/host1x/syncpt.h                    |  165 +++++++
 drivers/video/Kconfig                          |    2 +
 include/drm/tegra_drm.h                        |  131 +++++
 include/trace/events/host1x.h                  |  272 +++++++++++
 59 files changed, 7295 insertions(+), 137 deletions(-)
 delete mode 100644 drivers/gpu/drm/tegra/Makefile
 delete mode 100644 drivers/gpu/drm/tegra/drm.c
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 rename drivers/gpu/{drm/tegra => host1x/drm}/Kconfig (94%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.c (99%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.h (100%)
 create mode 100644 drivers/gpu/host1x/drm/drm.c
 rename drivers/gpu/{drm/tegra => host1x/drm}/drm.h (86%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/fb.c (100%)
 create mode 100644 drivers/gpu/host1x/drm/gr2d.c
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c (99%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/host1x.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/output.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/rgb.c (100%)
 create mode 100644 drivers/gpu/host1x/host1x.h
 create mode 100644 drivers/gpu/host1x/host1x_client.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/job.h
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/drm/tegra_drm.h
 create mode 100644 include/trace/events/host1x.h

-- 
1.7.9.5



* [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
@ 2013-01-15 11:43 ` Terje Bergstrom
  2013-02-04  9:09   ` Thierry Reding
  2013-01-15 11:43 ` [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts Terje Bergstrom
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:43 UTC
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add host1x, the driver for host1x and its client unit 2D.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/Makefile                      |    1 +
 drivers/gpu/host1x/Kconfig                |    6 +
 drivers/gpu/host1x/Makefile               |    8 ++
 drivers/gpu/host1x/dev.c                  |  161 +++++++++++++++++++++
 drivers/gpu/host1x/dev.h                  |   73 ++++++++++
 drivers/gpu/host1x/hw/Makefile            |    6 +
 drivers/gpu/host1x/hw/host1x01.c          |   35 +++++
 drivers/gpu/host1x/hw/host1x01.h          |   25 ++++
 drivers/gpu/host1x/hw/host1x01_hardware.h |   26 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h  |   72 ++++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c         |  146 +++++++++++++++++++
 drivers/gpu/host1x/syncpt.c               |  217 +++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.h               |  153 ++++++++++++++++++++
 drivers/video/Kconfig                     |    2 +
 include/trace/events/host1x.h             |   61 ++++++++
 15 files changed, 992 insertions(+)
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/trace/events/host1x.h

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index cc92778..7e227097 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -1 +1,2 @@
 obj-y			+= drm/ vga/ stub/
+obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
new file mode 100644
index 0000000..e89fb2b
--- /dev/null
+++ b/drivers/gpu/host1x/Kconfig
@@ -0,0 +1,6 @@
+config TEGRA_HOST1X
+	tristate "Tegra host1x driver"
+	help
+	  Driver for the Tegra host1x hardware.
+
+	  Required for enabling tegradrm.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
new file mode 100644
index 0000000..363e6ab
--- /dev/null
+++ b/drivers/gpu/host1x/Makefile
@@ -0,0 +1,8 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-y = \
+	syncpt.o \
+	dev.o \
+	hw/host1x01.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
new file mode 100644
index 0000000..cd2b1ef
--- /dev/null
+++ b/drivers/gpu/host1x/dev.c
@@ -0,0 +1,161 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include "dev.h"
+#include "hw/host1x01.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/host1x.h>
+
+#define DRIVER_NAME		"tegra-host1x"
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	writel(v, sync_regs + r);
+}
+
+u32 host1x_sync_readl(struct host1x *host1x, u32 r)
+{
+	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
+
+	return readl(sync_regs + r);
+}
+
+static struct host1x_device_info host1x_info = {
+	.nb_channels	= 8,
+	.nb_pts		= 32,
+	.nb_mlocks	= 16,
+	.nb_bases	= 8,
+	.init		= host1x01_init,
+	.sync_offset	= 0x3000,
+};
+
+static struct of_device_id host1x_match[] = {
+	{ .compatible = "nvidia,tegra30-host1x", .data = &host1x_info, },
+	{ .compatible = "nvidia,tegra20-host1x", .data = &host1x_info, },
+	{ },
+};
+
+static int host1x_probe(struct platform_device *dev)
+{
+	struct host1x *host;
+	struct resource *regs;
+	int syncpt_irq;
+	int err;
+	const struct of_device_id *devid =
+		of_match_device(host1x_match, &dev->dev);
+
+	if (!devid)
+		return -EINVAL;
+
+	regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (!regs) {
+		dev_err(&dev->dev, "missing regs\n");
+		return -ENXIO;
+	}
+
+	syncpt_irq = platform_get_irq(dev, 0);
+	if (IS_ERR_VALUE(syncpt_irq)) {
+		dev_err(&dev->dev, "missing irq\n");
+		return -ENXIO;
+	}
+
+	host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
+	if (!host) {
+		dev_err(&dev->dev, "failed to alloc host1x\n");
+		return -ENOMEM;
+	}
+
+	host->dev = dev;
+	memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
+
+	/* set common host1x device data */
+	platform_set_drvdata(dev, host);
+
+	host->regs = devm_request_and_ioremap(&dev->dev, regs);
+	if (!host->regs) {
+		dev_err(&dev->dev, "failed to remap host registers\n");
+		return -ENXIO;
+	}
+
+	if (host->info.init) {
+		err = host->info.init(host);
+		if (err)
+			return err;
+	}
+
+	err = host1x_syncpt_init(host);
+	if (err)
+		return err;
+
+	host->clk = devm_clk_get(&dev->dev, NULL);
+	if (IS_ERR(host->clk)) {
+		dev_err(&dev->dev, "failed to get clock\n");
+		err = PTR_ERR(host->clk);
+		goto fail_deinit_syncpt;
+	}
+
+	err = clk_prepare_enable(host->clk);
+	if (err < 0) {
+		dev_err(&dev->dev, "failed to enable clock\n");
+		goto fail_deinit_syncpt;
+	}
+
+	host1x_syncpt_reset(host);
+
+	dev_info(&dev->dev, "initialized\n");
+
+	return 0;
+
+fail_deinit_syncpt:
+	host1x_syncpt_deinit(host);
+	return err;
+}
+
+static int __exit host1x_remove(struct platform_device *dev)
+{
+	struct host1x *host = platform_get_drvdata(dev);
+	host1x_syncpt_deinit(host);
+	clk_disable_unprepare(host->clk);
+	return 0;
+}
+
+static struct platform_driver platform_driver = {
+	.probe = host1x_probe,
+	.remove = __exit_p(host1x_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = DRIVER_NAME,
+		.of_match_table = host1x_match,
+	},
+};
+
+module_platform_driver(platform_driver);
+
+MODULE_AUTHOR("Terje Bergstrom <tbergstrom@nvidia.com>");
+MODULE_DESCRIPTION("Host1x driver for Tegra products");
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
new file mode 100644
index 0000000..d8f5979
--- /dev/null
+++ b/drivers/gpu/host1x/dev.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HOST1X_DEV_H
+#define HOST1X_DEV_H
+
+#include "syncpt.h"
+
+struct host1x;
+struct host1x_syncpt;
+struct platform_device;
+
+struct host1x_syncpt_ops {
+	void (*reset)(struct host1x_syncpt *);
+	void (*reset_wait_base)(struct host1x_syncpt *);
+	void (*read_wait_base)(struct host1x_syncpt *);
+	u32 (*load_min)(struct host1x_syncpt *);
+	void (*cpu_incr)(struct host1x_syncpt *);
+	int (*patch_wait)(struct host1x_syncpt *, void *patch_addr);
+	void (*debug)(struct host1x_syncpt *);
+	const char * (*name)(struct host1x_syncpt *);
+};
+
+struct host1x_device_info {
+	int	nb_channels;		/* host1x: num channels supported */
+	int	nb_pts;			/* host1x: num syncpoints supported */
+	int	nb_bases;		/* host1x: num wait bases supported */
+	int	nb_mlocks;		/* host1x: number of mlocks */
+	int	(*init)(struct host1x *); /* initialize per SoC ops */
+	int	sync_offset;
+};
+
+struct host1x {
+	void __iomem *regs;
+	struct host1x_syncpt *syncpt;
+	struct platform_device *dev;
+	struct host1x_device_info info;
+	struct clk *clk;
+
+	struct host1x_syncpt_ops syncpt_op;
+
+	struct dentry *debugfs;
+};
+
+static inline
+struct host1x *host1x_get_host(struct platform_device *_dev)
+{
+	struct platform_device *pdev;
+
+	if (_dev->dev.parent) {
+		pdev = to_platform_device(_dev->dev.parent);
+		return platform_get_drvdata(pdev);
+	} else
+		return platform_get_drvdata(_dev);
+}
+
+void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r);
+u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+
+#endif
diff --git a/drivers/gpu/host1x/hw/Makefile b/drivers/gpu/host1x/hw/Makefile
new file mode 100644
index 0000000..9b50863
--- /dev/null
+++ b/drivers/gpu/host1x/hw/Makefile
@@ -0,0 +1,6 @@
+ccflags-y = -Idrivers/gpu/host1x
+
+host1x-hw-objs  = \
+	host1x01.o
+
+obj-$(CONFIG_TEGRA_HOST1X) += host1x-hw.o
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
new file mode 100644
index 0000000..ea6e604
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -0,0 +1,35 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/clk.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+
+#include "hw/host1x01.h"
+#include "dev.h"
+#include "hw/host1x01_hardware.h"
+
+#include "hw/syncpt_hw.c"
+
+int host1x01_init(struct host1x *host)
+{
+	host->syncpt_op = host1x_syncpt_ops;
+
+	return 0;
+}
diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
new file mode 100644
index 0000000..6ec30051
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01.h
@@ -0,0 +1,25 @@
+/*
+ * Host1x init for T20 and T30 Architecture Chips
+ *
+ * Copyright (c) 2011-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef HOST1X_HOST1X01_H
+#define HOST1X_HOST1X01_H
+
+struct host1x;
+
+int host1x01_init(struct host1x *);
+
+#endif /* HOST1X_HOST1X01_H */
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
new file mode 100644
index 0000000..c1d5324
--- /dev/null
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -0,0 +1,26 @@
+/*
+ * Tegra host1x Register Offsets for Tegra20 and Tegra30
+ *
+ * Copyright (c) 2010-2013 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_HOST1X01_HARDWARE_H
+#define __HOST1X_HOST1X01_HARDWARE_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include "hw_host1x01_sync.h"
+
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
new file mode 100644
index 0000000..b12c1a4
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x01_sync_h__
+#define __hw_host1x01_sync_h__
+
+static inline u32 host1x_sync_syncpt_0_r(void)
+{
+	return 0x400;
+}
+#define HOST1X_SYNC_SYNCPT_0 \
+	host1x_sync_syncpt_0_r()
+static inline u32 host1x_sync_syncpt_base_0_r(void)
+{
+	return 0x600;
+}
+#define HOST1X_SYNC_SYNCPT_BASE_0 \
+	host1x_sync_syncpt_base_0_r()
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
+{
+	return 0x700;
+}
+#define HOST1X_SYNC_SYNCPT_CPU_INCR \
+	host1x_sync_syncpt_cpu_incr_r()
+#endif /* __hw_host1x01_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
new file mode 100644
index 0000000..16e3ada
--- /dev/null
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -0,0 +1,146 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/io.h>
+#include "syncpt.h"
+#include "dev.h"
+
+/*
+ * Write the current syncpoint value back to hw.
+ */
+static void syncpt_reset(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	int min = host1x_syncpt_read_min(sp);
+	host1x_sync_writel(dev, min, HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
+}
+
+/*
+ * Write the current waitbase value back to hw.
+ */
+static void syncpt_reset_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	host1x_sync_writel(dev, sp->base_val,
+			HOST1X_SYNC_SYNCPT_BASE_0 + sp->id * 4);
+}
+
+/*
+ * Read waitbase value from hw.
+ */
+static void syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	sp->base_val = host1x_sync_readl(dev,
+				HOST1X_SYNC_SYNCPT_BASE_0 + sp->id * 4);
+}
+
+/*
+ * Updates the last value read from hardware.
+ * (was host1x_syncpt_load_min)
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
+
+	if (!host1x_syncpt_check_max(sp, live))
+		dev_err(&dev->dev->dev,
+				"%s failed: id=%u, min=%d, max=%d\n",
+				__func__,
+				sp->id,
+				host1x_syncpt_read_min(sp),
+				host1x_syncpt_read_max(sp));
+
+	return live;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+static void syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 reg_offset = sp->id / 32;
+
+	if (!host1x_syncpt_client_managed(sp)
+			&& host1x_syncpt_min_eq_max(sp)) {
+		dev_err(&dev->dev->dev,
+			"Trying to increment syncpoint id %d beyond max\n",
+			sp->id);
+		return;
+	}
+	host1x_sync_writel(dev, BIT_MASK(sp->id),
+			HOST1X_SYNC_SYNCPT_CPU_INCR + reg_offset * 4);
+	wmb();
+}
+
+static const char *syncpt_name(struct host1x_syncpt *sp)
+{
+	struct host1x_device_info *info = &sp->dev->info;
+	const char *name = NULL;
+
+	if (sp->id < info->nb_pts)
+		name = sp->name;
+
+	return name ? name : "";
+}
+
+static void syncpt_debug(struct host1x_syncpt *sp)
+{
+	u32 i;
+	for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
+		struct host1x_syncpt *pt = sp->dev->syncpt + i;
+		u32 max = host1x_syncpt_read_max(pt);
+		u32 min = host1x_syncpt_load_min(pt);
+		if (!max && !min)
+			continue;
+		dev_info(&sp->dev->dev->dev,
+			"id %u (%s) min %u max %u\n",
+			i, syncpt_name(pt),
+			min, max);
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
+		struct host1x_syncpt *pt = sp->dev->syncpt + i;
+		u32 base_val;
+		host1x_syncpt_read_wait_base(pt);
+		base_val = pt->base_val;
+		if (base_val)
+			dev_info(&sp->dev->dev->dev,
+					"waitbase id %u val %u\n",
+					i, base_val);
+	}
+}
+
+static const struct host1x_syncpt_ops host1x_syncpt_ops = {
+	.reset = syncpt_reset,
+	.reset_wait_base = syncpt_reset_wait_base,
+	.read_wait_base = syncpt_read_wait_base,
+	.load_min = syncpt_load_min,
+	.cpu_incr = syncpt_cpu_incr,
+	.debug = syncpt_debug,
+	.name = syncpt_name,
+};
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
new file mode 100644
index 0000000..b45651f
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.c
@@ -0,0 +1,217 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include <linux/module.h>
+#include "syncpt.h"
+#include "dev.h"
+#include <trace/events/host1x.h>
+
+#define MAX_SYNCPT_LENGTH	5
+
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed);
+
+u32 host1x_syncpt_id(struct host1x_syncpt *sp)
+{
+	return sp->id;
+}
+
+/*
+ * Updates the value sent to hardware.
+ */
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs)
+{
+	return (u32)atomic_add_return(incrs, &sp->max_val);
+}
+
+/*
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void host1x_syncpt_reset(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++)
+		dev->syncpt_op.reset(sp_base + i);
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.reset_wait_base(sp_base + i);
+	wmb();
+}
+
+/*
+ * Updates sw shadow state for client managed registers
+ */
+void host1x_syncpt_save(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
+		if (host1x_syncpt_client_managed(sp_base + i))
+			dev->syncpt_op.load_min(sp_base + i);
+		else
+			WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.read_wait_base(sp_base + i);
+}
+
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+
+	return val;
+}
+
+/*
+ * Get the current syncpoint base
+ */
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp)
+{
+	u32 val;
+	sp->dev->syncpt_op.read_wait_base(sp);
+	val = sp->base_val;
+	return val;
+}
+
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.cpu_incr(sp);
+}
+
+/*
+ * Increment syncpoint value from cpu, updating cache
+ */
+void host1x_syncpt_incr(struct host1x_syncpt *sp)
+{
+	if (host1x_syncpt_client_managed(sp))
+		host1x_syncpt_incr_max(sp, 1);
+	host1x_syncpt_cpu_incr(sp);
+}
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.debug(sp);
+}
+
+int host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+
+	syncpt = sp = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_syncpt) * host->info.nb_pts,
+			GFP_KERNEL);
+	if (!syncpt)
+		return -ENOMEM;
+
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
+	}
+
+	host->syncpt = syncpt;
+
+	return 0;
+}
+
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (i >= host->info.nb_pts)
+		return NULL;
+
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+
+	return sp;
+}
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
+
+void host1x_syncpt_free(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kfree(sp->name);
+	sp->pdev = NULL;
+	sp->name = NULL;
+	sp->client_managed = 0;
+}
+
+void host1x_syncpt_deinit(struct host1x *host)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	for (i = 0; i < host->info.nb_pts; i++, sp++)
+		kfree(sp->name);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev)
+{
+	return dev->info.nb_pts;
+}
+
+int host1x_syncpt_nb_bases(struct host1x *dev)
+{
+	return dev->info.nb_bases;
+}
+
+int host1x_syncpt_nb_mlocks(struct host1x *dev)
+{
+	return dev->info.nb_mlocks;
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
new file mode 100644
index 0000000..d9b9b0a
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.h
@@ -0,0 +1,153 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_SYNCPT_H
+#define __HOST1X_SYNCPT_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/atomic.h>
+
+struct host1x;
+
+#define NVSYNCPT_INVALID			(-1)
+
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
+	struct host1x *dev;
+	struct platform_device *pdev;
+};
+
+/* Initialize sync point array  */
+int host1x_syncpt_init(struct host1x *);
+
+/*  Free sync point array */
+void host1x_syncpt_deinit(struct host1x *);
+
+/*
+ * Read max. It indicates how many operations there are in queue, either in
+ * channel or in a software thread.
+ * */
+static inline u32 host1x_syncpt_read_max(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->max_val);
+}
+
+/*
+ * Read min, which is a shadow of the current sync point value in hardware.
+ */
+static inline u32 host1x_syncpt_read_min(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->min_val);
+}
+
+/* Return number of sync points supported. */
+int host1x_syncpt_nb_pts(struct host1x *dev);
+
+/* Return number of wait bases supported. */
+int host1x_syncpt_nb_bases(struct host1x *dev);
+
+/* Return number of mlocks supported. */
+int host1x_syncpt_nb_mlocks(struct host1x *dev);
+
+/*
+ * Check sync point sanity. If the real value is larger than max, there have
+ * been too many sync point increments.
+ *
+ * Client managed sync points are not tracked.
+ * */
+static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
+{
+	u32 max;
+	if (sp->client_managed)
+		return true;
+	max = host1x_syncpt_read_max(sp);
+	return (s32)(max - real) >= 0;
+}
+
+/* Return true if sync point is client managed. */
+static inline int host1x_syncpt_client_managed(struct host1x_syncpt *sp)
+{
+	return sp->client_managed;
+}
+
+/*
+ * Returns true if syncpoint min == max, which means that there are no
+ * outstanding operations.
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
+
+/* Return pointer to struct denoting sync point id. */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
+
+/* Request incrementing a sync point. */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
+
+/* Load current value from hardware to the shadow register. */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+
+/* Save host1x sync point state into shadow registers. */
+void host1x_syncpt_save(struct host1x *dev);
+
+/* Reset host1x sync point state from shadow registers. */
+void host1x_syncpt_reset(struct host1x *dev);
+
+/* Read current wait base value into shadow register and return it. */
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
+
+/* Increment sync point and its max. */
+void host1x_syncpt_incr(struct host1x_syncpt *sp);
+
+/* Indicate future operations by incrementing the sync point max. */
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
+
+/* Do a debug dump of sync point values. */
+void host1x_syncpt_debug(struct host1x_syncpt *sp);
+
+/* Check if sync point id is valid. */
+static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
+{
+	return sp->id != NVSYNCPT_INVALID &&
+		sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
+
+/* Return id of the sync point */
+u32 host1x_syncpt_id(struct host1x_syncpt *sp);
+
+/* Allocate a sync point for a device. */
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed);
+
+/* Free a sync point. */
+void host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#endif
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index e7068c5..776ddba 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -21,6 +21,8 @@ source "drivers/gpu/vga/Kconfig"
 
 source "drivers/gpu/drm/Kconfig"
 
+source "drivers/gpu/host1x/Kconfig"
+
 source "drivers/gpu/stub/Kconfig"
 
 config VGASTATE
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
new file mode 100644
index 0000000..3c14cac
--- /dev/null
+++ b/include/trace/events/host1x.h
@@ -0,0 +1,61 @@
+/*
+ * include/trace/events/host1x.h
+ *
+ * Nvhost event logging to ftrace.
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM host1x
+
+#if !defined(_TRACE_HOST1X_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOST1X_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(host1x,
+	TP_PROTO(const char *name),
+	TP_ARGS(name),
+	TP_STRUCT__entry(__field(const char *, name)),
+	TP_fast_assign(__entry->name = name;),
+	TP_printk("name=%s", __entry->name)
+);
+
+TRACE_EVENT(host1x_syncpt_load_min,
+	TP_PROTO(u32 id, u32 val),
+
+	TP_ARGS(id, val),
+
+	TP_STRUCT__entry(
+		__field(u32, id)
+		__field(u32, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->val = val;
+	),
+
+	TP_printk("id=%u, val=%u", __entry->id, __entry->val)
+);
+
+#endif /*  _TRACE_HOST1X_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.9.5



* [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
  2013-01-15 11:43 ` [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver Terje Bergstrom
@ 2013-01-15 11:43 ` Terje Bergstrom
  2013-02-04 10:30   ` Thierry Reding
  2013-01-15 11:43 ` [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support Terje Bergstrom
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:43 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add support for sync point interrupts and sync point wait. Sync
point wait uses interrupts to unblock waiters.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
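Note for readers, not part of the commit: the wait queue in intr.c keeps
waiters sorted by threshold and decides completion with a signed 32-bit
difference, so the ordering survives wraparound of the sync point counter.
A minimal user-space sketch of that comparison follows. The names
syncpt_thresh_reached() and count_completed() are hypothetical and do not
appear in the patch, and the sketch assumes outstanding thresholds are
always within 2^31 of the current value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True once the counter value `sync` has reached or passed `thresh`,
 * computed modulo 2^32. Mirrors the (s32)(thresh - sync) test used by
 * the waiter queue; valid while the two values are less than 2^31 apart.
 */
static bool syncpt_thresh_reached(uint32_t thresh, uint32_t sync)
{
	return (int32_t)(thresh - sync) <= 0;
}

/*
 * Given thresholds sorted in wraparound order, count how many leading
 * entries have completed at counter value `sync`. Scanning stops at the
 * first entry that is still pending, as the queue walk does.
 */
static size_t count_completed(const uint32_t *thresh, size_t n, uint32_t sync)
{
	size_t done = 0;

	assert(thresh != NULL || n == 0);
	while (done < n && syncpt_thresh_reached(thresh[done], sync))
		done++;
	return done;
}
```

With a queue of { 0xfffffffe, 0xffffffff, 1, 3 } and a current value of 1,
the first three entries count as completed even though the counter wrapped
between the second and third.
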
 drivers/gpu/host1x/Makefile              |    1 +
 drivers/gpu/host1x/dev.c                 |   21 +-
 drivers/gpu/host1x/dev.h                 |   17 +-
 drivers/gpu/host1x/hw/host1x01.c         |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h |   42 ++++
 drivers/gpu/host1x/hw/intr_hw.c          |  178 +++++++++++++++
 drivers/gpu/host1x/intr.c                |  356 ++++++++++++++++++++++++++++++
 drivers/gpu/host1x/intr.h                |  103 +++++++++
 drivers/gpu/host1x/syncpt.c              |  163 ++++++++++++++
 drivers/gpu/host1x/syncpt.h              |    5 +
 10 files changed, 883 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 363e6ab..5ef47ff 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -3,6 +3,7 @@ ccflags-y = -Idrivers/gpu/host1x
 host1x-y = \
 	syncpt.o \
 	dev.o \
+	intr.o \
 	hw/host1x01.o
 
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index cd2b1ef..7f9f389 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -24,6 +24,7 @@
 #include <linux/clk.h>
 #include <linux/io.h>
 #include "dev.h"
+#include "intr.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
 
 	/* set common host1x device data */
 	platform_set_drvdata(dev, host);
-
 	host->regs = devm_request_and_ioremap(&dev->dev, regs);
 	if (!host->regs) {
 		dev_err(&dev->dev, "failed to remap host registers\n");
@@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
 	}
 
 	err = host1x_syncpt_init(host);
-	if (err)
+	if (err) {
+		dev_err(&dev->dev, "failed to init sync points\n");
 		return err;
+	}
+
+	err = host1x_intr_init(&host->intr, syncpt_irq);
+	if (err) {
+		dev_err(&dev->dev, "failed to init irq\n");
+		goto fail_deinit_syncpt;
+	}
 
 	host->clk = devm_clk_get(&dev->dev, NULL);
 	if (IS_ERR(host->clk)) {
 		dev_err(&dev->dev, "failed to get clock\n");
 		err = PTR_ERR(host->clk);
-		goto fail_deinit_syncpt;
+		goto fail_deinit_intr;
 	}
 
 	err = clk_prepare_enable(host->clk);
 	if (err < 0) {
 		dev_err(&dev->dev, "failed to enable clock\n");
-		goto fail_deinit_syncpt;
+		goto fail_deinit_intr;
 	}
 
 	host1x_syncpt_reset(host);
 
+	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
+
 	dev_info(&dev->dev, "initialized\n");
 
 	return 0;
 
+fail_deinit_intr:
+	host1x_intr_deinit(&host->intr);
 fail_deinit_syncpt:
 	host1x_syncpt_deinit(host);
 	return err;
@@ -139,6 +151,7 @@ fail_deinit_syncpt:
 static int __exit host1x_remove(struct platform_device *dev)
 {
 	struct host1x *host = platform_get_drvdata(dev);
+	host1x_intr_deinit(&host->intr);
 	host1x_syncpt_deinit(host);
 	clk_disable_unprepare(host->clk);
 	return 0;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index d8f5979..8376092 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,11 +17,12 @@
 #ifndef HOST1X_DEV_H
 #define HOST1X_DEV_H
 
+#include <linux/platform_device.h>
 #include "syncpt.h"
+#include "intr.h"
 
 struct host1x;
 struct host1x_syncpt;
-struct platform_device;
 
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
@@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
 	const char * (*name)(struct host1x_syncpt *);
 };
 
+struct host1x_intr_ops {
+	void (*init_host_sync)(struct host1x_intr *);
+	void (*set_host_clocks_per_usec)(
+		struct host1x_intr *, u32 clocks);
+	void (*set_syncpt_threshold)(
+		struct host1x_intr *, u32 id, u32 thresh);
+	void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
+	void (*disable_all_syncpt_intrs)(struct host1x_intr *);
+	int (*free_syncpt_irq)(struct host1x_intr *);
+};
+
 struct host1x_device_info {
 	int	nb_channels;		/* host1x: num channels supported */
 	int	nb_pts;			/* host1x: num syncpoints supported */
@@ -46,11 +59,13 @@ struct host1x_device_info {
 struct host1x {
 	void __iomem *regs;
 	struct host1x_syncpt *syncpt;
+	struct host1x_intr intr;
 	struct platform_device *dev;
 	struct host1x_device_info info;
 	struct clk *clk;
 
 	struct host1x_syncpt_ops syncpt_op;
+	struct host1x_intr_ops intr_op;
 
 	struct dentry *debugfs;
 };
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index ea6e604..3d633a3 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -26,10 +26,12 @@
 #include "hw/host1x01_hardware.h"
 
 #include "hw/syncpt_hw.c"
+#include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
 	host->syncpt_op = host1x_syncpt_ops;
+	host->intr_op = host1x_intr_ops;
 
 	return 0;
 }
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index b12c1a4..5da9afb 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -51,12 +51,54 @@
 #ifndef __hw_host1x01_sync_h__
 #define __hw_host1x01_sync_h__
 
+static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void)
+{
+	return 0x40;
+}
+#define HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS \
+	host1x_sync_syncpt_thresh_cpu0_int_status_r()
+static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void)
+{
+	return 0x60;
+}
+#define HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE \
+	host1x_sync_syncpt_thresh_int_disable_r()
+static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
+{
+	return 0x68;
+}
+#define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \
+	host1x_sync_syncpt_thresh_int_enable_cpu0_r()
+static inline u32 host1x_sync_usec_clk_r(void)
+{
+	return 0x1a4;
+}
+#define HOST1X_SYNC_USEC_CLK \
+	host1x_sync_usec_clk_r()
+static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void)
+{
+	return 0x1a8;
+}
+#define HOST1X_SYNC_CTXSW_TIMEOUT_CFG \
+	host1x_sync_ctxsw_timeout_cfg_r()
+static inline u32 host1x_sync_ip_busy_timeout_r(void)
+{
+	return 0x1bc;
+}
+#define HOST1X_SYNC_IP_BUSY_TIMEOUT \
+	host1x_sync_ip_busy_timeout_r()
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
 }
 #define HOST1X_SYNC_SYNCPT_0 \
 	host1x_sync_syncpt_0_r()
+static inline u32 host1x_sync_syncpt_int_thresh_0_r(void)
+{
+	return 0x500;
+}
+#define HOST1X_SYNC_SYNCPT_INT_THRESH_0 \
+	host1x_sync_syncpt_int_thresh_0_r()
 static inline u32 host1x_sync_syncpt_base_0_r(void)
 {
 	return 0x600;
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
new file mode 100644
index 0000000..12488e2
--- /dev/null
+++ b/drivers/gpu/host1x/hw/intr_hw.c
@@ -0,0 +1,178 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <asm/mach/irq.h>
+
+#include "intr.h"
+#include "dev.h"
+
+/* Spacing between sync registers */
+#define REGISTER_STRIDE 4
+
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
+
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
+{
+	struct host1x_intr_syncpt *sp =
+		container_of(work, struct host1x_intr_syncpt, work);
+	host1x_syncpt_thresh_fn(sp);
+}
+
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
+{
+	struct host1x *host1x = dev_id;
+	struct host1x_intr *intr = &host1x->intr;
+	unsigned long reg;
+	int i, id;
+
+	for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
+		reg = host1x_sync_readl(host1x,
+				HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
+				i * REGISTER_STRIDE);
+		for_each_set_bit(id, &reg, BITS_PER_LONG) {
+			struct host1x_intr_syncpt *sp =
+				intr->syncpt + (i * BITS_PER_LONG + id);
+			host1x_intr_syncpt_thresh_isr(sp);
+			queue_work(intr->wq, &sp->work);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void host1x_intr_init_host_sync(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	int i, err;
+
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
+
+	for (i = 0; i < host1x->info.nb_pts; i++)
+		INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
+
+	err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
+				syncpt_thresh_cascade_isr,
+				IRQF_SHARED, "host1x_syncpt", host1x);
+	WARN_ON(IS_ERR_VALUE(err));
+
+	/* disable the ip_busy_timeout. this prevents write drops */
+	host1x_sync_writel(host1x, 0, HOST1X_SYNC_IP_BUSY_TIMEOUT);
+
+	/*
+	 * increase the auto-ack timeout to the maximum value. 2d will hang
+	 * otherwise on Tegra2.
+	 */
+	host1x_sync_writel(host1x, 0xff, HOST1X_SYNC_CTXSW_TIMEOUT_CFG);
+}
+
+static void host1x_intr_set_host_clocks_per_usec(struct host1x_intr *intr,
+		u32 cpm)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	/* write microsecond clock register */
+	host1x_sync_writel(host1x, cpm, HOST1X_SYNC_USEC_CLK);
+}
+
+static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
+	u32 id, u32 thresh)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	host1x_sync_writel(host1x, thresh,
+		HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE);
+}
+
+static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 +
+			BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+			HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
+			BIT_WORD(id) * REGISTER_STRIDE);
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
+		BIT_WORD(id) * REGISTER_STRIDE);
+}
+
+static void host1x_intr_disable_all_syncpt_intrs(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg;
+
+	for (reg = 0; reg <= BIT_WORD(host1x->info.nb_pts) * REGISTER_STRIDE;
+			reg += REGISTER_STRIDE) {
+		host1x_sync_writel(host1x, 0xffffffffu,
+				HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
+				reg);
+
+		host1x_sync_writel(host1x, 0xffffffffu,
+			HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS + reg);
+	}
+}
+
+/*
+ * Sync point threshold interrupt service function
+ * Handles sync point threshold triggers, in interrupt context
+ */
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt)
+{
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 reg = BIT_WORD(id) * REGISTER_STRIDE;
+
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE + reg);
+	host1x_sync_writel(host1x, BIT_MASK(id),
+		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS + reg);
+}
+
+static int host1x_free_syncpt_irq(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	devm_free_irq(&host1x->dev->dev, intr->syncpt_irq, host1x);
+	flush_workqueue(intr->wq);
+	return 0;
+}
+
+static const struct host1x_intr_ops host1x_intr_ops = {
+	.init_host_sync = host1x_intr_init_host_sync,
+	.set_host_clocks_per_usec = host1x_intr_set_host_clocks_per_usec,
+	.set_syncpt_threshold = host1x_intr_set_syncpt_threshold,
+	.enable_syncpt_intr = host1x_intr_enable_syncpt_intr,
+	.disable_syncpt_intr = host1x_intr_disable_syncpt_intr,
+	.disable_all_syncpt_intrs = host1x_intr_disable_all_syncpt_intrs,
+	.free_syncpt_irq = host1x_free_syncpt_irq,
+};
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
new file mode 100644
index 0000000..26099b8
--- /dev/null
+++ b/drivers/gpu/host1x/intr.c
@@ -0,0 +1,356 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "intr.h"
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include "dev.h"
+
+/* Wait list management */
+
+struct host1x_waitlist {
+	struct list_head list;
+	struct kref refcount;
+	u32 thresh;
+	enum host1x_intr_action action;
+	atomic_t state;
+	void *data;
+	int count;
+};
+
+enum waitlist_state {
+	WLS_PENDING,
+	WLS_REMOVED,
+	WLS_CANCELLED,
+	WLS_HANDLED
+};
+
+static void waiter_release(struct kref *kref)
+{
+	kfree(container_of(kref, struct host1x_waitlist, refcount));
+}
+
+/*
+ * add a waiter to a waiter queue, sorted by threshold
+ * returns true if it was added at the head of the queue
+ */
+static bool add_waiter_to_queue(struct host1x_waitlist *waiter,
+				struct list_head *queue)
+{
+	struct host1x_waitlist *pos;
+	u32 thresh = waiter->thresh;
+
+	list_for_each_entry_reverse(pos, queue, list)
+		if ((s32)(pos->thresh - thresh) <= 0) {
+			list_add(&waiter->list, &pos->list);
+			return false;
+		}
+
+	list_add(&waiter->list, queue);
+	return true;
+}
+
+/*
+ * run through a waiter queue for a single sync point ID
+ * and gather all completed waiters into lists by actions
+ */
+static void remove_completed_waiters(struct list_head *head, u32 sync,
+			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *dest;
+	struct host1x_waitlist *waiter, *next;
+
+	list_for_each_entry_safe(waiter, next, head, list) {
+		if ((s32)(waiter->thresh - sync) > 0)
+			break;
+
+		dest = completed + waiter->action;
+
+		/* PENDING->REMOVED or CANCELLED->HANDLED */
+		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
+			list_del(&waiter->list);
+			kref_put(&waiter->refcount, waiter_release);
+		} else {
+			list_move_tail(&waiter->list, dest);
+		}
+	}
+}
+
+static void reset_threshold_interrupt(struct host1x_intr *intr,
+			       struct list_head *head,
+			       unsigned int id)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 thresh = list_first_entry(head,
+				struct host1x_waitlist, list)->thresh;
+
+	host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+	host1x->intr_op.enable_syncpt_intr(intr, id);
+}
+
+static void action_wakeup(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up(wq);
+}
+
+static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
+{
+	wait_queue_head_t *wq = waiter->data;
+
+	wake_up_interruptible(wq);
+}
+
+typedef void (*action_handler)(struct host1x_waitlist *waiter);
+
+static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_wakeup,
+	action_wakeup_interruptible,
+};
+
+static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
+{
+	struct list_head *head = completed;
+	int i;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i, ++head) {
+		action_handler handler = action_handlers[i];
+		struct host1x_waitlist *waiter, *next;
+
+		list_for_each_entry_safe(waiter, next, head, list) {
+			list_del(&waiter->list);
+			handler(waiter);
+			WARN_ON(atomic_xchg(&waiter->state, WLS_HANDLED)
+					!= WLS_REMOVED);
+			kref_put(&waiter->refcount, waiter_release);
+		}
+	}
+}
+
+/*
+ * Remove & handle all waiters that have completed for the given syncpt
+ */
+static int process_wait_list(struct host1x_intr *intr,
+			     struct host1x_intr_syncpt *syncpt,
+			     u32 threshold)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct list_head completed[HOST1X_INTR_ACTION_COUNT];
+	unsigned int i;
+	int empty;
+
+	for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i)
+		INIT_LIST_HEAD(completed + i);
+
+	spin_lock(&syncpt->lock);
+
+	remove_completed_waiters(&syncpt->wait_head, threshold, completed);
+
+	empty = list_empty(&syncpt->wait_head);
+	if (empty)
+		host1x->intr_op.disable_syncpt_intr(intr, syncpt->id);
+	else
+		reset_threshold_interrupt(intr, &syncpt->wait_head,
+					  syncpt->id);
+
+	spin_unlock(&syncpt->lock);
+
+	run_handlers(completed);
+
+	return empty;
+}
+
+/*
+ * Sync point threshold interrupt service thread function
+ * Handles sync point threshold triggers, in thread context
+ */
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id)
+{
+	struct host1x_intr_syncpt *syncpt = dev_id;
+	unsigned int id = syncpt->id;
+	struct host1x_intr *intr = intr_syncpt_to_intr(syncpt);
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	return IRQ_HANDLED;
+}
+
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *_waiter,
+			void **ref)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_waitlist *waiter = _waiter;
+	struct host1x_intr_syncpt *syncpt;
+	int queue_was_empty;
+
+	if (waiter == NULL) {
+		pr_warn("%s: NULL waiter\n", __func__);
+		return -EINVAL;
+	}
+
+	/* initialize a new waiter */
+	INIT_LIST_HEAD(&waiter->list);
+	kref_init(&waiter->refcount);
+	if (ref)
+		kref_get(&waiter->refcount);
+	waiter->thresh = thresh;
+	waiter->action = action;
+	atomic_set(&waiter->state, WLS_PENDING);
+	waiter->data = data;
+	waiter->count = 1;
+
+	syncpt = intr->syncpt + id;
+
+	spin_lock(&syncpt->lock);
+
+	queue_was_empty = list_empty(&syncpt->wait_head);
+
+	if (add_waiter_to_queue(waiter, &syncpt->wait_head)) {
+		/* added at head of list - new threshold value */
+		host1x->intr_op.set_syncpt_threshold(intr, id, thresh);
+
+		/* added as first waiter - enable interrupt */
+		if (queue_was_empty)
+			host1x->intr_op.enable_syncpt_intr(intr, id);
+	}
+
+	spin_unlock(&syncpt->lock);
+
+	if (ref)
+		*ref = waiter;
+	return 0;
+}
+
+void *host1x_intr_alloc_waiter(void)
+{
+	return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
+
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
+{
+	struct host1x_waitlist *waiter = ref;
+	struct host1x_intr_syncpt *syncpt;
+	struct host1x *host1x = intr_to_host1x(intr);
+
+	while (atomic_cmpxchg(&waiter->state,
+				WLS_PENDING, WLS_CANCELLED) == WLS_REMOVED)
+		schedule();
+
+	syncpt = intr->syncpt + id;
+	(void)process_wait_list(intr, syncpt,
+				host1x_syncpt_load_min(host1x->syncpt + id));
+
+	kref_put(&waiter->refcount, waiter_release);
+}
+
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	u32 nb_pts = host1x_syncpt_nb_pts(host1x);
+
+	intr->syncpt = devm_kzalloc(&host1x->dev->dev,
+			sizeof(struct host1x_intr_syncpt) *
+			host1x->info.nb_pts,
+			GFP_KERNEL);
+
+	if (!host1x->intr.syncpt)
+		return -ENOMEM;
+
+	mutex_init(&intr->mutex);
+	intr->syncpt_irq = irq_sync;
+	intr->wq = create_workqueue("host_syncpt");
+	if (!intr->wq)
+		return -ENOMEM;
+
+	for (id = 0; id < nb_pts; ++id) {
+		struct host1x_intr_syncpt *syncpt = &intr->syncpt[id];
+
+		syncpt->intr = &host1x->intr;
+		syncpt->id = id;
+		spin_lock_init(&syncpt->lock);
+		INIT_LIST_HEAD(&syncpt->wait_head);
+		snprintf(syncpt->thresh_irq_name,
+			sizeof(syncpt->thresh_irq_name),
+			"host1x_sp_%02d", id);
+	}
+
+	return 0;
+}
+
+void host1x_intr_deinit(struct host1x_intr *intr)
+{
+	host1x_intr_stop(intr);
+	destroy_workqueue(intr->wq);
+}
+
+void host1x_intr_start(struct host1x_intr *intr, u32 hz)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.init_host_sync(intr);
+	host1x->intr_op.set_host_clocks_per_usec(intr,
+			DIV_ROUND_UP(hz, 1000000));
+
+	mutex_unlock(&intr->mutex);
+}
+
+void host1x_intr_stop(struct host1x_intr *intr)
+{
+	unsigned int id;
+	struct host1x *host1x = intr_to_host1x(intr);
+	struct host1x_intr_syncpt *syncpt;
+	u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
+
+	mutex_lock(&intr->mutex);
+
+	host1x->intr_op.disable_all_syncpt_intrs(intr);
+
+	for (id = 0, syncpt = intr->syncpt;
+	     id < nb_pts;
+	     ++id, ++syncpt) {
+		struct host1x_waitlist *waiter, *next;
+		list_for_each_entry_safe(waiter, next,
+				&syncpt->wait_head, list) {
+			if (atomic_cmpxchg(&waiter->state,
+						WLS_CANCELLED, WLS_HANDLED)
+				== WLS_CANCELLED) {
+				list_del(&waiter->list);
+				kref_put(&waiter->refcount, waiter_release);
+			}
+		}
+
+		if (!list_empty(&syncpt->wait_head)) {  /* output diagnostics */
+			mutex_unlock(&intr->mutex);
+			pr_warn("%s cannot stop syncpt intr id=%d\n",
+					__func__, id);
+			return;
+		}
+	}
+
+	host1x->intr_op.free_syncpt_irq(intr);
+
+	mutex_unlock(&intr->mutex);
+}
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
new file mode 100644
index 0000000..679a7b4
--- /dev/null
+++ b/drivers/gpu/host1x/intr.h
@@ -0,0 +1,103 @@
+/*
+ * Tegra host1x Interrupt Management
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_INTR_H
+#define __HOST1X_INTR_H
+
+#include <linux/interrupt.h>
+#include <linux/workqueue.h>
+
+enum host1x_intr_action {
+	/*
+	 * Wake up a task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP,
+
+	/*
+	 * Wake up an interruptible task.
+	 * 'data' points to a wait_queue_head_t
+	 */
+	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
+
+	HOST1X_INTR_ACTION_COUNT
+};
+
+struct host1x_intr;
+
+struct host1x_intr_syncpt {
+	struct host1x_intr *intr;
+	u8 id;
+	spinlock_t lock;
+	struct list_head wait_head;
+	char thresh_irq_name[12];
+	struct work_struct work;
+};
+
+struct host1x_intr {
+	struct host1x_intr_syncpt *syncpt;
+	struct mutex mutex;
+	int syncpt_irq;
+	struct workqueue_struct *wq;
+};
+#define intr_to_host1x(x) container_of(x, struct host1x, intr)
+#define intr_syncpt_to_intr(is) ((is)->intr)
+
+/*
+ * Schedule an action to be taken when a sync point reaches the given threshold.
+ *
+ * @id the sync point
+ * @thresh the threshold
+ * @action the action to take
+ * @data a pointer to extra data depending on action, see above
+ * @waiter waiter allocated with host1x_intr_alloc_waiter - assumes ownership
+ * @ref must be passed if cancellation is possible, else NULL
+ *
+ * This is a non-blocking API.
+ */
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
+			enum host1x_intr_action action, void *data,
+			void *waiter,
+			void **ref);
+
+/*
+ * Allocate a waiter.
+ */
+void *host1x_intr_alloc_waiter(void);
+
+/*
+ * Unreference an action submitted to host1x_intr_add_action().
+ * You must call this if you passed non-NULL as ref.
+ * @ref the ref returned from host1x_intr_add_action()
+ */
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref);
+
+/* Initialize host1x sync point interrupt */
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync);
+
+/* Deinitialize host1x sync point interrupt */
+void host1x_intr_deinit(struct host1x_intr *intr);
+
+/* Enable host1x sync point interrupt */
+void host1x_intr_start(struct host1x_intr *intr, u32 hz);
+
+/* Disable host1x sync point interrupt */
+void host1x_intr_stop(struct host1x_intr *intr);
+
+irqreturn_t host1x_syncpt_thresh_fn(void *dev_id);
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index b45651f..32e2b42 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -22,9 +22,12 @@
 #include <linux/module.h>
 #include "syncpt.h"
 #include "dev.h"
+#include "intr.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
+#define SYNCPT_CHECK_PERIOD (2 * HZ)
+#define MAX_STUCK_CHECK_COUNT 15
 
 static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
 		struct platform_device *pdev,
@@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp)
 	host1x_syncpt_cpu_incr(sp);
 }
 
+/*
+ * Updates the sync point from hardware, and returns true if the syncpoint
+ * is expired, false if we may need to wait
+ */
+static bool syncpt_load_min_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	sp->dev->syncpt_op.load_min(sp);
+	return host1x_syncpt_is_expired(sp, thresh);
+}
+
+/*
+ * Main entrypoint for syncpoint value waits.
+ */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
+			u32 thresh, long timeout, u32 *value)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+	void *ref;
+	void *waiter;
+	int err = 0, check_count = 0;
+	u32 val;
+
+	if (value)
+		*value = 0;
+
+	/* first check cache */
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = host1x_syncpt_read_min(sp);
+		return 0;
+	}
+
+	/* try to read from register */
+	val = sp->dev->syncpt_op.load_min(sp);
+	if (host1x_syncpt_is_expired(sp, thresh)) {
+		if (value)
+			*value = val;
+		goto done;
+	}
+
+	if (!timeout) {
+		err = -EAGAIN;
+		goto done;
+	}
+
+	/* schedule a wakeup when the syncpoint value is reached */
+	waiter = host1x_intr_alloc_waiter();
+	if (!waiter) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = host1x_intr_add_action(&(sp->dev->intr), sp->id, thresh,
+				HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE, &wq,
+				waiter,
+				&ref);
+	if (err)
+		goto done;
+
+	err = -EAGAIN;
+	/* A negative caller-specified timeout means wait as long as possible */
+	if (timeout < 0)
+		timeout = LONG_MAX;
+
+	/* wait for the syncpoint, or timeout, or signal */
+	while (timeout) {
+		long check = min_t(long, SYNCPT_CHECK_PERIOD, timeout);
+		long remain = wait_event_interruptible_timeout(wq,
+				syncpt_load_min_is_expired(sp, thresh),
+				check);
+		if (remain > 0 || host1x_syncpt_is_expired(sp, thresh)) {
+			if (value)
+				*value = host1x_syncpt_read_min(sp);
+			err = 0;
+			break;
+		}
+		if (remain < 0) {
+			err = remain;
+			break;
+		}
+		timeout -= check;
+		if (timeout && check_count <= MAX_STUCK_CHECK_COUNT) {
+			dev_warn(&sp->dev->dev->dev,
+				"%s: syncpoint id %d (%s) stuck waiting %d, timeout=%ld\n",
+				 current->comm, sp->id, sp->name,
+				 thresh, timeout);
+			sp->dev->syncpt_op.debug(sp);
+			check_count++;
+		}
+	}
+	host1x_intr_put_ref(&(sp->dev->intr), sp->id, ref);
+
+done:
+	return err;
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
+/*
+ * Returns true if syncpoint is expired, false if we may need to wait
+ */
+bool host1x_syncpt_is_expired(
+	struct host1x_syncpt *sp,
+	u32 thresh)
+{
+	u32 current_val;
+	u32 future_val;
+	smp_rmb();
+	current_val = (u32)atomic_read(&sp->min_val);
+	future_val = (u32)atomic_read(&sp->max_val);
+
+	/* Note the use of unsigned arithmetic here (mod 1<<32).
+	 *
+	 * c = current_val = min_val	= the current value of the syncpoint.
+	 * t = thresh			= the value we are checking
+	 * f = future_val  = max_val	= the value c will reach when all
+	 *				  outstanding increments have completed.
+	 *
+	 * Note that c always chases f until it reaches f.
+	 *
+	 * Dtf = (f - t)
+	 * Dtc = (c - t)
+	 *
+	 *  Consider all cases:
+	 *
+	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
+	 *	B) .....c.....f..t..	Dtf > Dtc	expired
+	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
+	 *
+	 *  Any case where f==c: always expired (for any t).	Dtf == Dtc
+	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
+	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
+	 *							Dtc!=0)
+	 *
+	 *  Other cases:
+	 *
+	 *	D) .....t..f..c.....	Dtf < Dtc	need to wait
+	 *	E) .....f..c..t.....	Dtf < Dtc	need to wait
+	 *	F) .....f..t..c.....	Dtf > Dtc	expired
+	 *
+	 *   So:
+	 *	   Dtf >= Dtc implies EXPIRED	(return true)
+	 *	   Dtf <  Dtc implies WAIT	(return false)
+	 *
+	 * Note: If t is expired then we *cannot* wait on it. We would wait
+	 * forever (hang the system).
+	 *
+	 * Note: do NOT get clever and remove the -thresh from both sides. It
+	 * is NOT the same.
+	 *
+	 * If the future value is zero, we have a client-managed sync point. In
+	 * that case we do a direct comparison.
+	 */
+	if (!host1x_syncpt_client_managed(sp))
+		return future_val - thresh >= current_val - thresh;
+	else
+		return (s32)(current_val - thresh) >= 0;
+}
+
 void host1x_syncpt_debug(struct host1x_syncpt *sp)
 {
 	sp->dev->syncpt_op.debug(sp);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index d9b9b0a..b46d044 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -114,6 +114,7 @@ void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
 
 /* Load current value from hardware to the shadow register. */
 u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh);
 
 /* Save host1x sync point state into shadow registers. */
 void host1x_syncpt_save(struct host1x *dev);
@@ -133,6 +134,10 @@ u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
 /* Do a debug dump of sync point values. */
 void host1x_syncpt_debug(struct host1x_syncpt *sp);
 
+/* Wait until sync point reaches a threshold value, or a timeout. */
+int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh,
+			long timeout, u32 *value);
+
 /* Check if sync point id is valid. */
 static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 {
-- 
1.7.9.5
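
The expiry check in host1x_syncpt_is_expired() depends on unsigned
arithmetic modulo 2^32, which is easy to misread. Below is a minimal
standalone sketch of the same two comparisons, for illustration only. The
function names is_expired() and is_expired_client() and their plain integer
arguments are hypothetical stand-ins for the cached min_val/max_val of a
sync point; they are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the host-managed comparison: min is the current
 * value (c), max is the value c will eventually reach (f), and thresh is
 * the value being waited for (t). Both distances are measured from thresh,
 * modulo 2^32, so the result stays correct across counter wraparound.
 */
static bool is_expired(uint32_t min, uint32_t max, uint32_t thresh)
{
	return (uint32_t)(max - thresh) >= (uint32_t)(min - thresh);
}

/*
 * Hypothetical model of the client-managed comparison: there is no known
 * future value, so the signed distance from thresh to the current value
 * decides.
 */
static bool is_expired_client(uint32_t cur, uint32_t thresh)
{
	return (int32_t)(cur - thresh) >= 0;
}
```

With c = 0xfffffff0, f = 0x10 and t = 0x5 the counter has not yet wrapped
past the threshold, so is_expired() reports that a wait is needed, whereas
a naive "c >= t" test would claim the threshold had already been reached.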



* [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
  2013-01-15 11:43 ` [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver Terje Bergstrom
  2013-01-15 11:43 ` [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts Terje Bergstrom
@ 2013-01-15 11:43 ` Terje Bergstrom
  2013-02-25 15:24   ` Thierry Reding
  2013-01-15 11:44 ` [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support Terje Bergstrom
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:43 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add support for host1x client modules, and host1x channels to submit
work to the clients. The work is submitted in GEM CMA buffers, so
this patch adds support for them.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Kconfig                  |   25 +-
 drivers/gpu/host1x/Makefile                 |    5 +
 drivers/gpu/host1x/cdma.c                   |  439 +++++++++++++++++++
 drivers/gpu/host1x/cdma.h                   |  107 +++++
 drivers/gpu/host1x/channel.c                |  140 ++++++
 drivers/gpu/host1x/channel.h                |   58 +++
 drivers/gpu/host1x/cma.c                    |  116 +++++
 drivers/gpu/host1x/cma.h                    |   43 ++
 drivers/gpu/host1x/dev.c                    |   13 +
 drivers/gpu/host1x/dev.h                    |   59 +++
 drivers/gpu/host1x/host1x.h                 |   29 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |  475 +++++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h             |   37 ++
 drivers/gpu/host1x/hw/channel_hw.c          |  148 +++++++
 drivers/gpu/host1x/hw/host1x01.c            |    6 +
 drivers/gpu/host1x/hw/host1x01_hardware.h   |  124 ++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |  102 +++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |   12 +
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  |  168 ++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |   10 +
 drivers/gpu/host1x/intr.c                   |   29 +-
 drivers/gpu/host1x/intr.h                   |    6 +
 drivers/gpu/host1x/job.c                    |  612 +++++++++++++++++++++++++++
 drivers/gpu/host1x/job.h                    |  164 +++++++
 drivers/gpu/host1x/memmgr.c                 |  173 ++++++++
 drivers/gpu/host1x/memmgr.h                 |   72 ++++
 drivers/gpu/host1x/syncpt.c                 |   11 +
 drivers/gpu/host1x/syncpt.h                 |    4 +
 include/trace/events/host1x.h               |  211 +++++++++
 29 files changed, 3396 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/host1x.h
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/job.h
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h
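
For readers new to the push buffer, cdma.h in this patch describes a
producer (begin, push, end) that writes two-word slots and a consumer
(update) that frees them once the job has completed. The following is a
rough standalone model of that slot accounting, assuming a power-of-two
ring with one slot kept unused. struct pb_model, PB_SLOTS and the pb_*()
helpers are hypothetical names for illustration and do not reflect the
actual layout used in hw/cdma_hw.c.

```c
#include <stdint.h>

#define PB_SLOTS 8u	/* hypothetical size; must be a power of two */

struct pb_model {
	uint32_t words[PB_SLOTS * 2];	/* two opcode words per slot */
	uint32_t put;			/* next slot the producer writes */
	uint32_t get;			/* oldest slot not yet released */
};

/* Free slots; one slot stays unused so that put == get means "empty". */
static uint32_t pb_space(const struct pb_model *pb)
{
	return (pb->get - pb->put - 1u) & (PB_SLOTS - 1u);
}

/*
 * Producer side: write one two-word slot. Returns 0 on success or -1 if
 * the ring is full. The driver blocks until space appears instead; this
 * model only reports the condition.
 */
static int pb_push(struct pb_model *pb, uint32_t op1, uint32_t op2)
{
	if (pb_space(pb) == 0)
		return -1;
	pb->words[pb->put * 2] = op1;
	pb->words[pb->put * 2 + 1] = op2;
	pb->put = (pb->put + 1u) & (PB_SLOTS - 1u);
	return 0;
}

/* Consumer side: release nr_slots once the job that used them is done. */
static void pb_pop(struct pb_model *pb, uint32_t nr_slots)
{
	pb->get = (pb->get + nr_slots) & (PB_SLOTS - 1u);
}
```

In this model an empty eight-slot ring accepts seven pushes before it
reports full, and popping three slots makes exactly three available again.
That mirrors why host1x_cdma_end() records num_slots per job, so the
consumer knows how many slots to pop when the job's sync point expires.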

diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..57680a6 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,27 @@ config TEGRA_HOST1X
 	help
 	  Driver for the Tegra host1x hardware.
 
-	  Required for enabling tegradrm.
+	  Required for enabling tegradrm and 2D acceleration.
+
+if TEGRA_HOST1X
+
+config TEGRA_HOST1X_CMA
+	bool "Support DRM CMA buffers"
+	depends on DRM
+	default y
+	select DRM_GEM_CMA_HELPER
+	select DRM_KMS_CMA_HELPER
+	help
+	  Say yes if you wish to use DRM CMA buffers.
+
+	  If unsure, choose Y.
+
+config TEGRA_HOST1X_FIREWALL
+	bool "Enable HOST1X security firewall"
+	default y
+	help
+	  Say yes if kernel should protect command streams from tampering.
+
+	  If unsure, choose Y.
+
+endif
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 5ef47ff..cdd87c8 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -4,6 +4,11 @@ host1x-y = \
 	syncpt.o \
 	dev.o \
 	intr.o \
+	cdma.o \
+	channel.o \
+	job.o \
+	memmgr.o \
 	hw/host1x01.o
 
+host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
new file mode 100644
index 0000000..d6a38d2
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.c
@@ -0,0 +1,439 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include "job.h"
+#include <asm/cacheflush.h>
+
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+#include <trace/events/host1x.h>
+
+#define TRACE_MAX_LENGTH 128U
+
+/*
+ * Add an entry to the sync queue.
+ */
+static void add_to_sync_queue(struct host1x_cdma *cdma,
+			      struct host1x_job *job,
+			      u32 nr_slots,
+			      u32 first_get)
+{
+	if (job->syncpt_id == NVSYNCPT_INVALID) {
+		dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
+				__func__);
+		return;
+	}
+
+	job->first_get = first_get;
+	job->num_slots = nr_slots;
+	host1x_job_get(job);
+	list_add_tail(&job->list, &cdma->sync_queue);
+}
+
+/*
+ * Return the status of the cdma's sync queue or push buffer for the given event
+ *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
+ *  - pb space: returns the number of free slots in the channel's push buffer
+ * Must be called with the cdma lock held.
+ */
+static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	switch (event) {
+	case CDMA_EVENT_SYNC_QUEUE_EMPTY:
+		return list_empty(&cdma->sync_queue) ? 1 : 0;
+	case CDMA_EVENT_PUSH_BUFFER_SPACE: {
+		struct push_buffer *pb = &cdma->push_buffer;
+		return host1x->cdma_pb_op.space(pb);
+	}
+	default:
+		return 0;
+	}
+}
+
+/*
+ * Sleep (if necessary) until the requested event happens
+ *   - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty.
+ *     - Returns 1
+ *   - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer
+ *     - Return the amount of space (> 0)
+ * Must be called with the cdma lock held.
+ */
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	for (;;) {
+		unsigned int space = cdma_status_locked(cdma, event);
+		if (space)
+			return space;
+
+		trace_host1x_wait_cdma(cdma_to_channel(cdma)->dev->name,
+				event);
+
+		/* If somebody has managed to already start waiting, yield */
+		if (cdma->event != CDMA_EVENT_NONE) {
+			mutex_unlock(&cdma->lock);
+			schedule();
+			mutex_lock(&cdma->lock);
+			continue;
+		}
+		cdma->event = event;
+
+		mutex_unlock(&cdma->lock);
+		down(&cdma->sem);
+		mutex_lock(&cdma->lock);
+	}
+	return 0;
+}
+
+/*
+ * Start timer for a buffer submission that has not completed yet.
+ * Must be called with the cdma lock held.
+ */
+static void cdma_start_timer_locked(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host = cdma_to_host1x(cdma);
+
+	if (cdma->timeout.clientid) {
+		/* timer already started */
+		return;
+	}
+
+	cdma->timeout.clientid = job->clientid;
+	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt_val = job->syncpt_end;
+	cdma->timeout.start_ktime = ktime_get();
+
+	schedule_delayed_work(&cdma->timeout.wq,
+			msecs_to_jiffies(job->timeout));
+}
+
+/*
+ * Stop timer when a buffer submission completes.
+ * Must be called with the cdma lock held.
+ */
+static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
+{
+	cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.clientid = 0;
+}
+
+/*
+ * For all sync queue entries that have already finished according to the
+ * current sync point registers:
+ *  - unpin & unref their mems
+ *  - pop their push buffer slots
+ *  - remove them from the sync queue
+ * This is normally called from the host code's worker thread, but can be
+ * called manually if necessary.
+ * Must be called with the cdma lock held.
+ */
+static void update_cdma_locked(struct host1x_cdma *cdma)
+{
+	bool signal = false;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_job *job, *n;
+
+	/* If CDMA is stopped, queue is cleared and we can return */
+	if (!cdma->running)
+		return;
+
+	/*
+	 * Walk the sync queue, reading the sync point registers as necessary,
+	 * to consume as many sync queue entries as possible without blocking
+	 */
+	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
+		struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
+
+		/* Check whether this syncpt has completed, and bail if not */
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+			/* Start timer on next pending syncpt */
+			if (job->timeout)
+				cdma_start_timer_locked(cdma, job);
+			break;
+		}
+
+		/* Cancel timeout, when a buffer completes */
+		if (cdma->timeout.clientid)
+			stop_cdma_timer_locked(cdma);
+
+		/* Unpin the memory */
+		host1x_job_unpin(job);
+
+		/* Pop push buffer slots */
+		if (job->num_slots) {
+			struct push_buffer *pb = &cdma->push_buffer;
+			host1x->cdma_pb_op.pop_from(pb, job->num_slots);
+			if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
+				signal = true;
+		}
+
+		list_del(&job->list);
+		host1x_job_put(job);
+	}
+
+	if (list_empty(&cdma->sync_queue) &&
+	    cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
+		signal = true;
+
+	/* Wake up host1x_cdma_wait_locked() if the requested event happened */
+	if (signal) {
+		cdma->event = CDMA_EVENT_NONE;
+		up(&cdma->sem);
+	}
+}
+
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev)
+{
+	u32 get_restart;
+	u32 syncpt_incrs;
+	struct host1x_job *job = NULL;
+	u32 syncpt_val;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	dev_dbg(&dev->dev,
+		"%s: starting cleanup (thresh %d)\n",
+		__func__, syncpt_val);
+
+	/*
+	 * Move the sync_queue read pointer to the first entry that hasn't
+	 * completed based on the current HW syncpt value. It's likely there
+	 * won't be any (i.e. we're still at the head), but covers the case
+	 * where a syncpt incr happens just prior/during the teardown.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: skip completed buffers still in sync_queue\n",
+		__func__);
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		if (syncpt_val < job->syncpt_end)
+			break;
+
+		host1x_job_dump(&dev->dev, job);
+	}
+
+	/*
+	 * Walk the sync_queue, first incrementing with the CPU syncpts that
+	 * are partially executed (the first buffer) or fully skipped while
+	 * still in the current context (slots are also NOP-ed).
+	 *
+	 * At the point contexts are interleaved, syncpt increments must be
+	 * done inline with the pushbuffer from a GATHER buffer to maintain
+	 * the order (slots are modified to be a GATHER of syncpt incrs).
+	 *
+	 * Note: save in get_restart the location where the timed out buffer
+	 * started in the PB, so we can start the refetch from there (with the
+	 * modified NOP-ed PB slots). This lets things appear to have completed
+	 * properly for this buffer and resources are freed.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: perform CPU incr on pending same ctx buffers\n",
+		__func__);
+
+	get_restart = cdma->last_put;
+	if (!list_empty(&cdma->sync_queue))
+		get_restart = job->first_get;
+
+	/* do CPU increments as long as this context continues */
+	list_for_each_entry_from(job, &cdma->sync_queue, list) {
+		/* different context, gets us out of this loop */
+		if (job->clientid != cdma->timeout.clientid)
+			break;
+
+		/* won't need a timeout when replayed */
+		job->timeout = 0;
+
+		syncpt_incrs = job->syncpt_end - syncpt_val;
+		dev_dbg(&dev->dev,
+			"%s: CPU incr (%d)\n", __func__, syncpt_incrs);
+
+		host1x_job_dump(&dev->dev, job);
+
+		/* safe to use CPU to incr syncpts */
+		host1x->cdma_op.timeout_cpu_incr(cdma,
+				job->first_get,
+				syncpt_incrs,
+				job->syncpt_end,
+				job->num_slots);
+
+		syncpt_val += syncpt_incrs;
+	}
+
+	list_for_each_entry_from(job, &cdma->sync_queue, list)
+		if (job->clientid == cdma->timeout.clientid)
+			job->timeout = 500;
+
+	dev_dbg(&dev->dev,
+		"%s: finished sync_queue modification\n", __func__);
+
+	/* roll back DMAGET and start up channel again */
+	host1x->cdma_op.timeout_teardown_end(cdma, get_restart);
+}
+
+/*
+ * Create a cdma
+ */
+int host1x_cdma_init(struct host1x_cdma *cdma)
+{
+	int err;
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_init(&cdma->lock);
+	sema_init(&cdma->sem, 0);
+
+	INIT_LIST_HEAD(&cdma->sync_queue);
+
+	cdma->event = CDMA_EVENT_NONE;
+	cdma->running = false;
+	cdma->torndown = false;
+
+	err = host1x->cdma_pb_op.init(pb);
+	if (err)
+		return err;
+	return 0;
+}
+
+/*
+ * Destroy a cdma
+ */
+void host1x_cdma_deinit(struct host1x_cdma *cdma)
+{
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running) {
+		pr_warn("%s: CDMA still running\n",
+				__func__);
+	} else {
+		host1x->cdma_pb_op.destroy(pb);
+		host1x->cdma_op.timeout_destroy(cdma);
+	}
+}
+
+/*
+ * Begin a cdma submit
+ */
+int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (job->timeout) {
+		/* init state on first submit with timeout value */
+		if (!cdma->timeout.initialized) {
+			int err;
+			err = host1x->cdma_op.timeout_init(cdma,
+					job->syncpt_id);
+			if (err) {
+				mutex_unlock(&cdma->lock);
+				return err;
+			}
+		}
+	}
+	if (!cdma->running)
+		host1x->cdma_op.start(cdma);
+
+	cdma->slots_free = 0;
+	cdma->slots_used = 0;
+	cdma->first_get = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	trace_host1x_cdma_begin(job->ch->dev->name);
+	return 0;
+}
+
+/*
+ * Push two words into a push buffer slot
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
+{
+	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
+}
+
+/*
+ * Push two words into a push buffer slot and record the memory handle
+ * for that slot. Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle,
+		u32 offset, u32 op1, u32 op2)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	u32 slots_free = cdma->slots_free;
+	struct push_buffer *pb = &cdma->push_buffer;
+
+	if (slots_free == 0) {
+		host1x->cdma_op.kick(cdma);
+		slots_free = host1x_cdma_wait_locked(cdma,
+				CDMA_EVENT_PUSH_BUFFER_SPACE);
+	}
+	cdma->slots_free = slots_free - 1;
+	cdma->slots_used++;
+	host1x->cdma_pb_op.push_to(pb, handle, op1, op2);
+}
+
+/*
+ * End a cdma submit
+ * Kick off DMA and add the job to the sync queue, recording the number of
+ * slots to be freed from the pushbuffer. The handles for a submit must all
+ * be pinned at the same time, but they can be unpinned in smaller chunks.
+ */
+void host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	bool was_idle = list_empty(&cdma->sync_queue);
+
+	host1x->cdma_op.kick(cdma);
+
+	add_to_sync_queue(cdma,
+			job,
+			cdma->slots_used,
+			cdma->first_get);
+
+	/* start timer on idle -> active transitions */
+	if (job->timeout && was_idle)
+		cdma_start_timer_locked(cdma, job);
+
+	trace_host1x_cdma_end(job->ch->dev->name);
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Update cdma state according to current sync point values
+ */
+void host1x_cdma_update(struct host1x_cdma *cdma)
+{
+	mutex_lock(&cdma->lock);
+	update_cdma_locked(cdma);
+	mutex_unlock(&cdma->lock);
+}
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
new file mode 100644
index 0000000..d9cabef
--- /dev/null
+++ b/drivers/gpu/host1x/cdma.h
@@ -0,0 +1,107 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CDMA_H
+#define __HOST1X_CDMA_H
+
+#include <linux/sched.h>
+#include <linux/semaphore.h>
+
+#include <linux/list.h>
+
+struct host1x_syncpt;
+struct host1x_userctx_timeout;
+struct host1x_job;
+struct mem_handle;
+struct platform_device;
+
+/*
+ * cdma
+ *
+ * This is in charge of a host command DMA channel.
+ * Sends ops to a push buffer, and takes responsibility for unpinning
+ * (& possibly freeing) of memory after those ops have completed.
+ * Producer:
+ *	begin
+ *		push - send ops to the push buffer
+ *	end - start command DMA and enqueue handles to be unpinned
+ * Consumer:
+ *	update - call to update sync queue and push buffer, unpin memory
+ */
+
+struct push_buffer {
+	u32 *mapped;			/* mapped pushbuffer memory */
+	dma_addr_t phys;		/* physical address of pushbuffer */
+	u32 fence;			/* index we've written */
+	u32 cur;			/* index to write to */
+	struct mem_handle **handle;	/* handle for each opcode pair */
+};
+
+struct buffer_timeout {
+	struct delayed_work wq;		/* work queue */
+	bool initialized;		/* timer one-time setup flag */
+	struct host1x_syncpt *syncpt;	/* buffer completion syncpt */
+	u32 syncpt_val;			/* syncpt value when completed */
+	ktime_t start_ktime;		/* starting time */
+	/* context timeout information */
+	int clientid;
+};
+
+enum cdma_event {
+	CDMA_EVENT_NONE,		/* not waiting for any event */
+	CDMA_EVENT_SYNC_QUEUE_EMPTY,	/* wait for empty sync queue */
+	CDMA_EVENT_PUSH_BUFFER_SPACE	/* wait for space in push buffer */
+};
+
+struct host1x_cdma {
+	struct mutex lock;		/* controls access to shared state */
+	struct semaphore sem;		/* signalled when event occurs */
+	enum cdma_event event;		/* event that sem is waiting for */
+	unsigned int slots_used;	/* pb slots used in current submit */
+	unsigned int slots_free;	/* pb slots free in current submit */
+	unsigned int first_get;		/* DMAGET value, where submit begins */
+	unsigned int last_put;		/* last value written to DMAPUT */
+	struct push_buffer push_buffer;	/* channel's push buffer */
+	struct list_head sync_queue;	/* job queue */
+	struct buffer_timeout timeout;	/* channel's timeout state/wq */
+	bool running;
+	bool torndown;
+};
+
+#define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
+#define cdma_to_host1x(cdma) host1x_get_host(cdma_to_channel(cdma)->dev)
+#define cdma_to_memmgr(cdma) ((cdma_to_host1x(cdma))->memmgr)
+#define pb_to_cdma(pb) container_of(pb, struct host1x_cdma, push_buffer)
+
+int	host1x_cdma_init(struct host1x_cdma *cdma);
+void	host1x_cdma_deinit(struct host1x_cdma *cdma);
+void	host1x_cdma_stop(struct host1x_cdma *cdma);
+int	host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job);
+void	host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2);
+void	host1x_cdma_push_gather(struct host1x_cdma *cdma,
+		struct mem_handle *handle, u32 offset, u32 op1, u32 op2);
+void	host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job);
+void	host1x_cdma_update(struct host1x_cdma *cdma);
+void	host1x_cdma_peek(struct host1x_cdma *cdma,
+		u32 dmaget, int slot, u32 *out);
+unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
+		enum cdma_event event);
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev);
+#endif
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
new file mode 100644
index 0000000..ff647ac
--- /dev/null
+++ b/drivers/gpu/host1x/channel.c
@@ -0,0 +1,140 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "channel.h"
+#include "dev.h"
+#include "job.h"
+
+#include <linux/slab.h>
+#include <linux/module.h>
+
+/* Constructor for the host1x channel list */
+void host1x_channel_list_init(struct host1x *host1x)
+{
+	INIT_LIST_HEAD(&host1x->chlist.list);
+	mutex_init(&host1x->chlist_mutex);
+}
+
+/*
+ * Iterator function for the host1x channel list
+ * It takes a fptr as an argument and calls that function for each
+ * channel in the list
+ */
+void host1x_channel_for_all(struct host1x *host1x, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata))
+{
+	struct host1x_channel *ch;
+	int ret;
+
+	list_for_each_entry(ch, &host1x->chlist.list, list) {
+		if (ch && fptr) {
+			ret = fptr(ch, data);
+			if (ret) {
+				pr_info("%s: iterator error\n", __func__);
+				break;
+			}
+		}
+	}
+}
+
+
+int host1x_channel_submit(struct host1x_job *job)
+{
+	return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
+
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
+{
+	int err = 0;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 0)
+		err = host1x_cdma_init(&ch->cdma);
+	if (!err)
+		ch->refcount++;
+
+	mutex_unlock(&ch->reflock);
+
+	return err ? NULL : ch;
+}
+
+void host1x_channel_put(struct host1x_channel *ch)
+{
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 1) {
+		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
+		host1x_cdma_deinit(&ch->cdma);
+	}
+	ch->refcount--;
+	mutex_unlock(&ch->reflock);
+}
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
+{
+	struct host1x_channel *ch = NULL;
+	struct host1x *host1x = host1x_get_host(pdev);
+	int chindex;
+	int max_channels = host1x->info.nb_channels;
+	int err;
+
+	mutex_lock(&host1x->chlist_mutex);
+
+	chindex = host1x->allocated_channels;
+	if (chindex >= max_channels)
+		goto fail;
+
+	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+	if (ch == NULL)
+		goto fail;
+
+	/* Link platform_device to host1x_channel */
+	err = host1x->channel_op.init(ch, host1x, chindex);
+	if (err < 0)
+		goto fail;
+
+	ch->dev = pdev;
+
+	/* Add to channel list */
+	list_add_tail(&ch->list, &host1x->chlist.list);
+
+	host1x->allocated_channels++;
+
+	mutex_unlock(&host1x->chlist_mutex);
+	return ch;
+
+fail:
+	dev_err(&pdev->dev, "failed to init channel\n");
+	kfree(ch);
+	mutex_unlock(&host1x->chlist_mutex);
+	return NULL;
+}
+
+void host1x_channel_free(struct host1x_channel *ch)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct host1x_channel *chiter, *tmp;
+	list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
+		if (chiter == ch) {
+			list_del(&chiter->list);
+			kfree(ch);
+			host1x->allocated_channels--;
+
+			return;
+		}
+	}
+}
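
Reviewer note on host1x_channel_get()/host1x_channel_put() above: the first
get initializes CDMA and the last put stops and deinitializes it. The
following is a minimal user-space sketch of that pattern, not driver code.
All names (chan_model, chan_get, chan_put, fail_init) are hypothetical, and
the reflock mutex is left out for brevity.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct host1x_channel. */
struct chan_model {
	int refcount;
	int hw_active;	/* stands in for initialized CDMA state */
	int fail_init;	/* set to force an init failure */
};

/* Stands in for host1x_cdma_init(); returns 0 on success. */
static int chan_hw_init(struct chan_model *ch)
{
	if (ch->fail_init)
		return -1;
	ch->hw_active = 1;
	return 0;
}

/* Stands in for cdma_op.stop() followed by host1x_cdma_deinit(). */
static void chan_hw_deinit(struct chan_model *ch)
{
	ch->hw_active = 0;
}

/* The first reference brings the hardware state up. */
static struct chan_model *chan_get(struct chan_model *ch)
{
	int err = 0;

	if (ch->refcount == 0)
		err = chan_hw_init(ch);
	if (!err)
		ch->refcount++;

	return err ? NULL : ch;
}

/* Dropping the last reference tears the hardware state down. */
static void chan_put(struct chan_model *ch)
{
	if (ch->refcount == 1)
		chan_hw_deinit(ch);
	ch->refcount--;
}
```

A failed init leaves the reference count untouched, so a later get can retry
the initialization.
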
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
new file mode 100644
index 0000000..41eb01e
--- /dev/null
+++ b/drivers/gpu/host1x/channel.h
@@ -0,0 +1,58 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CHANNEL_H
+#define __HOST1X_CHANNEL_H
+
+#include <linux/cdev.h>
+#include <linux/io.h>
+#include "cdma.h"
+
+struct host1x;
+struct platform_device;
+
+/*
+ * host1x channel. Channels are kept on the host1x channel list, which is
+ * used e.g. by the debugfs dump of host1x, client device and channel state.
+ */
+struct host1x_channel {
+	struct list_head list;
+
+	int refcount;
+	int chid;
+	struct mutex reflock;
+	struct mutex submitlock;
+	void __iomem *regs;
+	struct device *node;
+	struct platform_device *dev;
+	struct cdev cdev;
+	struct host1x_cdma cdma;
+};
+
+/* channel list operations */
+void host1x_channel_list_init(struct host1x *);
+void host1x_channel_for_all(struct host1x *, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata));
+
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
+void host1x_channel_free(struct host1x_channel *ch);
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch);
+void host1x_channel_put(struct host1x_channel *ch);
+int host1x_channel_submit(struct host1x_job *job);
+
+#endif
diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
new file mode 100644
index 0000000..06b7959
--- /dev/null
+++ b/drivers/gpu/host1x/cma.c
@@ -0,0 +1,116 @@
+/*
+ * Tegra host1x CMA support
+ *
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <linux/mutex.h>
+
+#include "cma.h"
+#include "memmgr.h"
+
+static inline struct drm_gem_cma_object *to_cma_obj(struct mem_handle *h)
+{
+	return (struct drm_gem_cma_object *)(((u32)h) & MEMMGR_ID_MASK);
+}
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+void host1x_cma_put(struct mem_handle *handle)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj(handle);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_unreference(&obj->base);
+	mutex_unlock(struct_mutex);
+}
+
+struct sg_table *host1x_cma_pin(struct mem_handle *handle)
+{
+	return NULL;
+}
+
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+
+}
+
+
+void *host1x_cma_mmap(struct mem_handle *handle)
+{
+	return (to_cma_obj(handle))->vaddr;
+}
+
+void host1x_cma_munmap(struct mem_handle *handle, void *addr)
+{
+
+}
+
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	return (to_cma_obj(handle))->vaddr + pagenum * PAGE_SIZE;
+}
+
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+
+}
+
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
+{
+	struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
+	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
+
+	mutex_lock(struct_mutex);
+	drm_gem_object_reference(&obj->base);
+	mutex_unlock(struct_mutex);
+
+	return (struct mem_handle *) ((u32)id | mem_mgr_type_cma);
+}
+
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr)
+{
+	int i;
+	int pin_count = 0;
+
+	for (i = 0; i < count; i++) {
+		struct mem_handle *handle;
+
+		if ((ids[i] & id_type_mask) != id_type)
+			continue;
+
+		handle = host1x_cma_get(ids[i], dev);
+
+		phys_addr[i] = (to_cma_obj(handle)->paddr);
+		unpin_data[pin_count].h = handle;
+
+		pin_count++;
+	}
+	return pin_count;
+}
diff --git a/drivers/gpu/host1x/cma.h b/drivers/gpu/host1x/cma.h
new file mode 100644
index 0000000..82ad710
--- /dev/null
+++ b/drivers/gpu/host1x/cma.h
@@ -0,0 +1,43 @@
+/*
+ * Tegra host1x cma memory manager
+ *
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CMA_H
+#define __HOST1X_CMA_H
+
+#include "memmgr.h"
+
+struct platform_device;
+
+struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags);
+void host1x_cma_put(struct mem_handle *handle);
+struct sg_table *host1x_cma_pin(struct mem_handle *handle);
+void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_cma_mmap(struct mem_handle *handle);
+void host1x_cma_munmap(struct mem_handle *handle, void *addr);
+void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev);
+int host1x_cma_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		long unsigned id_type_mask,
+		long unsigned id_type,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data,
+		dma_addr_t *phys_addr);
+#endif
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 7f9f389..80311ca 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -25,6 +25,7 @@
 #include <linux/io.h>
 #include "dev.h"
 #include "intr.h"
+#include "channel.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -46,6 +47,16 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r)
 	return readl(sync_regs + r);
 }
 
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r)
+{
+	writel(v, ch->regs + r);
+}
+
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r)
+{
+	return readl(ch->regs + r);
+}
+
 static struct host1x_device_info host1x_info = {
 	.nb_channels	= 8,
 	.nb_pts		= 32,
@@ -135,6 +146,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_syncpt_reset(host);
 
+	host1x_channel_list_init(host);
+
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
 	dev_info(&dev->dev, "initialized\n");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 8376092..2fefa78 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -18,11 +18,58 @@
 #define HOST1X_DEV_H
 
 #include <linux/platform_device.h>
+
+#include "channel.h"
 #include "syncpt.h"
 #include "intr.h"
 
 struct host1x;
+struct host1x_intr;
 struct host1x_syncpt;
+struct host1x_channel;
+struct host1x_cdma;
+struct host1x_job;
+struct push_buffer;
+struct dentry;
+struct mem_handle;
+struct platform_device;
+
+struct host1x_channel_ops {
+	int (*init)(struct host1x_channel *,
+		    struct host1x *,
+		    int chid);
+	int (*submit)(struct host1x_job *job);
+};
+
+struct host1x_cdma_ops {
+	void (*start)(struct host1x_cdma *);
+	void (*stop)(struct host1x_cdma *);
+	void (*kick)(struct host1x_cdma *);
+	int (*timeout_init)(struct host1x_cdma *,
+			    u32 syncpt_id);
+	void (*timeout_destroy)(struct host1x_cdma *);
+	void (*timeout_teardown_begin)(struct host1x_cdma *);
+	void (*timeout_teardown_end)(struct host1x_cdma *,
+				     u32 getptr);
+	void (*timeout_cpu_incr)(struct host1x_cdma *,
+				 u32 getptr,
+				 u32 syncpt_incrs,
+				 u32 syncval,
+				 u32 nr_slots);
+};
+
+struct host1x_pushbuffer_ops {
+	void (*reset)(struct push_buffer *);
+	int (*init)(struct push_buffer *);
+	void (*destroy)(struct push_buffer *);
+	void (*push_to)(struct push_buffer *,
+			struct mem_handle *,
+			u32 op1, u32 op2);
+	void (*pop_from)(struct push_buffer *,
+			 unsigned int slots);
+	u32 (*space)(struct push_buffer *);
+	u32 (*putptr)(struct push_buffer *);
+};
 
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
@@ -64,9 +111,19 @@ struct host1x {
 	struct host1x_device_info info;
 	struct clk *clk;
 
+	/* Sync point dedicated to replacing waits for expired fences */
+	struct host1x_syncpt *nop_sp;
+
+	struct host1x_channel_ops channel_op;
+	struct host1x_cdma_ops cdma_op;
+	struct host1x_pushbuffer_ops cdma_pb_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
+	struct mutex chlist_mutex;
+	struct host1x_channel chlist;
+	int allocated_channels;
+
 	struct dentry *debugfs;
 };
 
@@ -84,5 +141,7 @@ struct host1x *host1x_get_host(struct platform_device *_dev)
 
 void host1x_sync_writel(struct host1x *host1x, u32 r, u32 v);
 u32 host1x_sync_readl(struct host1x *host1x, u32 r);
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r);
+u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
 
 #endif
diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
new file mode 100644
index 0000000..ded0660
--- /dev/null
+++ b/drivers/gpu/host1x/host1x.h
@@ -0,0 +1,29 @@
+/*
+ * Tegra host1x driver
+ *
+ * Copyright (c) 2009-2013, NVIDIA Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __LINUX_HOST1X_H
+#define __LINUX_HOST1X_H
+
+enum host1x_class {
+	NV_HOST1X_CLASS_ID		= 0x1,
+	NV_GRAPHICS_2D_CLASS_ID		= 0x51,
+};
+
+#endif
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
new file mode 100644
index 0000000..7a44418
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -0,0 +1,475 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#include "cdma_hw.h"
+
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
+{
+	return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
+		| HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
+		| HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
+}
+
+static void cdma_timeout_handler(struct work_struct *work);
+
+/*
+ * push_buffer
+ *
+ * The push buffer is a circular array of words to be fetched by command DMA.
+ * Note that it works slightly differently to the sync queue; fence == cur
+ * means that the push buffer is full, not empty.
+ */
+
+
+/*
+ * Reset to empty push buffer
+ */
+static void push_buffer_reset(struct push_buffer *pb)
+{
+	pb->fence = PUSH_BUFFER_SIZE - 8;
+	pb->cur = 0;
+}
+
+/*
+ * Init push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb);
+static int push_buffer_init(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+
+	host1x->cdma_pb_op.reset(pb);
+
+	/* allocate and map pushbuffer memory */
+	pb->mapped = dma_alloc_writecombine(&host1x->dev->dev,
+			PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL);
+	if (!pb->mapped)
+		goto fail;
+
+	/* memory for storing the mem handle of each opcode pair */
+	pb->handle = kzalloc(HOST1X_GATHER_QUEUE_SIZE *
+				sizeof(struct mem_handle *),
+			GFP_KERNEL);
+	if (!pb->handle)
+		goto fail;
+
+	/* put the restart at the end of pushbuffer memory */
+	*(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) =
+		host1x_opcode_restart(pb->phys);
+
+	return 0;
+
+fail:
+	push_buffer_destroy(pb);
+	return -ENOMEM;
+}
+
+/*
+ * Clean up push buffer resources
+ */
+static void push_buffer_destroy(struct push_buffer *pb)
+{
+	struct host1x_cdma *cdma = pb_to_cdma(pb);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (pb->phys != 0)
+		dma_free_writecombine(&host1x->dev->dev,
+				PUSH_BUFFER_SIZE + 4,
+				pb->mapped, pb->phys);
+
+	kfree(pb->handle);
+
+	pb->mapped = NULL;
+	pb->phys = 0;
+	pb->handle = NULL;
+}
+
+/*
+ * Push two words to the push buffer
+ * Caller must ensure push buffer is not full
+ */
+static void push_buffer_push_to(struct push_buffer *pb,
+		struct mem_handle *handle,
+		u32 op1, u32 op2)
+{
+	u32 cur = pb->cur;
+	u32 *p = (u32 *)((u32)pb->mapped + cur);
+	u32 cur_mem = (cur/8) & (HOST1X_GATHER_QUEUE_SIZE - 1);
+	WARN_ON(cur == pb->fence);
+	*(p++) = op1;
+	*(p++) = op2;
+	pb->handle[cur_mem] = handle;
+	pb->cur = (cur + 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Pop a number of two word slots from the push buffer
+ * Caller must ensure push buffer is not empty
+ */
+static void push_buffer_pop_from(struct push_buffer *pb,
+		unsigned int slots)
+{
+	/* Clear the mem references for old items from pb */
+	unsigned int i;
+	u32 fence_mem = pb->fence/8;
+	for (i = 0; i < slots; i++) {
+		int cur_fence_mem = (fence_mem+i)
+				& (HOST1X_GATHER_QUEUE_SIZE - 1);
+		pb->handle[cur_fence_mem] = NULL;
+	}
+	/* Advance the fence, releasing the popped slots for reuse */
+	pb->fence = (pb->fence + slots * 8) & (PUSH_BUFFER_SIZE - 1);
+}
+
+/*
+ * Return the number of two word slots free in the push buffer
+ */
+static u32 push_buffer_space(struct push_buffer *pb)
+{
+	return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
+
+static u32 push_buffer_putptr(struct push_buffer *pb)
+{
+	return pb->phys + pb->cur;
+}
+
+/*
+ * The syncpt incr buffer is filled with methods to increment syncpts, which
+ * is later GATHER-ed into the mainline PB. It's used when a timed out context
+ * is interleaved with other work, so needs to inline the syncpt increments
+ * to maintain the count (but otherwise does no work).
+ */
+
+/*
+ * Init timeout resources
+ */
+static int cdma_timeout_init(struct host1x_cdma *cdma,
+				 u32 syncpt_id)
+{
+	if (syncpt_id == NVSYNCPT_INVALID)
+		return -EINVAL;
+
+	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
+	cdma->timeout.initialized = true;
+
+	return 0;
+}
+
+/*
+ * Clean up timeout resources
+ */
+static void cdma_timeout_destroy(struct host1x_cdma *cdma)
+{
+	if (cdma->timeout.initialized)
+		cancel_delayed_work(&cdma->timeout.wq);
+	cdma->timeout.initialized = false;
+}
+
+/*
+ * Increment the timed-out buffer's syncpt via CPU.
+ */
+static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
+				u32 syncpt_incrs, u32 syncval, u32 nr_slots)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 i, getidx;
+
+	for (i = 0; i < syncpt_incrs; i++)
+		host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
+
+	/* after CPU incr, ensure shadow is up to date */
+	host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* NOP all the PB slots */
+	getidx = getptr - pb->phys;
+	while (nr_slots--) {
+		u32 *p = (u32 *)((u32)pb->mapped + getidx);
+		*(p++) = HOST1X_OPCODE_NOOP;
+		*(p++) = HOST1X_OPCODE_NOOP;
+		dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
+			__func__, pb->phys + getidx);
+		getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
+	}
+	wmb();
+}
+
+/*
+ * Start channel DMA
+ */
+static void cdma_start(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+
+	/* set base, put, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, cdma->last_put, HOST1X_CHANNEL_DMAPUT);
+	host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
+
+	/* reset GET */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		HOST1X_CHANNEL_DMACTRL);
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+
+	cdma->running = true;
+}
+
+/*
+ * Similar to cdma_start(), but rather than starting from an idle
+ * state (where DMA GET is set to DMA PUT), on a timeout we restore
+ * DMA GET from an explicit value (so DMA may again be pending).
+ */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	if (cdma->running)
+		return;
+
+	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+
+	/* set base, end pointer (all of memory) */
+	host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
+
+	/* set GET, by loading the value in PUT (then reset GET) */
+	host1x_ch_writel(ch, getptr, HOST1X_CHANNEL_DMAPUT);
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true),
+		HOST1X_CHANNEL_DMACTRL);
+
+	dev_dbg(&host1x->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, HOST1X_CHANNEL_DMAGET),
+		host1x_ch_readl(ch, HOST1X_CHANNEL_DMAPUT),
+		cdma->last_put);
+
+	/* deassert GET reset and set PUT */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+	host1x_ch_writel(ch, cdma->last_put, HOST1X_CHANNEL_DMAPUT);
+
+	/* start the command DMA */
+	host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+
+	cdma->running = true;
+}
+
+/*
+ * Kick channel DMA into action by writing its PUT offset (if it has changed)
+ */
+static void cdma_kick(struct host1x_cdma *cdma)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 put;
+
+	put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
+
+	if (put != cdma->last_put) {
+		host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
+		cdma->last_put = put;
+	}
+}
+
+static void cdma_stop(struct host1x_cdma *cdma)
+{
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+	if (cdma->running) {
+		host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
+		host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+			HOST1X_CHANNEL_DMACTRL);
+		cdma->running = false;
+	}
+	mutex_unlock(&cdma->lock);
+}
+
+/*
+ * Stops both the channel's command processor and CDMA immediately.
+ * Also tears down the channel and resets the corresponding module.
+ */
+static void cdma_timeout_teardown_begin(struct host1x_cdma *cdma)
+{
+	struct host1x *dev = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	if (cdma->torndown && !cdma->running) {
+		dev_warn(&dev->dev->dev, "Already torn down\n");
+		return;
+	}
+
+	dev_dbg(&dev->dev->dev,
+		"begin channel teardown (channel id %d)\n", ch->chid);
+
+	cmdproc_stop = host1x_sync_readl(dev, HOST1X_SYNC_CMDPROC_STOP);
+	cmdproc_stop |= BIT(ch->chid);
+	host1x_sync_writel(dev, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
+
+	dev_dbg(&dev->dev->dev,
+		"%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n",
+		__func__,
+		host1x_ch_readl(ch, HOST1X_CHANNEL_DMAGET),
+		host1x_ch_readl(ch, HOST1X_CHANNEL_DMAPUT),
+		cdma->last_put);
+
+	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
+		HOST1X_CHANNEL_DMACTRL);
+
+	host1x_sync_writel(dev, BIT(ch->chid), HOST1X_SYNC_CH_TEARDOWN);
+
+	cdma->running = false;
+	cdma->torndown = true;
+}
+
+static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u32 cmdproc_stop;
+
+	dev_dbg(&host1x->dev->dev,
+		"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
+		ch->chid, getptr);
+
+	cmdproc_stop = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
+	cmdproc_stop &= ~(BIT(ch->chid));
+	host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
+
+	cdma->torndown = false;
+	cdma_timeout_restart(cdma, getptr);
+}
+
+/*
+ * If this timeout fires, it indicates the current sync_queue entry has
+ * exceeded its TTL and the userctx should be timed out and remaining
+ * submits already issued cleaned up (future submits return an error).
+ */
+static void cdma_timeout_handler(struct work_struct *work)
+{
+	struct host1x_cdma *cdma;
+	struct host1x *host1x;
+	struct host1x_channel *ch;
+
+	u32 syncpt_val;
+
+	u32 prev_cmdproc, cmdproc_stop;
+
+	cdma = container_of(to_delayed_work(work), struct host1x_cdma,
+			    timeout.wq);
+	host1x = cdma_to_host1x(cdma);
+	ch = cdma_to_channel(cdma);
+
+	mutex_lock(&cdma->lock);
+
+	if (!cdma->timeout.clientid) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but has no clientid\n");
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	/* stop processing to get a clean snapshot */
+	prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
+	cmdproc_stop = prev_cmdproc | BIT(ch->chid);
+	host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
+
+	dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
+		prev_cmdproc, cmdproc_stop);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	/* has buffer actually completed? */
+	if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
+		dev_dbg(&host1x->dev->dev,
+			 "cdma_timeout: expired, but buffer had completed\n");
+		/* restore */
+		cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
+		host1x_sync_writel(host1x, cmdproc_stop,
+			HOST1X_SYNC_CMDPROC_STOP);
+		mutex_unlock(&cdma->lock);
+		return;
+	}
+
+	dev_warn(&host1x->dev->dev,
+		"%s: timeout: %d (%s), HW thresh %d, done %d\n",
+		__func__,
+		cdma->timeout.syncpt->id, cdma->timeout.syncpt->name,
+		syncpt_val, cdma->timeout.syncpt_val);
+
+	/* stop HW, resetting channel/module */
+	host1x->cdma_op.timeout_teardown_begin(cdma);
+
+	host1x_cdma_update_sync_queue(cdma, ch->dev);
+	mutex_unlock(&cdma->lock);
+}
+
+static const struct host1x_cdma_ops host1x_cdma_ops = {
+	.start = cdma_start,
+	.stop = cdma_stop,
+	.kick = cdma_kick,
+
+	.timeout_init = cdma_timeout_init,
+	.timeout_destroy = cdma_timeout_destroy,
+	.timeout_teardown_begin = cdma_timeout_teardown_begin,
+	.timeout_teardown_end = cdma_timeout_teardown_end,
+	.timeout_cpu_incr = cdma_timeout_cpu_incr,
+};
+
+static const struct host1x_pushbuffer_ops host1x_pushbuffer_ops = {
+	.reset = push_buffer_reset,
+	.init = push_buffer_init,
+	.destroy = push_buffer_destroy,
+	.push_to = push_buffer_push_to,
+	.pop_from = push_buffer_pop_from,
+	.space = push_buffer_space,
+	.putptr = push_buffer_putptr,
+};
+
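
Reviewer note on the push buffer index arithmetic in cdma_hw.c above: the
comment says fence == cur means the buffer is full, not empty. The sketch
below models only the two byte offsets in user space to show why. The names
(pb_model, pb_reset, pb_space, pb_push, pb_pop) are hypothetical; the
constants mirror HOST1X_GATHER_QUEUE_SIZE and PUSH_BUFFER_SIZE from
cdma_hw.h in this patch.

```c
#include <stdint.h>

#define PB_SLOTS 512u			/* mirrors HOST1X_GATHER_QUEUE_SIZE */
#define PB_SIZE  (PB_SLOTS * 8u)	/* mirrors PUSH_BUFFER_SIZE, 8 bytes per slot */

/* Hypothetical model holding only the two offsets of struct push_buffer. */
struct pb_model {
	uint32_t fence;	/* byte offset of the last slot that may be written */
	uint32_t cur;	/* byte offset of the next slot to write */
};

/* Empty buffer: fence sits one slot behind cur. */
static void pb_reset(struct pb_model *pb)
{
	pb->fence = PB_SIZE - 8;
	pb->cur = 0;
}

/* Number of free two-word slots; 0 when cur has caught up with fence. */
static uint32_t pb_space(const struct pb_model *pb)
{
	return ((pb->fence - pb->cur) & (PB_SIZE - 1)) / 8;
}

/* Writing one slot advances cur, wrapping at the buffer size. */
static void pb_push(struct pb_model *pb)
{
	pb->cur = (pb->cur + 8) & (PB_SIZE - 1);
}

/* Retiring consumed slots advances fence, which frees space. */
static void pb_pop(struct pb_model *pb, uint32_t slots)
{
	pb->fence = (pb->fence + slots * 8) & (PB_SIZE - 1);
}
```

Because fence starts one slot behind cur, at most PB_SLOTS - 1 slots are ever
in flight, which is what lets fence == cur unambiguously mean "full".
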
diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
new file mode 100644
index 0000000..80a085a
--- /dev/null
+++ b/drivers/gpu/host1x/hw/cdma_hw.h
@@ -0,0 +1,37 @@
+/*
+ * Tegra host1x Command DMA
+ *
+ * Copyright (c) 2011-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_CDMA_HW_H
+#define __HOST1X_CDMA_HW_H
+
+/*
+ * Size of the sync queue. If it is too small, we won't be able to queue up
+ * many command buffers. If it is too large, we waste memory.
+ */
+#define HOST1X_SYNC_QUEUE_SIZE 512
+
+/*
+ * Number of gathers we allow to be queued up per channel. Must be a
+ * power of two. Currently sized such that pushbuffer is 4KB (512*8B).
+ */
+#define HOST1X_GATHER_QUEUE_SIZE 512
+
+/* 8 bytes per slot. (This number does not include the final RESTART.) */
+#define PUSH_BUFFER_SIZE (HOST1X_GATHER_QUEUE_SIZE * 8)
+
+#endif
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
new file mode 100644
index 0000000..905cfd2
--- /dev/null
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -0,0 +1,148 @@
+/*
+ * Tegra host1x Channel
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "host1x.h"
+#include "channel.h"
+#include "dev.h"
+#include <linux/slab.h>
+#include "intr.h"
+#include "job.h"
+#include <trace/events/host1x.h>
+
+static void submit_gathers(struct host1x_job *job)
+{
+	/* push user gathers */
+	int i;
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		u32 op1 = host1x_opcode_gather(g->words);
+		u32 op2 = g->mem_base + g->offset;
+		host1x_cdma_push_gather(&job->ch->cdma,
+				job->gathers[i].ref,
+				job->gathers[i].offset,
+				op1, op2);
+	}
+}
+
+static int channel_submit(struct host1x_job *job)
+{
+	struct host1x_channel *ch = job->ch;
+	struct host1x_syncpt *sp;
+	u32 user_syncpt_incrs = job->syncpt_incrs;
+	u32 prev_max = 0;
+	u32 syncval;
+	int err;
+	void *completed_waiter = NULL;
+
+	sp = host1x_get_host(job->ch->dev)->syncpt + job->syncpt_id;
+	trace_host1x_channel_submit(ch->dev->name,
+			job->num_gathers, job->num_relocs, job->num_waitchk,
+			job->syncpt_id, job->syncpt_incrs);
+
+	/* before error checks, return current max */
+	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
+
+	/* get submit lock */
+	err = mutex_lock_interruptible(&ch->submitlock);
+	if (err)
+		goto error;
+
+	completed_waiter = host1x_intr_alloc_waiter();
+	if (!completed_waiter) {
+		mutex_unlock(&ch->submitlock);
+		err = -ENOMEM;
+		goto error;
+	}
+
+	/* begin a CDMA submit */
+	err = host1x_cdma_begin(&ch->cdma, job);
+	if (err) {
+		mutex_unlock(&ch->submitlock);
+		goto error;
+	}
+
+	if (job->serialize) {
+		/*
+		 * Force serialization by inserting a host wait for the
+		 * previous job to finish before this one can commence.
+		 */
+		host1x_cdma_push(&ch->cdma,
+				host1x_opcode_setclass(NV_HOST1X_CLASS_ID,
+					host1x_uclass_wait_syncpt_r(),
+					1),
+				host1x_class_host_wait_syncpt(job->syncpt_id,
+					host1x_syncpt_read_max(sp)));
+	}
+
+	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
+
+	job->syncpt_end = syncval;
+
+	/* add a setclass for modules that require it */
+	if (job->class)
+		host1x_cdma_push(&ch->cdma,
+			host1x_opcode_setclass(job->class, 0, 0),
+			HOST1X_OPCODE_NOOP);
+
+	submit_gathers(job);
+
+	/* end CDMA submit & stash pinned hMems into sync queue */
+	host1x_cdma_end(&ch->cdma, job);
+
+	trace_host1x_channel_submitted(ch->dev->name,
+			prev_max, syncval);
+
+	/* schedule a submit complete interrupt */
+	err = host1x_intr_add_action(&host1x_get_host(ch->dev)->intr,
+			job->syncpt_id, syncval,
+			HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
+			completed_waiter,
+			NULL);
+	completed_waiter = NULL;
+	WARN(err, "Failed to set submit complete interrupt");
+
+	mutex_unlock(&ch->submitlock);
+
+	return 0;
+
+error:
+	kfree(completed_waiter);
+	return err;
+}
+
+static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx)
+{
+	p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
+	return p;
+}
+
+static int host1x_channel_init(struct host1x_channel *ch,
+	struct host1x *dev, int index)
+{
+	ch->chid = index;
+	mutex_init(&ch->reflock);
+	mutex_init(&ch->submitlock);
+
+	ch->regs = host1x_channel_regs(dev->regs, index);
+	return 0;
+}
+
+static const struct host1x_channel_ops host1x_channel_ops = {
+	.init = host1x_channel_init,
+	.submit = channel_submit,
+};
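
Reviewer note on sync point values: channel_submit() above records the
post-increment value in job->syncpt_end, and the timeout handler in
cdma_hw.c tests completion with (s32)(syncpt_val - threshold) >= 0. The
sketch below shows why that form keeps working when the 32-bit counter wraps.
The helper name syncpt_reached is hypothetical, and the cast assumes the
usual two's-complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once the counter has reached the threshold.
 * The unsigned subtraction wraps modulo 2^32, so the signed result is
 * the distance between the two values as long as they are less than
 * 2^31 apart.
 */
static bool syncpt_reached(uint32_t current, uint32_t threshold)
{
	return (int32_t)(current - threshold) >= 0;
}
```

A plain current >= threshold comparison would report a threshold of
0xfffffffe as not reached once the counter has wrapped around to 2.
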
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 3d633a3..7569a1e 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -23,13 +23,19 @@
 
 #include "hw/host1x01.h"
 #include "dev.h"
+#include "channel.h"
 #include "hw/host1x01_hardware.h"
 
+#include "hw/channel_hw.c"
+#include "hw/cdma_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
 int host1x01_init(struct host1x *host)
 {
+	host->channel_op = host1x_channel_ops;
+	host->cdma_op = host1x_cdma_ops;
+	host->cdma_pb_op = host1x_pushbuffer_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index c1d5324..03873c0 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -21,6 +21,130 @@
 
 #include <linux/types.h>
 #include <linux/bitops.h>
+#include "hw_host1x01_channel.h"
 #include "hw_host1x01_sync.h"
+#include "hw_host1x01_uclass.h"
+
+/* channel registers */
+#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
+
+static inline u32 host1x_class_host_wait_syncpt(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_wait_syncpt_indx_f(indx)
+		| host1x_uclass_wait_syncpt_thresh_f(threshold);
+}
+
+static inline u32 host1x_class_host_load_syncpt_base(
+	unsigned indx, unsigned threshold)
+{
+	return host1x_uclass_load_syncpt_base_base_indx_f(indx)
+		| host1x_uclass_load_syncpt_base_value_f(threshold);
+}
+
+static inline u32 host1x_class_host_wait_syncpt_base(
+	unsigned indx, unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_wait_syncpt_base_indx_f(indx)
+		| host1x_uclass_wait_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_wait_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt_base(
+	unsigned base_indx, unsigned offset)
+{
+	return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx)
+		| host1x_uclass_incr_syncpt_base_offset_f(offset);
+}
+
+static inline u32 host1x_class_host_incr_syncpt(
+	unsigned cond, unsigned indx)
+{
+	return host1x_uclass_incr_syncpt_cond_f(cond)
+		| host1x_uclass_incr_syncpt_indx_f(indx);
+}
+
+static inline u32 host1x_class_host_indoff_reg_write(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indbe_f(0xf)
+		| host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset);
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+static inline u32 host1x_class_host_indoff_reg_read(
+	unsigned mod_id, unsigned offset, bool auto_inc)
+{
+	u32 v = host1x_uclass_indoff_indmodid_f(mod_id)
+		| host1x_uclass_indoff_indroffset_f(offset)
+		| host1x_uclass_indoff_rwn_read_v();
+	if (auto_inc)
+		v |= host1x_uclass_indoff_autoinc_f(1);
+	return v;
+}
+
+
+/* cdma opcodes */
+static inline u32 host1x_opcode_setclass(
+	unsigned class_id, unsigned offset, unsigned mask)
+{
+	return (0 << 28) | (offset << 16) | (class_id << 6) | mask;
+}
+
+static inline u32 host1x_opcode_incr(unsigned offset, unsigned count)
+{
+	return (1 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count)
+{
+	return (2 << 28) | (offset << 16) | count;
+}
+
+static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask)
+{
+	return (3 << 28) | (offset << 16) | mask;
+}
+
+static inline u32 host1x_opcode_imm(unsigned offset, unsigned value)
+{
+	return (4 << 28) | (offset << 16) | value;
+}
+
+static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx)
+{
+	return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(),
+		host1x_class_host_incr_syncpt(cond, indx));
+}
+
+static inline u32 host1x_opcode_restart(unsigned address)
+{
+	return (5 << 28) | (address >> 4);
+}
+
+static inline u32 host1x_opcode_gather(unsigned count)
+{
+	return (6 << 28) | count;
+}
+
+static inline u32 host1x_opcode_gather_nonincr(unsigned offset,	unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | count;
+}
+
+static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
+{
+	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
+}
+
+#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
+
+static inline u32 host1x_mask2(unsigned x, unsigned y)
+{
+	return 1 | (1 << (y - x));
+}
 
 #endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
new file mode 100644
index 0000000..dad4fee
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -0,0 +1,102 @@
+/*
+ * Copyright (c) 2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_channel_host1x_h__
+#define __hw_host1x_channel_host1x_h__
+
+static inline u32 host1x_channel_dmastart_r(void)
+{
+	return 0x14;
+}
+#define HOST1X_CHANNEL_DMASTART \
+	host1x_channel_dmastart_r()
+static inline u32 host1x_channel_dmaput_r(void)
+{
+	return 0x18;
+}
+#define HOST1X_CHANNEL_DMAPUT \
+	host1x_channel_dmaput_r()
+static inline u32 host1x_channel_dmaget_r(void)
+{
+	return 0x1c;
+}
+#define HOST1X_CHANNEL_DMAGET \
+	host1x_channel_dmaget_r()
+static inline u32 host1x_channel_dmaend_r(void)
+{
+	return 0x20;
+}
+#define HOST1X_CHANNEL_DMAEND \
+	host1x_channel_dmaend_r()
+static inline u32 host1x_channel_dmactrl_r(void)
+{
+	return 0x24;
+}
+#define HOST1X_CHANNEL_DMACTRL \
+	host1x_channel_dmactrl_r()
+static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
+{
+	return (v & 0x1) << 0;
+}
+#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
+	host1x_channel_dmactrl_dmastop_f(v)
+static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
+{
+	return (v & 0x1) << 1;
+}
+#define HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(v) \
+	host1x_channel_dmactrl_dmagetrst_f(v)
+static inline u32 host1x_channel_dmactrl_dmainitget_f(u32 v)
+{
+	return (v & 0x1) << 2;
+}
+#define HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(v) \
+	host1x_channel_dmactrl_dmainitget_f(v)
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 5da9afb..3073d37 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -69,6 +69,18 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 }
 #define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \
 	host1x_sync_syncpt_thresh_int_enable_cpu0_r()
+static inline u32 host1x_sync_cmdproc_stop_r(void)
+{
+	return 0xac;
+}
+#define HOST1X_SYNC_CMDPROC_STOP \
+	host1x_sync_cmdproc_stop_r()
+static inline u32 host1x_sync_ch_teardown_r(void)
+{
+	return 0xb0;
+}
+#define HOST1X_SYNC_CH_TEARDOWN \
+	host1x_sync_ch_teardown_r()
 static inline u32 host1x_sync_usec_clk_r(void)
 {
 	return 0x1a4;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
new file mode 100644
index 0000000..7af6609
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
@@ -0,0 +1,168 @@
+/*
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+ /*
+  * Function naming determines intended use:
+  *
+  *     <x>_r(void) : Returns the offset for register <x>.
+  *
+  *     <x>_w(void) : Returns the word offset for word (4 byte) element <x>.
+  *
+  *     <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits.
+  *
+  *     <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted
+  *         and masked to place it at field <y> of register <x>.  This value
+  *         can be |'d with others to produce a full register value for
+  *         register <x>.
+  *
+  *     <x>_<y>_m(void) : Returns a mask for field <y> of register <x>.  This
+  *         value can be ~'d and then &'d to clear the value of field <y> for
+  *         register <x>.
+  *
+  *     <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted
+  *         to place it at field <y> of register <x>.  This value can be |'d
+  *         with others to produce a full register value for <x>.
+  *
+  *     <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register
+  *         <x> value 'r' after being shifted to place its LSB at bit 0.
+  *         This value is suitable for direct comparison with other unshifted
+  *         values appropriate for use in field <y> of register <x>.
+  *
+  *     <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for
+  *         field <y> of register <x>.  This value is suitable for direct
+  *         comparison with unshifted values appropriate for use in field <y>
+  *         of register <x>.
+  */
+
+#ifndef __hw_host1x_uclass_host1x_h__
+#define __hw_host1x_uclass_host1x_h__
+
+static inline u32 host1x_uclass_incr_syncpt_r(void)
+{
+	return 0x0;
+}
+#define HOST1X_UCLASS_INCR_SYNCPT \
+	host1x_uclass_incr_syncpt_r()
+static inline u32 host1x_uclass_incr_syncpt_cond_f(u32 v)
+{
+	return (v & 0xff) << 8;
+}
+#define HOST1X_UCLASS_INCR_SYNCPT_COND_F(v) \
+	host1x_uclass_incr_syncpt_cond_f(v)
+static inline u32 host1x_uclass_incr_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 0;
+}
+#define HOST1X_UCLASS_INCR_SYNCPT_INDX_F(v) \
+	host1x_uclass_incr_syncpt_indx_f(v)
+static inline u32 host1x_uclass_wait_syncpt_r(void)
+{
+	return 0x8;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT \
+	host1x_uclass_wait_syncpt_r()
+static inline u32 host1x_uclass_wait_syncpt_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT_INDX_F(v) \
+	host1x_uclass_wait_syncpt_indx_f(v)
+static inline u32 host1x_uclass_wait_syncpt_thresh_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT_THRESH_F(v) \
+	host1x_uclass_wait_syncpt_thresh_f(v)
+static inline u32 host1x_uclass_wait_syncpt_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_INDX_F(v) \
+	host1x_uclass_wait_syncpt_base_indx_f(v)
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 16;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_BASE_INDX_F(v) \
+	host1x_uclass_wait_syncpt_base_base_indx_f(v)
+static inline u32 host1x_uclass_wait_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffff) << 0;
+}
+#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_OFFSET_F(v) \
+	host1x_uclass_wait_syncpt_base_offset_f(v)
+static inline u32 host1x_uclass_load_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+#define HOST1X_UCLASS_LOAD_SYNCPT_BASE_BASE_INDX_F(v) \
+	host1x_uclass_load_syncpt_base_base_indx_f(v)
+static inline u32 host1x_uclass_load_syncpt_base_value_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+#define HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(v) \
+	host1x_uclass_load_syncpt_base_value_f(v)
+static inline u32 host1x_uclass_incr_syncpt_base_base_indx_f(u32 v)
+{
+	return (v & 0xff) << 24;
+}
+#define HOST1X_UCLASS_INCR_SYNCPT_BASE_BASE_INDX_F(v) \
+	host1x_uclass_incr_syncpt_base_base_indx_f(v)
+static inline u32 host1x_uclass_incr_syncpt_base_offset_f(u32 v)
+{
+	return (v & 0xffffff) << 0;
+}
+#define HOST1X_UCLASS_INCR_SYNCPT_BASE_OFFSET_F(v) \
+	host1x_uclass_incr_syncpt_base_offset_f(v)
+static inline u32 host1x_uclass_indoff_r(void)
+{
+	return 0x2d;
+}
+#define HOST1X_UCLASS_INDOFF \
+	host1x_uclass_indoff_r()
+static inline u32 host1x_uclass_indoff_indbe_f(u32 v)
+{
+	return (v & 0xf) << 28;
+}
+#define HOST1X_UCLASS_INDOFF_INDBE_F(v) \
+	host1x_uclass_indoff_indbe_f(v)
+static inline u32 host1x_uclass_indoff_autoinc_f(u32 v)
+{
+	return (v & 0x1) << 27;
+}
+#define HOST1X_UCLASS_INDOFF_AUTOINC_F(v) \
+	host1x_uclass_indoff_autoinc_f(v)
+static inline u32 host1x_uclass_indoff_indmodid_f(u32 v)
+{
+	return (v & 0xff) << 18;
+}
+#define HOST1X_UCLASS_INDOFF_INDMODID_F(v) \
+	host1x_uclass_indoff_indmodid_f(v)
+static inline u32 host1x_uclass_indoff_indroffset_f(u32 v)
+{
+	return (v & 0xffff) << 2;
+}
+#define HOST1X_UCLASS_INDOFF_INDROFFSET_F(v) \
+	host1x_uclass_indoff_indroffset_f(v)
+static inline u32 host1x_uclass_indoff_rwn_read_v(void)
+{
+	return 1;
+}
+#define HOST1X_UCLASS_INDOFF_RWN_READ_V \
+	host1x_uclass_indoff_rwn_read_v()
+#endif
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index 16e3ada..ba48cee 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 	wmb();
 }
 
+/* remove a wait pointed to by patch_addr */
+static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	u32 override = host1x_class_host_wait_syncpt(
+			NVSYNCPT_GRAPHICS_HOST, 0);
+	__raw_writel(override, patch_addr);
+	return 0;
+}
+
 static const char *syncpt_name(struct host1x_syncpt *sp)
 {
 	struct host1x_device_info *info = &sp->dev->info;
@@ -141,6 +150,7 @@ static const struct host1x_syncpt_ops host1x_syncpt_ops = {
 	.read_wait_base = syncpt_read_wait_base,
 	.load_min = syncpt_load_min,
 	.cpu_incr = syncpt_cpu_incr,
+	.patch_wait = syncpt_patch_wait,
 	.debug = syncpt_debug,
 	.name = syncpt_name,
 };
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 26099b8..9d0b5f1 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -20,6 +20,8 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <trace/events/host1x.h>
+#include "channel.h"
 #include "dev.h"
 
 /* Wait list management */
@@ -74,7 +76,7 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
 {
 	struct list_head *dest;
-	struct host1x_waitlist *waiter, *next;
+	struct host1x_waitlist *waiter, *next, *prev;
 
 	list_for_each_entry_safe(waiter, next, head, list) {
 		if ((s32)(waiter->thresh - sync) > 0)
@@ -82,6 +84,17 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 
 		dest = completed + waiter->action;
 
+		/* consolidate submit cleanups */
+		if (waiter->action == HOST1X_INTR_ACTION_SUBMIT_COMPLETE
+			&& !list_empty(dest)) {
+			prev = list_entry(dest->prev,
+					struct host1x_waitlist, list);
+			if (prev->data == waiter->data) {
+				prev->count++;
+				dest = NULL;
+			}
+		}
+
 		/* PENDING->REMOVED or CANCELLED->HANDLED */
 		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
 			list_del(&waiter->list);
@@ -104,6 +117,19 @@ static void reset_threshold_interrupt(struct host1x_intr *intr,
 	host1x->intr_op.enable_syncpt_intr(intr, id);
 }
 
+static void action_submit_complete(struct host1x_waitlist *waiter)
+{
+	struct host1x_channel *channel = waiter->data;
+	int nr_completed = waiter->count;
+
+	host1x_cdma_update(&channel->cdma);
+
+	/* Add nr_completed to trace */
+	trace_host1x_channel_submit_complete(channel->dev->name,
+			nr_completed, waiter->thresh);
+
+}
+
 static void action_wakeup(struct host1x_waitlist *waiter)
 {
 	wait_queue_head_t *wq = waiter->data;
@@ -121,6 +147,7 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
 };
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index 679a7b4..979b929 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -24,6 +24,12 @@
 
 enum host1x_intr_action {
 	/*
+	 * Perform cleanup after a submit has completed.
+	 * 'data' points to a channel
+	 */
+	HOST1X_INTR_ACTION_SUBMIT_COMPLETE = 0,
+
+	/*
 	 * Wake up a  task.
 	 * 'data' points to a wait_queue_head_t
 	 */
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
new file mode 100644
index 0000000..cc9c84a
--- /dev/null
+++ b/drivers/gpu/host1x/job.c
@@ -0,0 +1,612 @@
+/*
+ * Tegra host1x Job
+ *
+ * Copyright (c) 2010-2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/err.h>
+#include <linux/vmalloc.h>
+#include <linux/scatterlist.h>
+#include <trace/events/host1x.h>
+#include <linux/dma-mapping.h>
+#include "job.h"
+#include "channel.h"
+#include "syncpt.h"
+#include "dev.h"
+#include "memmgr.h"
+
+#ifdef CONFIG_TEGRA_HOST1X_FIREWALL
+static int host1x_firewall = 1;
+#else
+static int host1x_firewall;
+#endif
+
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
+{
+	struct host1x_job *job = NULL;
+	int num_unpins = num_cmdbufs + num_relocs;
+	s64 total;
+	void *mem;
+
+	/* Check that we're not going to overflow */
+	total = sizeof(struct host1x_job)
+			+ num_relocs * sizeof(struct host1x_reloc)
+			+ num_unpins * sizeof(struct host1x_job_unpin_data)
+			+ num_waitchks * sizeof(struct host1x_waitchk)
+			+ num_cmdbufs * sizeof(struct host1x_job_gather)
+			+ num_unpins * sizeof(dma_addr_t)
+			+ num_unpins * sizeof(u32 *);
+	if (total > ULONG_MAX)
+		return NULL;
+
+	mem = job = kzalloc(total, GFP_KERNEL);
+	if (!job)
+		return NULL;
+
+	kref_init(&job->ref);
+	job->ch = ch;
+
+	/* kzalloc() has already zeroed the whole allocation */
+
+	/*
+	 * Redistribute memory to the structs.
+	 * Overflows and negative conditions have
+	 * already been checked above.
+	 */
+	mem += sizeof(struct host1x_job);
+	job->relocarray = num_relocs ? mem : NULL;
+	mem += num_relocs * sizeof(struct host1x_reloc);
+	job->unpins = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
+	job->waitchk = num_waitchks ? mem : NULL;
+	mem += num_waitchks * sizeof(struct host1x_waitchk);
+	job->gathers = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->addr_phys = num_unpins ? mem : NULL;
+	mem += num_unpins * sizeof(dma_addr_t);
+	job->pin_ids = num_unpins ? mem : NULL;
+
+	job->reloc_addr_phys = job->addr_phys;
+	job->gather_addr_phys = &job->addr_phys[num_relocs];
+
+	return job;
+}
+
+void host1x_job_get(struct host1x_job *job)
+{
+	kref_get(&job->ref);
+}
+
+static void job_free(struct kref *ref)
+{
+	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
+
+	kfree(job);
+}
+
+void host1x_job_put(struct host1x_job *job)
+{
+	kref_put(&job->ref, job_free);
+}
+
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset)
+{
+	struct host1x_job_gather *cur_gather =
+			&job->gathers[job->num_gathers];
+
+	cur_gather->words = words;
+	cur_gather->mem_id = mem_id;
+	cur_gather->offset = offset;
+	job->num_gathers++;
+}
+
+/*
+ * Check driver-supplied waitchk structs for syncpt thresholds that have
+ * already been satisfied, and patch those waits into no-ops (to avoid a
+ * wrap condition in the HW).
+ */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
+		u32 patch_mem, struct mem_handle *h)
+{
+	int i;
+
+	/* compare syncpt vs wait threshold */
+	for (i = 0; i < job->num_waitchk; i++) {
+		struct host1x_waitchk *wait = &job->waitchk[i];
+		struct host1x_syncpt *sp =
+			host1x_syncpt_get(host, wait->syncpt_id);
+
+		/* validate syncpt id */
+		if (wait->syncpt_id >= host1x_syncpt_nb_pts(host))
+			continue;
+
+		/* skip all other gathers */
+		if (patch_mem != wait->mem)
+			continue;
+
+		trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
+				wait->syncpt_id, wait->thresh,
+				host1x_syncpt_read_min(sp));
+		if (host1x_syncpt_is_expired(
+			host1x_syncpt_get(host, wait->syncpt_id),
+			wait->thresh)) {
+			struct host1x_syncpt *sp =
+				host1x_syncpt_get(host, wait->syncpt_id);
+
+			void *patch_addr = NULL;
+
+			/*
+			 * NULL an already satisfied WAIT_SYNCPT host method
+			 * by patching its args in the command stream. The
+			 * method data is changed to reference the reserved
+			 * (never given out or incremented)
+			 * NVSYNCPT_GRAPHICS_HOST syncpt with a threshold of 0,
+			 * so it is guaranteed to be popped by the host HW.
+			 */
+			dev_dbg(&host->dev->dev,
+			    "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
+			    wait->syncpt_id, sp->name, wait->thresh,
+			    host1x_syncpt_read_min(sp));
+
+			/* patch the wait */
+			patch_addr = host1x_memmgr_kmap(h,
+					wait->offset >> PAGE_SHIFT);
+			if (patch_addr) {
+				host1x_syncpt_patch_wait(sp,
+					(patch_addr +
+						(wait->offset & ~PAGE_MASK)));
+				host1x_memmgr_kunmap(h,
+						wait->offset >> PAGE_SHIFT,
+						patch_addr);
+			} else {
+				pr_err("Couldn't map cmdbuf for wait check\n");
+			}
+		}
+
+		wait->mem = 0;
+	}
+	return 0;
+}
+
+
+static int pin_job_mem(struct host1x_job *job)
+{
+	int i;
+	int count = 0;
+	int result;
+
+	for (i = 0; i < job->num_relocs; i++) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+		job->pin_ids[count] = reloc->target;
+		count++;
+	}
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		job->pin_ids[count] = g->mem_id;
+		count++;
+	}
+
+	/* validate array and pin unique ids, get refs for unpinning */
+	result = host1x_memmgr_pin_array_ids(job->ch->dev,
+		job->pin_ids, job->addr_phys,
+		count,
+		job->unpins);
+
+	if (result > 0)
+		job->num_unpins = result;
+
+	return result;
+}
+
+static int do_relocs(struct host1x_job *job,
+		u32 cmdbuf_mem, struct mem_handle *h)
+{
+	int i = 0;
+	int last_page = -1;
+	void *cmdbuf_page_addr = NULL;
+
+	/* pin & patch the relocs for one gather */
+	while (i < job->num_relocs) {
+		struct host1x_reloc *reloc = &job->relocarray[i];
+
+		/* skip all other gathers */
+		if (cmdbuf_mem != reloc->cmdbuf_mem) {
+			i++;
+			continue;
+		}
+
+		if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
+			if (cmdbuf_page_addr)
+				host1x_memmgr_kunmap(h,
+						last_page, cmdbuf_page_addr);
+
+			cmdbuf_page_addr = host1x_memmgr_kmap(h,
+					reloc->cmdbuf_offset >> PAGE_SHIFT);
+			last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
+
+			if (unlikely(!cmdbuf_page_addr)) {
+				pr_err("Couldn't map cmdbuf for relocation\n");
+				return -ENOMEM;
+			}
+		}
+
+		__raw_writel(
+			(job->reloc_addr_phys[i] +
+				reloc->target_offset) >> reloc->shift,
+			(cmdbuf_page_addr +
+				(reloc->cmdbuf_offset & ~PAGE_MASK)));
+
+		/* remove completed reloc from the job */
+		if (i != job->num_relocs - 1) {
+			struct host1x_reloc *reloc_last =
+				&job->relocarray[job->num_relocs - 1];
+			reloc->cmdbuf_mem	= reloc_last->cmdbuf_mem;
+			reloc->cmdbuf_offset	= reloc_last->cmdbuf_offset;
+			reloc->target		= reloc_last->target;
+			reloc->target_offset	= reloc_last->target_offset;
+			reloc->shift		= reloc_last->shift;
+			job->reloc_addr_phys[i] =
+				job->reloc_addr_phys[job->num_relocs - 1];
+			job->num_relocs--;
+		} else {
+			break;
+		}
+	}
+
+	if (cmdbuf_page_addr)
+		host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
+
+	return 0;
+}
+
+static int check_reloc(struct host1x_reloc *reloc,
+		u32 cmdbuf_id, int offset)
+{
+	int err = 0;
+	if (reloc->cmdbuf_mem != cmdbuf_id
+			|| reloc->cmdbuf_offset != offset * sizeof(u32))
+		err = -EINVAL;
+
+	return err;
+}
+
+static int check_mask(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 mask)
+{
+	while (mask) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (mask & 1) {
+			if (job->is_addr_reg(pdev, class, reg)) {
+				if (!*num_relocs ||
+					check_reloc(*reloc, cmdbuf_id, *offset))
+					return -EINVAL;
+				(*reloc)++;
+				(*num_relocs)--;
+			}
+			(*words)--;
+			(*offset)++;
+		}
+		mask >>= 1;
+		reg += 1;
+	}
+
+	return 0;
+}
+
+static int check_incr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (job->is_addr_reg(pdev, class, reg)) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		reg += 1;
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int check_nonincr(struct host1x_job *job,
+		struct platform_device *pdev,
+		struct host1x_reloc **reloc, int *num_relocs,
+		u32 cmdbuf_id, int *offset,
+		u32 *words, u32 class, u32 reg, u32 count)
+{
+	int is_addr_reg = job->is_addr_reg(pdev, class, reg);
+
+	while (count) {
+		if (*words == 0)
+			return -EINVAL;
+
+		if (is_addr_reg) {
+			if (!*num_relocs ||
+				check_reloc(*reloc, cmdbuf_id, *offset))
+				return -EINVAL;
+			(*reloc)++;
+			(*num_relocs)--;
+		}
+		(*words)--;
+		(*offset)++;
+		count--;
+	}
+
+	return 0;
+}
+
+static int validate(struct host1x_job *job, struct platform_device *pdev,
+		struct host1x_job_gather *g)
+{
+	struct host1x_reloc *reloc = job->relocarray;
+	int num_relocs = job->num_relocs;
+	u32 *cmdbuf_base;
+	int offset = 0;
+	unsigned int words;
+	int err = 0;
+	int class = 0;
+
+	if (!job->is_addr_reg)
+		return 0;
+
+	cmdbuf_base = host1x_memmgr_mmap(g->ref);
+	if (!cmdbuf_base)
+		return -ENOMEM;
+
+	words = g->words;
+	while (words && !err) {
+		u32 word = cmdbuf_base[offset];
+		u32 opcode = (word & 0xf0000000) >> 28;
+		u32 mask = 0;
+		u32 reg = 0;
+		u32 count = 0;
+
+		words--;
+		offset++;
+
+		switch (opcode) {
+		case 0:
+			class = word >> 6 & 0x3ff;
+			mask = word & 0x3f;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 1:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_incr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 2:
+			reg = word >> 16 & 0xfff;
+			count = word & 0xffff;
+			err = check_nonincr(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, count);
+			if (err)
+				goto out;
+			break;
+
+		case 3:
+			mask = word & 0xffff;
+			reg = word >> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 4:
+		case 5:
+		case 14:
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+	}
+
+	/* No relocs should remain at this point */
+	if (num_relocs)
+		err = -EINVAL;
+
+out:
+	host1x_memmgr_munmap(g->ref, cmdbuf_base);
+
+	return err;
+}
+
+static inline int copy_gathers(struct host1x_job *job,
+		struct platform_device *pdev)
+{
+	size_t size = 0;
+	size_t offset = 0;
+	int i;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		size += g->words * sizeof(u32);
+	}
+
+	job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
+			size, &job->gather_copy, GFP_KERNEL);
+	if (!job->gather_copy_mapped) {
+		/* dma_alloc_writecombine() returns NULL, not ERR_PTR */
+		job->gather_copy_size = 0;
+		return -ENOMEM;
+	}
+
+	job->gather_copy_size = size;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		void *gather = host1x_memmgr_mmap(g->ref);
+		memcpy(job->gather_copy_mapped + offset,
+				gather + g->offset,
+				g->words * sizeof(u32));
+
+		host1x_memmgr_munmap(g->ref, gather);
+
+		g->mem_base = job->gather_copy;
+		g->offset = offset;
+		g->mem_id = 0;
+		g->ref = NULL;
+		offset += g->words * sizeof(u32);
+	}
+
+	return 0;
+}
+
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
+{
+	int err = 0, i = 0, j = 0;
+	struct host1x *host = host1x_get_host(pdev);
+	DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
+
+	bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
+	for (i = 0; i < job->num_waitchk; i++) {
+		u32 syncpt_id = job->waitchk[i].syncpt_id;
+		if (syncpt_id < host1x_syncpt_nb_pts(host))
+			set_bit(syncpt_id, waitchk_mask);
+	}
+
+	/* get current syncpt values for waitchk */
+	for_each_set_bit(i, &waitchk_mask[0], host1x_syncpt_nb_pts(host))
+		host1x_syncpt_load_min(host->syncpt + i);
+
+	/* pin memory */
+	err = pin_job_mem(job);
+	if (err <= 0)
+		goto out;
+
+	/* patch gathers */
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+
+		/* process each gather mem only once */
+		if (!g->ref) {
+			g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
+			if (IS_ERR(g->ref)) {
+				err = PTR_ERR(g->ref);
+				g->ref = NULL;
+				break;
+			}
+
+			g->mem_base = job->gather_addr_phys[i];
+
+			for (j = 0; j < job->num_gathers; j++) {
+				struct host1x_job_gather *tmp =
+					&job->gathers[j];
+				if (!tmp->ref && tmp->mem_id == g->mem_id) {
+					tmp->ref = g->ref;
+					tmp->mem_base = g->mem_base;
+				}
+			}
+			err = 0;
+			if (host1x_firewall)
+				err = validate(job, pdev, g);
+			if (err)
+				dev_err(&pdev->dev,
+					"Job validate returned %d\n", err);
+			if (!err)
+				err = do_relocs(job, g->mem_id,  g->ref);
+			if (!err)
+				err = do_waitchks(job, host,
+						g->mem_id, g->ref);
+			host1x_memmgr_put(g->ref);
+			if (err)
+				break;
+		}
+	}
+
+	if (host1x_firewall && !err) {
+		err = copy_gathers(job, pdev);
+		if (err) {
+			host1x_job_unpin(job);
+			return err;
+		}
+	}
+
+out:
+	wmb();
+
+	return err;
+}
+
+void host1x_job_unpin(struct host1x_job *job)
+{
+	int i;
+
+	for (i = 0; i < job->num_unpins; i++) {
+		struct host1x_job_unpin_data *unpin = &job->unpins[i];
+		host1x_memmgr_unpin(unpin->h, unpin->mem);
+		host1x_memmgr_put(unpin->h);
+	}
+	job->num_unpins = 0;
+
+	if (job->gather_copy_size)
+		dma_free_writecombine(&job->ch->dev->dev,
+			job->gather_copy_size,
+			job->gather_copy_mapped, job->gather_copy);
+}
+
+/*
+ * Debug routine used to dump job entries
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job)
+{
+	dev_dbg(dev, "    SYNCPT_ID   %d\n",
+		job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_VAL  %d\n",
+		job->syncpt_end);
+	dev_dbg(dev, "    FIRST_GET   0x%x\n",
+		job->first_get);
+	dev_dbg(dev, "    TIMEOUT     %d\n",
+		job->timeout);
+	dev_dbg(dev, "    NUM_SLOTS   %d\n",
+		job->num_slots);
+	dev_dbg(dev, "    NUM_HANDLES %d\n",
+		job->num_unpins);
+}
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
new file mode 100644
index 0000000..428c670
--- /dev/null
+++ b/drivers/gpu/host1x/job.h
@@ -0,0 +1,164 @@
+/*
+ * Tegra host1x Job
+ *
+ * Copyright (c) 2011-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HOST1X_JOB_H
+#define __HOST1X_JOB_H
+
+struct platform_device;
+
+struct host1x_job_gather {
+	u32 words;
+	dma_addr_t mem_base;
+	u32 mem_id;
+	int offset;
+	struct mem_handle *ref;
+};
+
+struct host1x_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct host1x_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct host1x_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+/*
+ * Each submit is tracked as a host1x_job.
+ */
+struct host1x_job {
+	/* When refcount goes to zero, job can be freed */
+	struct kref ref;
+
+	/* List entry */
+	struct list_head list;
+
+	/* Channel where job is submitted to */
+	struct host1x_channel *ch;
+
+	int clientid;
+
+	/* Gathers and their memory */
+	struct host1x_job_gather *gathers;
+	int num_gathers;
+
+	/* Wait checks to be processed at submit time */
+	struct host1x_waitchk *waitchk;
+	int num_waitchk;
+	u32 waitchk_mask;
+
+	/* Array of handles to be pinned & unpinned */
+	struct host1x_reloc *relocarray;
+	int num_relocs;
+	struct host1x_job_unpin_data *unpins;
+	int num_unpins;
+
+	dma_addr_t *addr_phys;
+	dma_addr_t *gather_addr_phys;
+	dma_addr_t *reloc_addr_phys;
+
+	/* Sync point id, number of increments and end related to the submit */
+	u32 syncpt_id;
+	u32 syncpt_incrs;
+	u32 syncpt_end;
+
+	/* Maximum time to wait for this job */
+	int timeout;
+
+	/* Null kickoff prevents submit from being sent to hardware */
+	bool null_kickoff;
+
+	/* Index and number of slots used in the push buffer */
+	int first_get;
+	int num_slots;
+
+	/* Copy of gathers */
+	size_t gather_copy_size;
+	dma_addr_t gather_copy;
+	u8 *gather_copy_mapped;
+
+	/* Temporary space for unpin ids */
+	long unsigned int *pin_ids;
+
+	/* Check if register is marked as an address reg */
+	int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
+
+	/* Request a SETCLASS to this class */
+	u32 class;
+
+	/* Add a channel wait for previous ops to complete */
+	u32 serialize;
+};
+/*
+ * Allocate memory for a job. Just enough memory will be allocated to
+ * accommodate the submit.
+ */
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
+		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks);
+
+/*
+ * Add a gather to a job.
+ */
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset);
+
+/*
+ * Increment the reference count of a host1x_job.
+ */
+void host1x_job_get(struct host1x_job *job);
+
+/*
+ * Decrement the job's reference count; free the job if it goes to zero.
+ */
+void host1x_job_put(struct host1x_job *job);
+
+/*
+ * Pin memory related to job. This handles relocation of addresses to the
+ * host1x address space. Handles both the gather memory and any other memory
+ * referred to from the gather buffers.
+ *
+ * Also patches out host waits that would wait for an expired sync point
+ * value.
+ */
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev);
+
+/*
+ * Unpin memory related to job.
+ */
+void host1x_job_unpin(struct host1x_job *job);
+
+/*
+ * Dump contents of job to debug output.
+ */
+void host1x_job_dump(struct device *dev, struct host1x_job *job);
+
+#endif
diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
new file mode 100644
index 0000000..eceb782
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.c
@@ -0,0 +1,173 @@
+/*
+ * Tegra host1x Memory Management Abstraction
+ *
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+
+#include "memmgr.h"
+#include "cma.h"
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align, int flags)
+{
+	return NULL;
+}
+
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev)
+{
+	struct mem_handle *h = NULL;
+
+	switch (host1x_memmgr_type(id)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		h = (struct mem_handle *) host1x_cma_get(id, dev);
+		break;
+#endif
+	default:
+		break;
+	}
+
+	return h;
+}
+
+void host1x_memmgr_put(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_put(handle);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_pin(handle);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_unpin(handle, sgt);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_mmap(struct mem_handle *handle)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_mmap(handle);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_munmap(handle, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		return host1x_cma_kmap(handle, pagenum);
+		break;
+#endif
+	default:
+		return NULL;
+		break;
+	}
+}
+
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr)
+{
+	switch (host1x_memmgr_type((u32)handle)) {
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	case mem_mgr_type_cma:
+		host1x_cma_kunmap(handle, pagenum, addr);
+		break;
+#endif
+	default:
+		break;
+	}
+}
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data)
+{
+	int pin_count = 0;
+
+#if defined(CONFIG_TEGRA_HOST1X_CMA)
+	{
+		int cma_count = host1x_cma_pin_array_ids(dev,
+			ids, MEMMGR_TYPE_MASK,
+			mem_mgr_type_cma,
+			count, &unpin_data[pin_count],
+			phys_addr);
+
+		if (cma_count < 0) {
+			/* clean up previous handles */
+			while (pin_count) {
+				pin_count--;
+				/* unpin, put */
+				host1x_memmgr_unpin(unpin_data[pin_count].h,
+						unpin_data[pin_count].mem);
+				host1x_memmgr_put(unpin_data[pin_count].h);
+			}
+			return cma_count;
+		}
+		pin_count += cma_count;
+	}
+#endif
+	return pin_count;
+}
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
new file mode 100644
index 0000000..a265fe8
--- /dev/null
+++ b/drivers/gpu/host1x/memmgr.h
@@ -0,0 +1,72 @@
+/*
+ * Tegra host1x Memory Management Abstraction header
+ *
+ * Copyright (c) 2012-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _HOST1X_MEM_MGR_H
+#define _HOST1X_MEM_MGR_H
+
+struct mem_handle;
+struct platform_device;
+
+struct host1x_job_unpin_data {
+	struct mem_handle *h;
+	struct sg_table *mem;
+};
+
+enum mem_mgr_flag {
+	mem_mgr_flag_uncacheable = 0,
+	mem_mgr_flag_write_combine = 1,
+};
+
+/* Buffer encapsulation */
+enum mem_mgr_type {
+	mem_mgr_type_cma = 2,
+};
+
+#define MEMMGR_TYPE_MASK	0x3
+#define MEMMGR_ID_MASK		(~0x3)
+
+static inline int host1x_memmgr_type(u32 id) { return id & MEMMGR_TYPE_MASK; }
+static inline int host1x_memmgr_id(u32 id) { return id & MEMMGR_ID_MASK; }
+static inline unsigned int host1x_memmgr_host1x_id(u32 type, u32 handle)
+{
+	if (host1x_memmgr_type(type) != type ||
+		host1x_memmgr_id(handle) != handle)
+		return 0;
+
+	return handle | type;
+}
+
+struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align,
+		int flags);
+struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev);
+void host1x_memmgr_put(struct mem_handle *handle);
+struct sg_table *host1x_memmgr_pin(struct mem_handle *handle);
+void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt);
+void *host1x_memmgr_mmap(struct mem_handle *handle);
+void host1x_memmgr_munmap(struct mem_handle *handle, void *addr);
+void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum);
+void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum,
+		void *addr);
+
+int host1x_memmgr_pin_array_ids(struct platform_device *dev,
+		long unsigned *ids,
+		dma_addr_t *phys_addr,
+		u32 count,
+		struct host1x_job_unpin_data *unpin_data);
+
+#endif
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 32e2b42..f21c688 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -287,6 +287,12 @@ void host1x_syncpt_debug(struct host1x_syncpt *sp)
 	sp->dev->syncpt_op.debug(sp);
 }
 
+/* remove a wait pointed to by patch_addr */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	return sp->dev->syncpt_op.patch_wait(sp, patch_addr);
+}
+
 int host1x_syncpt_init(struct host1x *host)
 {
 	struct host1x_syncpt *syncpt, *sp;
@@ -305,6 +311,11 @@ int host1x_syncpt_init(struct host1x *host)
 
 	host->syncpt = syncpt;
 
+	/* Allocate sync point to use for clearing waits for expired fences */
+	host->nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
+	if (!host->nop_sp)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index b46d044..255a3a3 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -26,6 +26,7 @@
 struct host1x;
 
 #define NVSYNCPT_INVALID			(-1)
+#define NVSYNCPT_GRAPHICS_HOST			0
 
 struct host1x_syncpt {
 	int id;
@@ -145,6 +146,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 		sp->id < host1x_syncpt_nb_pts(sp->dev);
 }
 
+/* Patch a wait by replacing it with a wait for syncpt 0 value 0 */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr);
+
 /* Return id of the sync point */
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
index 3c14cac..c63d75c 100644
--- a/include/trace/events/host1x.h
+++ b/include/trace/events/host1x.h
@@ -37,6 +37,190 @@ DECLARE_EVENT_CLASS(host1x,
 	TP_printk("name=%s", __entry->name)
 );
 
+DEFINE_EVENT(host1x, host1x_channel_open,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+DEFINE_EVENT(host1x, host1x_channel_release,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+DEFINE_EVENT(host1x, host1x_cdma_begin,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+DEFINE_EVENT(host1x, host1x_cdma_end,
+	TP_PROTO(const char *name),
+	TP_ARGS(name)
+);
+
+TRACE_EVENT(host1x_cdma_flush,
+	TP_PROTO(const char *name, int timeout),
+
+	TP_ARGS(name, timeout),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, timeout)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->timeout = timeout;
+	),
+
+	TP_printk("name=%s, timeout=%d",
+		__entry->name, __entry->timeout)
+);
+
+TRACE_EVENT(host1x_cdma_push,
+	TP_PROTO(const char *name, u32 op1, u32 op2),
+
+	TP_ARGS(name, op1, op2),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, op1)
+		__field(u32, op2)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->op1 = op1;
+		__entry->op2 = op2;
+	),
+
+	TP_printk("name=%s, op1=%08x, op2=%08x",
+		__entry->name, __entry->op1, __entry->op2)
+);
+
+TRACE_EVENT(host1x_cdma_push_gather,
+	TP_PROTO(const char *name, u32 mem_id,
+			u32 words, u32 offset, void *cmdbuf),
+
+	TP_ARGS(name, mem_id, words, offset, cmdbuf),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, mem_id)
+		__field(u32, words)
+		__field(u32, offset)
+		__field(bool, cmdbuf)
+		__dynamic_array(u32, cmdbuf, words)
+	),
+
+	TP_fast_assign(
+		if (cmdbuf) {
+			memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset,
+					words * sizeof(u32));
+		}
+		__entry->cmdbuf = cmdbuf;
+		__entry->name = name;
+		__entry->mem_id = mem_id;
+		__entry->words = words;
+		__entry->offset = offset;
+	),
+
+	TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]",
+	  __entry->name, __entry->mem_id,
+	  __entry->words, __entry->offset,
+	  __print_hex(__get_dynamic_array(cmdbuf),
+		  __entry->cmdbuf ? __entry->words * 4 : 0))
+);
+
+TRACE_EVENT(host1x_channel_submit,
+	TP_PROTO(const char *name, u32 cmdbufs, u32 relocs, u32 waitchks,
+			u32 syncpt_id, u32 syncpt_incrs),
+
+	TP_ARGS(name, cmdbufs, relocs, waitchks, syncpt_id, syncpt_incrs),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, cmdbufs)
+		__field(u32, relocs)
+		__field(u32, waitchks)
+		__field(u32, syncpt_id)
+		__field(u32, syncpt_incrs)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->cmdbufs = cmdbufs;
+		__entry->relocs = relocs;
+		__entry->waitchks = waitchks;
+		__entry->syncpt_id = syncpt_id;
+		__entry->syncpt_incrs = syncpt_incrs;
+	),
+
+	TP_printk("name=%s, cmdbufs=%u, relocs=%u, waitchks=%u, "
+		"syncpt_id=%u, syncpt_incrs=%u",
+	  __entry->name, __entry->cmdbufs, __entry->relocs, __entry->waitchks,
+	  __entry->syncpt_id, __entry->syncpt_incrs)
+);
+
+TRACE_EVENT(host1x_channel_submitted,
+	TP_PROTO(const char *name, u32 syncpt_base, u32 syncpt_max),
+
+	TP_ARGS(name, syncpt_base, syncpt_max),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, syncpt_base)
+		__field(u32, syncpt_max)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->syncpt_base = syncpt_base;
+		__entry->syncpt_max = syncpt_max;
+	),
+
+	TP_printk("name=%s, syncpt_base=%d, syncpt_max=%d",
+		__entry->name, __entry->syncpt_base, __entry->syncpt_max)
+);
+
+TRACE_EVENT(host1x_channel_submit_complete,
+	TP_PROTO(const char *name, int count, u32 thresh),
+
+	TP_ARGS(name, count, thresh),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(int, count)
+		__field(u32, thresh)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->count = count;
+		__entry->thresh = thresh;
+	),
+
+	TP_printk("name=%s, count=%d, thresh=%d",
+		__entry->name, __entry->count, __entry->thresh)
+);
+
+TRACE_EVENT(host1x_wait_cdma,
+	TP_PROTO(const char *name, u32 eventid),
+
+	TP_ARGS(name, eventid),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, eventid)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->eventid = eventid;
+	),
+
+	TP_printk("name=%s, event=%d", __entry->name, __entry->eventid)
+);
+
 TRACE_EVENT(host1x_syncpt_load_min,
 	TP_PROTO(u32 id, u32 val),
 
@@ -55,6 +239,33 @@ TRACE_EVENT(host1x_syncpt_load_min,
 	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
 );
 
+TRACE_EVENT(host1x_syncpt_wait_check,
+	TP_PROTO(u32 mem_id, u32 offset, u32 syncpt_id, u32 thresh, u32 min),
+
+	TP_ARGS(mem_id, offset, syncpt_id, thresh, min),
+
+	TP_STRUCT__entry(
+		__field(u32, mem_id)
+		__field(u32, offset)
+		__field(u32, syncpt_id)
+		__field(u32, thresh)
+		__field(u32, min)
+	),
+
+	TP_fast_assign(
+		__entry->mem_id = mem_id;
+		__entry->offset = offset;
+		__entry->syncpt_id = syncpt_id;
+		__entry->thresh = thresh;
+		__entry->min = min;
+	),
+
+	TP_printk("mem_id=%08x, offset=%05x, id=%d, thresh=%d, current=%d",
+		__entry->mem_id, __entry->offset,
+		__entry->syncpt_id, __entry->thresh,
+		__entry->min)
+);
+
 #endif /*  _TRACE_HOST1X_H */
 
 /* This part must be outside protection */
-- 
1.7.9.5
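
The handle tagging in memmgr.h above keeps the memory manager type in the
low two bits of an id and the handle in the remaining bits. A minimal
userspace sketch of that encoding follows. It assumes 32-bit ids and handles
aligned to 4 bytes; the function names without the host1x_ prefix are
stand-ins for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same split as in memmgr.h: low two bits are the type, the rest the handle. */
#define MEMMGR_TYPE_MASK 0x3u
#define MEMMGR_ID_MASK   (~0x3u)

enum mem_mgr_type { mem_mgr_type_cma = 2 };

/* The type tag must fit in the bits reserved for it. */
static_assert(mem_mgr_type_cma <= MEMMGR_TYPE_MASK, "type must fit in tag bits");

static inline uint32_t memmgr_type(uint32_t id) { return id & MEMMGR_TYPE_MASK; }
static inline uint32_t memmgr_id(uint32_t id)   { return id & MEMMGR_ID_MASK; }

/*
 * Combine a type and a handle into one id. Returns 0 if the type has bits
 * outside the tag field or the handle has bits inside it.
 */
static inline uint32_t memmgr_host1x_id(uint32_t type, uint32_t handle)
{
	if (memmgr_type(type) != type || memmgr_id(handle) != handle)
		return 0;

	return handle | type;
}
```

With a hypothetical handle of 0x1000, tagging it as CMA gives 0x1002, and the
two accessors split it back into type 2 and handle 0x1000. An unaligned handle
such as 0x1001 is rejected because its low bits would collide with the tag.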



* [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (2 preceding siblings ...)
  2013-01-15 11:43 ` [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support Terje Bergstrom
@ 2013-01-15 11:44 ` Terje Bergstrom
  2013-02-04 11:03   ` Thierry Reding
  2013-01-15 11:44 ` [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x Terje Bergstrom
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:44 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add support for host1x debugging. This adds debugfs entries and dumps
the channel state to UART when a job gets stuck.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
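
The debug code routes all text through a small "struct output" sink, so the
same show_* routines can write either to a seq_file (debugfs) or to the
kernel log (stuck job dump). A minimal userspace sketch of that pattern
follows. The two sinks here (write_to_strbuf, write_to_stdout) and struct
strbuf are hypothetical stand-ins for write_to_seqfile and write_to_printk.
The length clamp is an addition in this sketch and is not in the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct output {
	void (*fn)(void *ctx, const char *str, size_t len);
	void *ctx;
	char buf[256];
};

/* Format into the scratch buffer, then hand the text to the sink. */
static void debug_output(struct output *o, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(o && o->fn);

	va_start(args, fmt);
	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
	va_end(args);

	if (len < 0)
		return;
	/* vsnprintf returns the untruncated length; clamp to what was stored. */
	if ((size_t)len >= sizeof(o->buf))
		len = sizeof(o->buf) - 1;

	o->fn(o->ctx, o->buf, (size_t)len);
}

/* Hypothetical sink in place of write_to_seqfile(): appends to a string. */
struct strbuf {
	char data[512];
	size_t used;
};

static void write_to_strbuf(void *ctx, const char *str, size_t len)
{
	struct strbuf *sb = ctx;
	size_t room = sizeof(sb->data) - 1 - sb->used;

	if (len > room)
		len = room;
	memcpy(sb->data + sb->used, str, len);
	sb->used += len;
	sb->data[sb->used] = '\0';
}

/* Hypothetical sink in place of write_to_printk(): writes to stdout. */
static void write_to_stdout(void *ctx, const char *str, size_t len)
{
	(void)ctx;
	fwrite(str, 1, len, stdout);
}
```

A caller picks the sink once, for example
"struct output o = { .fn = write_to_stdout };", and every later
debug_output(&o, ...) call goes to that destination without the formatting
code knowing which one it is.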
 drivers/gpu/host1x/Makefile                 |    1 +
 drivers/gpu/host1x/cdma.c                   |   34 +++
 drivers/gpu/host1x/debug.c                  |  215 ++++++++++++++
 drivers/gpu/host1x/debug.h                  |   50 ++++
 drivers/gpu/host1x/dev.c                    |    3 +
 drivers/gpu/host1x/dev.h                    |   17 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |    3 +
 drivers/gpu/host1x/hw/debug_hw.c            |  400 +++++++++++++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c            |    2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |   18 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |  115 ++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |    1 +
 drivers/gpu/host1x/syncpt.c                 |    3 +
 13 files changed, 862 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index cdd87c8..697d49a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -7,6 +7,7 @@ host1x-y = \
 	cdma.o \
 	channel.o \
 	job.o \
+	debug.o \
 	memmgr.o \
 	hw/host1x01.o
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index d6a38d2..12dd46c 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 #include "job.h"
 #include <asm/cacheflush.h>
@@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 	return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+		struct mem_handle *ref,
+		u32 offset, u32 words)
+{
+	void *mem = NULL;
+
+	if (host1x_debug_trace_cmdbuf)
+		mem = host1x_memmgr_mmap(ref);
+
+	if (mem) {
+		u32 i;
+		/*
+		 * Write in batches of 128 as there seems to be a limit
+		 * of how much you can output to ftrace at once.
+		 */
+		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
+			trace_host1x_cdma_push_gather(
+				cdma_to_channel(cdma)->dev->name,
+				(u32)ref,
+				min(words - i, TRACE_MAX_LENGTH),
+				offset + i * sizeof(u32),
+				mem);
+		}
+		host1x_memmgr_munmap(ref, mem);
+	}
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+	if (host1x_debug_trace_cmdbuf)
+		trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name,
+				op1, op2);
 	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
 	u32 slots_free = cdma->slots_free;
 	struct push_buffer *pb = &cdma->push_buffer;
 
+	if (handle)
+		trace_write_gather(cdma, handle, offset, op1 & 0xffff);
+
 	if (slots_free == 0) {
 		host1x->cdma_op.kick(cdma);
 		slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 0000000..29cbe93
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,215 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "channel.h"
+
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	va_start(args, fmt);
+	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	va_end(args);
+	o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct host1x_channel *ch, void *data)
+{
+	struct host1x *m = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		m->debug_op.show_channel_fifo(m, ch, o, ch->chid);
+		m->debug_op.show_channel_cdma(m, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_syncpts(struct host1x *m, struct output *o)
+{
+	int i;
+	host1x_debug_output(o, "---- syncpts ----\n");
+	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
+		u32 max = host1x_syncpt_read_max(m->syncpt + i);
+		u32 min = host1x_syncpt_load_min(m->syncpt + i);
+		if (!min && !max)
+			continue;
+		host1x_debug_output(o, "id %d (%s) min %d max %d\n",
+			i, m->syncpt[i].name,
+			min, max);
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
+		u32 base_val;
+		base_val = host1x_syncpt_read_wait_base(m->syncpt + i);
+		if (base_val)
+			host1x_debug_output(o, "waitbase id %d val %d\n",
+					i, base_val);
+	}
+
+	host1x_debug_output(o, "\n");
+}
+
+static void show_all(struct host1x *m, struct output *o)
+{
+	m->debug_op.show_mlocks(m, o);
+	show_syncpts(m, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(m, o, show_channels);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static int show_channels_no_fifo(struct host1x_channel *ch, void *data)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		host1x->debug_op.show_channel_cdma(host1x, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_all_no_fifo(struct host1x *host1x, struct output *o)
+{
+	host1x->debug_op.show_mlocks(host1x, o);
+	show_syncpts(host1x, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(host1x, o, show_channels_no_fifo);
+}
+
+static int host1x_debug_show_all(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_show(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all_no_fifo(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_open_all(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show_all, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_all_fops = {
+	.open		= host1x_debug_open_all,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int host1x_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_fops = {
+	.open		= host1x_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+void host1x_debug_init(struct host1x *host1x)
+{
+	struct dentry *de = debugfs_create_dir("tegra-host1x", NULL);
+
+	if (!de)
+		return;
+
+	/* Store the created entry */
+	host1x->debugfs = de;
+
+	debugfs_create_file("status", S_IRUGO, de,
+			host1x, &host1x_debug_fops);
+	debugfs_create_file("status_all", S_IRUGO, de,
+			host1x, &host1x_debug_all_fops);
+
+	debugfs_create_u32("null_kickoff_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_null_kickoff_pid);
+	debugfs_create_u32("trace_cmdbuf", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_trace_cmdbuf);
+
+	if (host1x->debug_op.debug_init)
+		host1x->debug_op.debug_init(de);
+
+	debugfs_create_u32("force_timeout_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_pid);
+	debugfs_create_u32("force_timeout_val", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_val);
+	debugfs_create_u32("force_timeout_channel", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_channel);
+}
+
+void host1x_debug_deinit(struct host1x *host1x)
+{
+	debugfs_remove_recursive(host1x->debugfs);
+}
+#else
+void host1x_debug_init(struct host1x *host1x)
+{
+}
+void host1x_debug_deinit(struct host1x *host1x)
+{
+}
+#endif
+
+void host1x_debug_dump(struct host1x *host1x)
+{
+	struct output o = {
+		.fn = write_to_printk
+	};
+	show_all(host1x, &o);
+}
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
new file mode 100644
index 0000000..fd3560b
--- /dev/null
+++ b/drivers/gpu/host1x/debug.h
@@ -0,0 +1,50 @@
+/*
+ * Tegra host1x Debug
+ *
+ * Copyright (c) 2011-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __NVHOST_DEBUG_H
+#define __NVHOST_DEBUG_H
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+struct host1x;
+
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
+
+static inline void write_to_seqfile(void *ctx, const char *str, size_t len)
+{
+	seq_write((struct seq_file *)ctx, str, len);
+}
+
+static inline void write_to_printk(void *ctx, const char *str, size_t len)
+{
+	pr_info("%s", str);
+}
+
+void __printf(2, 3) host1x_debug_output(struct output *o, const char *fmt, ...);
+
+extern unsigned int host1x_debug_trace_cmdbuf;
+
+void host1x_debug_init(struct host1x *master);
+void host1x_debug_deinit(struct host1x *master);
+void host1x_debug_dump(struct host1x *master);
+
+#endif /* __NVHOST_DEBUG_H */
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 80311ca..5aa7d28 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -26,6 +26,7 @@
 #include "dev.h"
 #include "intr.h"
 #include "channel.h"
+#include "debug.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -150,6 +151,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
+	host1x_debug_init(host);
+
 	dev_info(&dev->dev, "initialized\n");
 
 	return 0;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 2fefa78..467a92e 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -33,6 +33,7 @@ struct push_buffer;
 struct dentry;
 struct mem_handle;
 struct platform_device;
+struct output;
 
 struct host1x_channel_ops {
 	int (*init)(struct host1x_channel *,
@@ -71,6 +72,21 @@ struct host1x_pushbuffer_ops {
 	u32 (*putptr)(struct push_buffer *);
 };
 
+struct host1x_debug_ops {
+	void (*debug_init)(struct dentry *de);
+	void (*show_channel_cdma)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_channel_fifo)(struct host1x *,
+				  struct host1x_channel *,
+				  struct output *,
+				  int chid);
+	void (*show_mlocks)(struct host1x *m,
+			    struct output *o);
+
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -117,6 +133,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 7a44418..2228246 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -22,6 +22,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 
 #include "cdma_hw.h"
@@ -407,6 +408,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	host1x = cdma_to_host1x(cdma);
 	ch = cdma_to_channel(cdma);
 
+	host1x_debug_dump(cdma_to_host1x(cdma));
+
 	mutex_lock(&cdma->lock);
 
 	if (!cdma->timeout.clientid) {
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
new file mode 100644
index 0000000..0b8d466
--- /dev/null
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -0,0 +1,400 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling <konkers@android.com>
+ *
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "cdma.h"
+#include "channel.h"
+#include "memmgr.h"
+
+#define NVHOST_DEBUG_MAX_PAGE_OFFSET 102400
+
+enum {
+	NVHOST_DBG_STATE_CMD = 0,
+	NVHOST_DBG_STATE_DATA = 1,
+	NVHOST_DBG_STATE_GATHER = 2
+};
+
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
+		mask = val & 0x3f;
+		if (mask) {
+			host1x_debug_output(o,
+				"SETCL(class=%03x, offset=%03x, mask=%02x, [",
+				val >> 6 & 0x3ff, val >> 16 & 0xfff, mask);
+			*count = hweight8(mask);
+			return NVHOST_DBG_STATE_DATA;
+		} else {
+			host1x_debug_output(o, "SETCL(class=%03x)\n",
+				val >> 6 & 0x3ff);
+			return NVHOST_DBG_STATE_CMD;
+		}
+
+	case 0x1:
+		host1x_debug_output(o, "INCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x2:
+		host1x_debug_output(o, "NONINCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x3:
+		mask = val & 0xffff;
+		host1x_debug_output(o, "MASK(offset=%03x, mask=%04x, [",
+			   val >> 16 & 0xfff, mask);
+		*count = hweight16(mask);
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x4:
+		host1x_debug_output(o, "IMM(offset=%03x, data=%04x)\n",
+			   val >> 16 & 0xfff, val & 0xffff);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x5:
+		host1x_debug_output(o, "RESTART(offset=%08x)\n", val << 4);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x6:
+		host1x_debug_output(o,
+			"GATHER(offset=%03x, insert=%d, type=%d, count=%04x, addr=[",
+			val >> 16 & 0xfff, val >> 15 & 0x1, val >> 14 & 0x1,
+			val & 0x3fff);
+		*count = val & 0x3fff; /* TODO: insert */
+		return NVHOST_DBG_STATE_GATHER;
+
+	case 0xe:
+		subop = val >> 24 & 0xf;
+		if (subop == 0)
+			host1x_debug_output(o, "ACQUIRE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else if (subop == 1)
+			host1x_debug_output(o, "RELEASE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else
+			host1x_debug_output(o, "EXTEND_UNKNOWN(%08x)\n", val);
+		return NVHOST_DBG_STATE_CMD;
+
+	default:
+		return NVHOST_DBG_STATE_CMD;
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr, u32 words, struct host1x_cdma *cdma);
+
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
+
+	switch (*state) {
+	case NVHOST_DBG_STATE_CMD:
+		if (addr)
+			host1x_debug_output(o, "%08x: %08x:", addr, val);
+		else
+			host1x_debug_output(o, "%08x:", val);
+
+		*state = show_channel_command(o, addr, val, count);
+		dont_print = 0;
+		start_count = *count;
+		if (*state == NVHOST_DBG_STATE_DATA && *count == 0) {
+			*state = NVHOST_DBG_STATE_CMD;
+			host1x_debug_output(o, "])\n");
+		}
+		break;
+
+	case NVHOST_DBG_STATE_DATA:
+		(*count)--;
+		if (start_count - *count < 64)
+			host1x_debug_output(o, "%08x%s",
+				val, *count > 0 ? ", " : "])\n");
+		else if (!dont_print && (*count > 0)) {
+			host1x_debug_output(o, "[truncated; %d more words]\n",
+				*count);
+			dont_print = 1;
+		}
+		if (*count == 0)
+			*state = NVHOST_DBG_STATE_CMD;
+		break;
+
+	case NVHOST_DBG_STATE_GATHER:
+		*state = NVHOST_DBG_STATE_CMD;
+		host1x_debug_output(o, "%08x]):\n", val);
+		if (cdma) {
+			show_channel_gather(o, addr, val,
+					*count, cdma);
+		}
+		break;
+	}
+}
+
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Offset of the gather within the pinned buffer */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we are given a different hardware address for the same
+	 * page. In that case the offset is invalid and all we can do is
+	 * bail out.
+	 */
+	if (offset > NVHOST_DEBUG_MAX_PAGE_OFFSET) {
+		host1x_debug_output(o, "[address mismatch]\n");
+	} else {
+		/* A GATHER buffer always starts with commands */
+		state = NVHOST_DBG_STATE_CMD;
+		for (i = 0; i < words; i++)
+			show_channel_word(o, &state, &count,
+					phys_addr + i * 4,
+					*(map_addr + offset/4 + i),
+					cdma);
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 cur = addr - pb->phys;
+	struct mem_handle *mem = pb->handle[cur/8];
+	u32 *map_addr, offset;
+	struct sg_table *sgt;
+
+	if (!mem) {
+		host1x_debug_output(o, "[already deallocated]\n");
+		return;
+	}
+
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
+
+	offset = phys_addr - sg_dma_address(sgt->sgl);
+	do_show_channel_gather(o, phys_addr, words, cdma,
+			sg_dma_address(sgt->sgl), map_addr);
+	host1x_memmgr_unpin(mem, sgt);
+	host1x_memmgr_munmap(mem, map_addr);
+}
+
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o,
+				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d,"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
+
+		for (i = 0; i < job->num_gathers; i++) {
+			struct host1x_job_gather *g = &job->gathers[i];
+			u32 *mapped = host1x_memmgr_mmap(g->ref);
+			if (!mapped) {
+				host1x_debug_output(o, "[could not mmap]\n");
+				continue;
+			}
+
+			host1x_debug_output(o,
+				"    GATHER at %08x+%04x, %d words\n",
+				g->mem_base, g->offset, g->words);
+
+			do_show_channel_gather(o, g->mem_base + g->offset,
+					g->words, cdma, g->mem_base, mapped);
+			host1x_memmgr_munmap(g->ref, mapped);
+		}
+	}
+}
+
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	struct host1x_channel *channel = ch;
+	struct host1x_cdma *cdma = &channel->cdma;
+	u32 dmaput, dmaget, dmactrl;
+	u32 cbstat, cbread;
+	u32 val, base, baseval;
+
+	dmaput = host1x_ch_readl(channel, HOST1X_CHANNEL_DMAPUT);
+	dmaget = host1x_ch_readl(channel, HOST1X_CHANNEL_DMAGET);
+	dmactrl = host1x_ch_readl(channel, HOST1X_CHANNEL_DMACTRL);
+	cbread = host1x_sync_readl(m, HOST1X_SYNC_CBREAD0 + 4 * chid);
+	cbstat = host1x_sync_readl(m, HOST1X_SYNC_CBSTAT_0 + 4 * chid);
+
+	host1x_debug_output(o, "%d-%s: ", chid,
+			    channel->dev->name);
+
+	if (HOST1X_CHANNEL_DMACTRL_DMASTOP_V(dmactrl)
+		|| !channel->cdma.push_buffer.mapped) {
+		host1x_debug_output(o, "inactive\n\n");
+		return;
+	}
+
+	switch (cbstat) {
+	case 0x00010008:
+		host1x_debug_output(o, "waiting on syncpt %d val %d\n",
+			cbread >> 24, cbread & 0xffffff);
+		break;
+
+	case 0x00010009:
+		base = (cbread >> 16) & 0xff;
+		baseval = host1x_sync_readl(m,
+				HOST1X_SYNC_SYNCPT_BASE_0 + 4 * base);
+		val = cbread & 0xffff;
+		host1x_debug_output(o, "waiting on syncpt %d val %d "
+			  "(base %d = %d; offset = %d)\n",
+			cbread >> 24, baseval + val,
+			base, baseval, val);
+		break;
+
+	default:
+		host1x_debug_output(o,
+				"active class %02x, offset %04x, val %08x\n",
+				HOST1X_SYNC_CBSTAT_0_CBCLASS0_V(cbstat),
+				HOST1X_SYNC_CBSTAT_0_CBOFFSET0_V(cbstat),
+				cbread);
+		break;
+	}
+
+	host1x_debug_output(o, "DMAPUT %08x, DMAGET %08x, DMACTL %08x\n",
+		dmaput, dmaget, dmactrl);
+	host1x_debug_output(o, "CBREAD %08x, CBSTAT %08x\n", cbread, cbstat);
+
+	show_channel_gathers(o, cdma);
+	host1x_debug_output(o, "\n");
+}
+
+static void host1x_debug_show_channel_fifo(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
+	u32 val, rd_ptr, wr_ptr, start, end;
+	struct host1x_channel *channel = ch;
+	int state, count;
+
+	host1x_debug_output(o, "%d: fifo:\n", chid);
+
+	val = host1x_ch_readl(channel, HOST1X_CHANNEL_FIFOSTAT);
+	host1x_debug_output(o, "FIFOSTAT %08x\n", val);
+	if (HOST1X_CHANNEL_FIFOSTAT_CFEMPTY_V(val)) {
+		host1x_debug_output(o, "[empty]\n");
+		return;
+	}
+
+	host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL);
+	host1x_sync_writel(m, HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(1)
+			| HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(chid),
+		HOST1X_SYNC_CFPEEK_CTRL);
+
+	val = host1x_sync_readl(m, HOST1X_SYNC_CFPEEK_PTRS);
+	rd_ptr = HOST1X_SYNC_CFPEEK_PTRS_CF_RD_PTR_V(val);
+	wr_ptr = HOST1X_SYNC_CFPEEK_PTRS_CF_WR_PTR_V(val);
+
+	val = host1x_sync_readl(m, HOST1X_SYNC_CF0_SETUP + 4 * chid);
+	start = HOST1X_SYNC_CF0_SETUP_CF0_BASE_V(val);
+	end = HOST1X_SYNC_CF0_SETUP_CF0_LIMIT_V(val);
+
+	state = NVHOST_DBG_STATE_CMD;
+
+	do {
+		host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL);
+		host1x_sync_writel(m, HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(1)
+				| HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(chid)
+				| HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ADDR_F(rd_ptr),
+			HOST1X_SYNC_CFPEEK_CTRL);
+		val = host1x_sync_readl(m, HOST1X_SYNC_CFPEEK_READ);
+
+		show_channel_word(o, &state, &count, 0, val, NULL);
+
+		if (rd_ptr == end)
+			rd_ptr = start;
+		else
+			rd_ptr++;
+	} while (rd_ptr != wr_ptr);
+
+	if (state == NVHOST_DBG_STATE_DATA)
+		host1x_debug_output(o, ", ...])\n");
+	host1x_debug_output(o, "\n");
+
+	host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL);
+}
+
+static void host1x_debug_show_mlocks(struct host1x *m, struct output *o)
+{
+	int i;
+
+	host1x_debug_output(o, "---- mlocks ----\n");
+	for (i = 0; i < host1x_syncpt_nb_mlocks(m); i++) {
+		u32 owner = host1x_sync_readl(m,
+				HOST1X_SYNC_MLOCK_OWNER_0 + 4 * i);
+		if (HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CH_OWNS_0_V(owner))
+			host1x_debug_output(o, "%d: locked by channel %d\n",
+				i,
+				/* CHID field: bits 11:8 of the owner word */
+				(owner >> 8) & 0xf);
+		else if (HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CPU_OWNS_0_V(owner))
+			host1x_debug_output(o, "%d: locked by cpu\n", i);
+		else
+			host1x_debug_output(o, "%d: unlocked\n", i);
+	}
+	host1x_debug_output(o, "\n");
+}
+
+static const struct host1x_debug_ops host1x_debug_ops = {
+	.show_channel_cdma = host1x_debug_show_channel_cdma,
+	.show_channel_fifo = host1x_debug_show_channel_fifo,
+	.show_mlocks = host1x_debug_show_mlocks,
+};
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
index 7569a1e..1bc1552 100644
--- a/drivers/gpu/host1x/hw/host1x01.c
+++ b/drivers/gpu/host1x/hw/host1x01.c
@@ -28,6 +28,7 @@
 
 #include "hw/channel_hw.c"
 #include "hw/cdma_hw.c"
+#include "hw/debug_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"
 
@@ -36,6 +37,7 @@ int host1x01_init(struct host1x *host)
 	host->channel_op = host1x_channel_ops;
 	host->cdma_op = host1x_cdma_ops;
 	host->cdma_pb_op = host1x_pushbuffer_ops;
+	host->debug_op = host1x_debug_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
 
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
index dad4fee..79bcd5a 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -51,6 +51,18 @@
 #ifndef __hw_host1x_channel_host1x_h__
 #define __hw_host1x_channel_host1x_h__
 
+static inline u32 host1x_channel_fifostat_r(void)
+{
+	return 0x0;
+}
+#define HOST1X_CHANNEL_FIFOSTAT \
+	host1x_channel_fifostat_r()
+static inline u32 host1x_channel_fifostat_cfempty_v(u32 r)
+{
+	return (r >> 10) & 0x1;
+}
+#define HOST1X_CHANNEL_FIFOSTAT_CFEMPTY_V(r) \
+	host1x_channel_fifostat_cfempty_v(r)
 static inline u32 host1x_channel_dmastart_r(void)
 {
 	return 0x14;
@@ -87,6 +99,12 @@ static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v)
 }
 #define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
 	host1x_channel_dmactrl_dmastop_f(v)
+static inline u32 host1x_channel_dmactrl_dmastop_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
+#define HOST1X_CHANNEL_DMACTRL_DMASTOP_V(r) \
+	host1x_channel_dmactrl_dmastop_v(r)
 static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v)
 {
 	return (v & 0x1) << 1;
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 3073d37..22daa3f 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -69,6 +69,24 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void)
 }
 #define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \
 	host1x_sync_syncpt_thresh_int_enable_cpu0_r()
+static inline u32 host1x_sync_cf0_setup_r(void)
+{
+	return 0x80;
+}
+#define HOST1X_SYNC_CF0_SETUP \
+	host1x_sync_cf0_setup_r()
+static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+#define HOST1X_SYNC_CF0_SETUP_CF0_BASE_V(r) \
+	host1x_sync_cf0_setup_cf0_base_v(r)
+static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
+#define HOST1X_SYNC_CF0_SETUP_CF0_LIMIT_V(r) \
+	host1x_sync_cf0_setup_cf0_limit_v(r)
 static inline u32 host1x_sync_cmdproc_stop_r(void)
 {
 	return 0xac;
@@ -99,6 +117,30 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_owner_0_r(void)
+{
+	return 0x340;
+}
+#define HOST1X_SYNC_MLOCK_OWNER_0 \
+	host1x_sync_mlock_owner_0_r()
+static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v)
+{
+	return (v & 0xf) << 8;
+}
+#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_OWNER_CHID_0_F(v) \
+	host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(v)
+static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r)
+{
+	return (r >> 1) & 0x1;
+}
+#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CPU_OWNS_0_V(r) \
+	host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(r)
+static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r)
+{
+	return (r >> 0) & 0x1;
+}
+#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CH_OWNS_0_V(r) \
+	host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(r)
 static inline u32 host1x_sync_syncpt_0_r(void)
 {
 	return 0x400;
@@ -123,4 +165,77 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
 }
 #define HOST1X_SYNC_SYNCPT_CPU_INCR \
 	host1x_sync_syncpt_cpu_incr_r()
+static inline u32 host1x_sync_cbread0_r(void)
+{
+	return 0x720;
+}
+#define HOST1X_SYNC_CBREAD0 \
+	host1x_sync_cbread0_r()
+static inline u32 host1x_sync_cfpeek_ctrl_r(void)
+{
+	return 0x74c;
+}
+#define HOST1X_SYNC_CFPEEK_CTRL \
+	host1x_sync_cfpeek_ctrl_r()
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
+{
+	return (v & 0x1ff) << 0;
+}
+#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ADDR_F(v) \
+	host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v)
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v)
+{
+	return (v & 0x7) << 16;
+}
+#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(v) \
+	host1x_sync_cfpeek_ctrl_cfpeek_channr_f(v)
+static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
+{
+	return (v & 0x1) << 31;
+}
+#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
+	host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)
+static inline u32 host1x_sync_cfpeek_read_r(void)
+{
+	return 0x750;
+}
+#define HOST1X_SYNC_CFPEEK_READ \
+	host1x_sync_cfpeek_read_r()
+static inline u32 host1x_sync_cfpeek_ptrs_r(void)
+{
+	return 0x754;
+}
+#define HOST1X_SYNC_CFPEEK_PTRS \
+	host1x_sync_cfpeek_ptrs_r()
+static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r)
+{
+	return (r >> 0) & 0x1ff;
+}
+#define HOST1X_SYNC_CFPEEK_PTRS_CF_RD_PTR_V(r) \
+	host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(r)
+static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r)
+{
+	return (r >> 16) & 0x1ff;
+}
+#define HOST1X_SYNC_CFPEEK_PTRS_CF_WR_PTR_V(r) \
+	host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(r)
+static inline u32 host1x_sync_cbstat_0_r(void)
+{
+	return 0x758;
+}
+#define HOST1X_SYNC_CBSTAT_0 \
+	host1x_sync_cbstat_0_r()
+static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r)
+{
+	return (r >> 0) & 0xffff;
+}
+#define HOST1X_SYNC_CBSTAT_0_CBOFFSET0_V(r) \
+	host1x_sync_cbstat_0_cboffset0_v(r)
+static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r)
+{
+	return (r >> 16) & 0x3ff;
+}
+#define HOST1X_SYNC_CBSTAT_0_CBCLASS0_V(r) \
+	host1x_sync_cbstat_0_cbclass0_v(r)
+
 #endif /* __hw_host1x01_sync_h__ */
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index ba48cee..c64c3b0 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -90,6 +90,7 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 		dev_err(&dev->dev->dev,
 			"Trying to increment syncpoint id %d beyond max\n",
 			sp->id);
+		host1x_debug_dump(sp->dev);
 		return;
 	}
 	host1x_sync_writel(dev, BIT_MASK(sp->id),
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index f21c688..191f65f 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -23,6 +23,7 @@
 #include "syncpt.h"
 #include "dev.h"
 #include "intr.h"
+#include "debug.h"
 #include <trace/events/host1x.h>
 
 #define MAX_SYNCPT_LENGTH	5
@@ -211,6 +212,8 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp,
 				 current->comm, sp->id, sp->name,
 				 thresh, timeout);
 			sp->dev->syncpt_op.debug(sp);
+			if (check_count == MAX_STUCK_CHECK_COUNT)
+				host1x_debug_dump(sp->dev);
 			check_count++;
 		}
 	}
-- 
1.7.9.5
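
For readers following the decoder in debug_hw.c, here is a minimal
user-space sketch of how show_channel_command() splits a command word
into an opcode (top four bits), an offset, and a count of data words
that follow. The names decode_opcode(), popcount32() and enum dbg_state
are illustrative and not part of the patch. Only a subset of the opcodes
is covered, and the field layout is taken from the code above rather
than from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parser states, mirroring the NVHOST_DBG_STATE_* values in the patch. */
enum dbg_state { DBG_STATE_CMD, DBG_STATE_DATA, DBG_STATE_GATHER };

/* Portable population count; the kernel code uses hweight16() instead. */
static int popcount32(uint32_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Decode one command word. A short description is written to buf, and
 * *count receives the number of data words that follow the opcode.
 */
static enum dbg_state decode_opcode(uint32_t val, int *count,
				    char *buf, size_t len)
{
	unsigned int offset = (val >> 16) & 0xfff;

	assert(count != NULL && buf != NULL);
	*count = 0;

	switch (val >> 28) {
	case 0x1: /* INCR: the low 16 bits hold the word count */
		snprintf(buf, len, "INCR(offset=%03x)", offset);
		*count = val & 0xffff;
		return DBG_STATE_DATA;
	case 0x2: /* NONINCR: same layout, register offset stays fixed */
		snprintf(buf, len, "NONINCR(offset=%03x)", offset);
		*count = val & 0xffff;
		return DBG_STATE_DATA;
	case 0x3: /* MASK: one data word per set bit in the mask */
		snprintf(buf, len, "MASK(offset=%03x, mask=%04x)",
			 offset, (unsigned int)(val & 0xffff));
		*count = popcount32(val & 0xffff);
		return DBG_STATE_DATA;
	case 0x4: /* IMM: data is carried in the opcode word itself */
		snprintf(buf, len, "IMM(offset=%03x, data=%04x)",
			 offset, (unsigned int)(val & 0xffff));
		return DBG_STATE_CMD;
	case 0x6: /* GATHER: the next word is the buffer address */
		snprintf(buf, len, "GATHER(offset=%03x, count=%04x)",
			 offset, (unsigned int)(val & 0x3fff));
		*count = val & 0x3fff;
		return DBG_STATE_GATHER;
	default: /* opcodes not covered by this sketch */
		snprintf(buf, len, "UNKNOWN(%08x)", (unsigned int)val);
		return DBG_STATE_CMD;
	}
}
```

A caller would feed each word of a push buffer through decode_opcode()
while in the command state, then consume *count data words before
decoding the next opcode, which is what show_channel_word() does.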



* [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (3 preceding siblings ...)
  2013-01-15 11:44 ` [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support Terje Bergstrom
@ 2013-01-15 11:44 ` Terje Bergstrom
  2013-02-04 11:08   ` Thierry Reding
  2013-01-15 11:44 ` [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver Terje Bergstrom
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:44 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Make the DRM driver part of the host1x driver by moving it from
drivers/gpu/drm/tegra to drivers/gpu/host1x/drm.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/drm/Kconfig                        |    2 --
 drivers/gpu/drm/Makefile                       |    1 -
 drivers/gpu/drm/tegra/Makefile                 |    7 -------
 drivers/gpu/host1x/Kconfig                     |    3 +++
 drivers/gpu/host1x/Makefile                    |    6 ++++++
 drivers/gpu/{drm/tegra => host1x/drm}/Kconfig  |    0
 drivers/gpu/{drm/tegra => host1x/drm}/dc.c     |    0
 drivers/gpu/{drm/tegra => host1x/drm}/dc.h     |    0
 drivers/gpu/{drm/tegra => host1x/drm}/drm.c    |    0
 drivers/gpu/{drm/tegra => host1x/drm}/drm.h    |    6 +++---
 drivers/gpu/{drm/tegra => host1x/drm}/fb.c     |    0
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c   |    0
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h   |    0
 drivers/gpu/{drm/tegra => host1x/drm}/host1x.c |    0
 drivers/gpu/{drm/tegra => host1x/drm}/output.c |    0
 drivers/gpu/{drm/tegra => host1x/drm}/rgb.c    |    0
 drivers/gpu/host1x/host1x_client.h             |   25 ++++++++++++++++++++++++
 17 files changed, 37 insertions(+), 13 deletions(-)
 delete mode 100644 drivers/gpu/drm/tegra/Makefile
 rename drivers/gpu/{drm/tegra => host1x/drm}/Kconfig (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.h (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/drm.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/drm.h (98%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/fb.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/host1x.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/output.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/rgb.c (100%)
 create mode 100644 drivers/gpu/host1x/host1x_client.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 983201b..18321b68b 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -210,5 +210,3 @@ source "drivers/gpu/drm/mgag200/Kconfig"
 source "drivers/gpu/drm/cirrus/Kconfig"
 
 source "drivers/gpu/drm/shmobile/Kconfig"
-
-source "drivers/gpu/drm/tegra/Kconfig"
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 6f58c81..f54c72a 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -49,5 +49,4 @@ obj-$(CONFIG_DRM_GMA500) += gma500/
 obj-$(CONFIG_DRM_UDL) += udl/
 obj-$(CONFIG_DRM_AST) += ast/
 obj-$(CONFIG_DRM_SHMOBILE) +=shmobile/
-obj-$(CONFIG_DRM_TEGRA) += tegra/
 obj-y			+= i2c/
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
deleted file mode 100644
index 80f73d1..0000000
--- a/drivers/gpu/drm/tegra/Makefile
+++ /dev/null
@@ -1,7 +0,0 @@
-ccflags-y := -Iinclude/drm
-ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
-
-tegra-drm-y := drm.o fb.o dc.o host1x.o
-tegra-drm-y += output.o rgb.o hdmi.o
-
-obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index 57680a6..558b660 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -1,4 +1,5 @@
 config TEGRA_HOST1X
+	depends on DRM
 	tristate "Tegra host1x driver"
 	help
 	  Driver for the Tegra host1x hardware.
@@ -26,4 +27,6 @@ config TEGRA_HOST1X_FIREWALL
 
 	  If unsure, choose Y.
 
+source "drivers/gpu/host1x/drm/Kconfig"
+
 endif
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 697d49a..ffc8bf1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -12,4 +12,10 @@ host1x-y = \
 	hw/host1x01.o
 
 host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
+
+ccflags-y += -Iinclude/drm
+ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
+
+host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
+host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/host1x/drm/Kconfig
similarity index 100%
rename from drivers/gpu/drm/tegra/Kconfig
rename to drivers/gpu/host1x/drm/Kconfig
diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/host1x/drm/dc.c
similarity index 100%
rename from drivers/gpu/drm/tegra/dc.c
rename to drivers/gpu/host1x/drm/dc.c
diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/host1x/drm/dc.h
similarity index 100%
rename from drivers/gpu/drm/tegra/dc.h
rename to drivers/gpu/host1x/drm/dc.h
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/host1x/drm/drm.c
similarity index 100%
rename from drivers/gpu/drm/tegra/drm.c
rename to drivers/gpu/host1x/drm/drm.c
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/host1x/drm/drm.h
similarity index 98%
rename from drivers/gpu/drm/tegra/drm.h
rename to drivers/gpu/host1x/drm/drm.h
index 741b5dc..e68b4ac 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/host1x/drm/drm.h
@@ -7,8 +7,8 @@
  * published by the Free Software Foundation.
  */
 
-#ifndef TEGRA_DRM_H
-#define TEGRA_DRM_H 1
+#ifndef HOST1X_DRM_H
+#define HOST1X_DRM_H 1
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
@@ -213,4 +213,4 @@ extern struct platform_driver tegra_hdmi_driver;
 extern struct platform_driver tegra_dc_driver;
 extern struct drm_driver tegra_drm_driver;
 
-#endif /* TEGRA_DRM_H */
+#endif /* HOST1X_DRM_H */
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/host1x/drm/fb.c
similarity index 100%
rename from drivers/gpu/drm/tegra/fb.c
rename to drivers/gpu/host1x/drm/fb.c
diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c
similarity index 100%
rename from drivers/gpu/drm/tegra/hdmi.c
rename to drivers/gpu/host1x/drm/hdmi.c
diff --git a/drivers/gpu/drm/tegra/hdmi.h b/drivers/gpu/host1x/drm/hdmi.h
similarity index 100%
rename from drivers/gpu/drm/tegra/hdmi.h
rename to drivers/gpu/host1x/drm/hdmi.h
diff --git a/drivers/gpu/drm/tegra/host1x.c b/drivers/gpu/host1x/drm/host1x.c
similarity index 100%
rename from drivers/gpu/drm/tegra/host1x.c
rename to drivers/gpu/host1x/drm/host1x.c
diff --git a/drivers/gpu/drm/tegra/output.c b/drivers/gpu/host1x/drm/output.c
similarity index 100%
rename from drivers/gpu/drm/tegra/output.c
rename to drivers/gpu/host1x/drm/output.c
diff --git a/drivers/gpu/drm/tegra/rgb.c b/drivers/gpu/host1x/drm/rgb.c
similarity index 100%
rename from drivers/gpu/drm/tegra/rgb.c
rename to drivers/gpu/host1x/drm/rgb.c
diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
new file mode 100644
index 0000000..fdd2920
--- /dev/null
+++ b/drivers/gpu/host1x/host1x_client.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HOST1X_CLIENT_H
+#define HOST1X_CLIENT_H
+
+struct platform_device;
+
+void host1x_set_drm_data(struct platform_device *pdev, void *data);
+void *host1x_get_drm_data(struct platform_device *pdev);
+
+#endif
-- 
1.7.9.5
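
The two accessors declared in host1x_client.h let the DRM code attach
an opaque pointer to the host1x device without seeing struct host1x.
The sketch below shows the idea in user space. struct host_state and
struct fake_pdev are hypothetical stand-ins for struct host1x and
struct platform_device, and the NULL check in the getter is an addition
for the sketch, not something the series does.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct host1x; only the field used here. */
struct host_state {
	void *drm_data;
};

/* Hypothetical stand-in for struct platform_device and its drvdata. */
struct fake_pdev {
	struct host_state *drvdata;
};

/* Store an opaque pointer owned by the DRM side. */
static void set_drm_data(struct fake_pdev *pdev, void *data)
{
	pdev->drvdata->drm_data = data;
}

/* Fetch the pointer back; NULL if nothing is attached (sketch only). */
static void *get_drm_data(struct fake_pdev *pdev)
{
	if (pdev == NULL || pdev->drvdata == NULL)
		return NULL;
	return pdev->drvdata->drm_data;
}
```

Keeping the payload as void * means the header needs only a forward
declaration of struct platform_device, at the cost of type checking on
the caller's side.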



* [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (4 preceding siblings ...)
  2013-01-15 11:44 ` [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x Terje Bergstrom
@ 2013-01-15 11:44 ` Terje Bergstrom
  2013-02-04 11:23   ` Thierry Reding
  2013-01-15 11:44 ` [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks Terje Bergstrom
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:44 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Remove the second host1x driver and bind tegra-drm to the new host1x driver.
The logic that parses the device tree and tracks clients moves to drm.c.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
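
Note on the init path: with a single module, tegra_host1x_init() has to
register several platform drivers and undo the earlier registrations if
a later one fails. A minimal user-space sketch of that goto-based unwind
follows. struct fake_driver, fake_register(), fake_unregister() and
register_all() are hypothetical stand-ins for the platform driver API
and are not part of this series.

```c
/* Hypothetical stand-in for struct platform_driver. */
struct fake_driver {
	const char *name;
	int fail;	/* negative errno to simulate a failed registration */
	int registered;
};

static int fake_register(struct fake_driver *drv)
{
	if (drv->fail)
		return drv->fail;
	drv->registered = 1;
	return 0;
}

static void fake_unregister(struct fake_driver *drv)
{
	drv->registered = 0;
}

/*
 * Register three drivers in order. On failure, unregister the ones that
 * already succeeded, in reverse order, and return the error.
 */
static int register_all(struct fake_driver *host, struct fake_driver *dc,
			struct fake_driver *hdmi)
{
	int err;

	err = fake_register(host);
	if (err < 0)
		return err;

	err = fake_register(dc);
	if (err < 0)
		goto unregister_host;

	err = fake_register(hdmi);
	if (err < 0)
		goto unregister_dc;

	return 0;

unregister_dc:
	fake_unregister(dc);
unregister_host:
	fake_unregister(host);
	return err;
}
```

The labels are named after the action they perform, so each failure
site jumps to the first cleanup step it needs and falls through the
rest.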
 drivers/gpu/host1x/Makefile        |    2 +-
 drivers/gpu/host1x/dev.c           |   58 +++++++++-
 drivers/gpu/host1x/dev.h           |    6 +
 drivers/gpu/host1x/drm/Kconfig     |    2 +-
 drivers/gpu/host1x/drm/dc.c        |    7 +-
 drivers/gpu/host1x/drm/drm.c       |  213 +++++++++++++++++++++++++++++++++++-
 drivers/gpu/host1x/drm/drm.h       |    3 -
 drivers/gpu/host1x/drm/hdmi.c      |    7 +-
 drivers/gpu/host1x/host1x_client.h |    9 ++
 9 files changed, 294 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index ffc8bf1..c35ee19 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -16,6 +16,6 @@ host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
 ccflags-y += -Iinclude/drm
 ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
-host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
+host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o
 host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 5aa7d28..17ee01c 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -28,12 +28,25 @@
 #include "channel.h"
 #include "debug.h"
 #include "hw/host1x01.h"
+#include "host1x_client.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/host1x.h>
 
 #define DRIVER_NAME		"tegra-host1x"
 
+void host1x_set_drm_data(struct platform_device *pdev, void *data)
+{
+	struct host1x *host1x = platform_get_drvdata(pdev);
+	host1x->drm_data = data;
+}
+
+void *host1x_get_drm_data(struct platform_device *pdev)
+{
+	struct host1x *host1x = platform_get_drvdata(pdev);
+	return host1x->drm_data;
+}
+
 void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r)
 {
 	void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset;
@@ -153,6 +166,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_debug_init(host);
 
+	host1x_drm_alloc(dev);
+
 	dev_info(&dev->dev, "initialized\n");
 
 	return 0;
@@ -173,7 +188,7 @@ static int __exit host1x_remove(struct platform_device *dev)
 	return 0;
 }
 
-static struct platform_driver platform_driver = {
+static struct platform_driver tegra_host1x_driver = {
 	.probe = host1x_probe,
 	.remove = __exit_p(host1x_remove),
 	.driver = {
@@ -183,8 +198,47 @@ static struct platform_driver platform_driver = {
 	},
 };
 
-module_platform_driver(platform_driver);
+static int __init tegra_host1x_init(void)
+{
+	int err;
+
+	err = platform_driver_register(&tegra_host1x_driver);
+	if (err < 0)
+		return err;
+
+#ifdef CONFIG_DRM_TEGRA
+	err = platform_driver_register(&tegra_dc_driver);
+	if (err < 0)
+		goto unregister_host1x;
+
+	err = platform_driver_register(&tegra_hdmi_driver);
+	if (err < 0)
+		goto unregister_dc;
+#endif
+
+	return 0;
+
+#ifdef CONFIG_DRM_TEGRA
+unregister_dc:
+	platform_driver_unregister(&tegra_dc_driver);
+unregister_host1x:
+	platform_driver_unregister(&tegra_host1x_driver);
+	return err;
+#endif
+}
+module_init(tegra_host1x_init);
+
+static void __exit tegra_host1x_exit(void)
+{
+#ifdef CONFIG_DRM_TEGRA
+	platform_driver_unregister(&tegra_hdmi_driver);
+	platform_driver_unregister(&tegra_dc_driver);
+#endif
+	platform_driver_unregister(&tegra_host1x_driver);
+}
+module_exit(tegra_host1x_exit);
 
+MODULE_AUTHOR("Thierry Reding <thierry.reding@avionic-design.de>");
 MODULE_AUTHOR("Terje Bergstrom <tbergstrom@nvidia.com>");
 MODULE_DESCRIPTION("Host1x driver for Tegra products");
 MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 467a92e..ff3a365 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -142,6 +142,8 @@ struct host1x {
 	int allocated_channels;
 
 	struct dentry *debugfs;
+
+	void *drm_data;
 };
 
 static inline
@@ -161,4 +163,8 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r);
 void host1x_ch_writel(struct host1x_channel *ch, u32 r, u32 v);
 u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
 
+extern struct platform_driver tegra_hdmi_driver;
+extern struct platform_driver tegra_dc_driver;
+extern struct platform_driver tegra_gr2d_driver;
+
 #endif
diff --git a/drivers/gpu/host1x/drm/Kconfig b/drivers/gpu/host1x/drm/Kconfig
index be1daf7..7db9b3a 100644
--- a/drivers/gpu/host1x/drm/Kconfig
+++ b/drivers/gpu/host1x/drm/Kconfig
@@ -1,5 +1,5 @@
 config DRM_TEGRA
-	tristate "NVIDIA Tegra DRM"
+	bool "NVIDIA Tegra DRM"
 	depends on DRM && OF && ARCH_TEGRA
 	select DRM_KMS_HELPER
 	select DRM_GEM_CMA_HELPER
diff --git a/drivers/gpu/host1x/drm/dc.c b/drivers/gpu/host1x/drm/dc.c
index 656b2e3..ac31e96 100644
--- a/drivers/gpu/host1x/drm/dc.c
+++ b/drivers/gpu/host1x/drm/dc.c
@@ -17,6 +17,7 @@
 
 #include "drm.h"
 #include "dc.h"
+#include "host1x_client.h"
 
 struct tegra_dc_window {
 	fixed20_12 x;
@@ -736,7 +737,8 @@ static const struct host1x_client_ops dc_client_ops = {
 
 static int tegra_dc_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(pdev->dev.parent));
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
@@ -800,7 +802,8 @@ static int tegra_dc_probe(struct platform_device *pdev)
 
 static int tegra_dc_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(pdev->dev.parent));
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
 	int err;
 
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
index 3a503c9..bef9051 100644
--- a/drivers/gpu/host1x/drm/drm.c
+++ b/drivers/gpu/host1x/drm/drm.c
@@ -16,6 +16,7 @@
 #include <asm/dma-iommu.h>
 
 #include "drm.h"
+#include "host1x_client.h"
 
 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
@@ -24,13 +25,221 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+struct host1x_drm_client {
+	struct host1x_client *client;
+	struct device_node *np;
+	struct list_head list;
+};
+
+static int host1x_add_drm_client(struct host1x *host1x, struct device_node *np)
+{
+	struct host1x_drm_client *client;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&client->list);
+	client->np = of_node_get(np);
+
+	list_add_tail(&client->list, &host1x->drm_clients);
+
+	return 0;
+}
+
+static int host1x_activate_drm_client(struct host1x *host1x,
+				      struct host1x_drm_client *drm,
+				      struct host1x_client *client)
+{
+	mutex_lock(&host1x->drm_clients_lock);
+	list_del_init(&drm->list);
+	list_add_tail(&drm->list, &host1x->drm_active);
+	drm->client = client;
+	mutex_unlock(&host1x->drm_clients_lock);
+
+	return 0;
+}
+
+static int host1x_remove_drm_client(struct host1x *host1x,
+				    struct host1x_drm_client *client)
+{
+	mutex_lock(&host1x->drm_clients_lock);
+	list_del_init(&client->list);
+	mutex_unlock(&host1x->drm_clients_lock);
+
+	of_node_put(client->np);
+	kfree(client);
+
+	return 0;
+}
+
+static int host1x_parse_dt(struct host1x *host1x)
+{
+	static const char * const compat[] = {
+		"nvidia,tegra20-dc",
+		"nvidia,tegra20-hdmi",
+		"nvidia,tegra30-dc",
+		"nvidia,tegra30-hdmi",
+	};
+	unsigned int i;
+	int err;
+
+	for (i = 0; i < ARRAY_SIZE(compat); i++) {
+		struct device_node *np;
+
+		for_each_child_of_node(host1x->dev->of_node, np) {
+			if (of_device_is_compatible(np, compat[i]) &&
+			    of_device_is_available(np)) {
+				err = host1x_add_drm_client(host1x, np);
+				if (err < 0)
+					return err;
+			}
+		}
+	}
+
+	return 0;
+}
+
+int host1x_drm_alloc(struct platform_device *pdev)
+{
+	struct host1x *host1x;
+	int err;
+
+	host1x = devm_kzalloc(&pdev->dev, sizeof(*host1x), GFP_KERNEL);
+	if (!host1x)
+		return -ENOMEM;
+
+	mutex_init(&host1x->drm_clients_lock);
+	INIT_LIST_HEAD(&host1x->drm_clients);
+	INIT_LIST_HEAD(&host1x->drm_active);
+	mutex_init(&host1x->clients_lock);
+	INIT_LIST_HEAD(&host1x->clients);
+	host1x->dev = &pdev->dev;
+
+	err = host1x_parse_dt(host1x);
+	if (err < 0) {
+		dev_err(&pdev->dev, "failed to parse DT: %d\n", err);
+		return err;
+	}
+
+	host1x_set_drm_data(pdev, host1x);
+
+	return 0;
+}
+
+int host1x_drm_init(struct host1x *host1x, struct drm_device *drm)
+{
+	struct host1x_client *client;
+	int err = 0;
+
+	mutex_lock(&host1x->clients_lock);
+
+	list_for_each_entry(client, &host1x->clients, list) {
+		if (client->ops && client->ops->drm_init) {
+			err = client->ops->drm_init(client, drm);
+			if (err < 0) {
+				dev_err(host1x->dev,
+					"DRM setup failed for %s: %d\n",
+					dev_name(client->dev), err);
+				break;
+			}
+		}
+	}
+
+	mutex_unlock(&host1x->clients_lock);
+	return err < 0 ? err : 0;
+}
+
+int host1x_drm_exit(struct host1x *host1x)
+{
+	struct platform_device *pdev = to_platform_device(host1x->dev);
+	struct host1x_client *client;
+	int err = 0;
+
+	if (!host1x->drm)
+		return 0;
+
+	mutex_lock(&host1x->clients_lock);
+	list_for_each_entry_reverse(client, &host1x->clients, list) {
+		if (client->ops && client->ops->drm_exit) {
+			err = client->ops->drm_exit(client);
+			if (err < 0) {
+				dev_err(host1x->dev,
+					"DRM cleanup failed for %s: %d\n",
+					dev_name(client->dev), err);
+				break;
+			}
+		}
+	}
+	mutex_unlock(&host1x->clients_lock);
+	if (err < 0)
+		return err;
+
+	drm_platform_exit(&tegra_drm_driver, pdev);
+	host1x->drm = NULL;
+	return 0;
+}
+
+int host1x_register_client(struct host1x *host1x, struct host1x_client *client)
+{
+	struct host1x_drm_client *drm, *tmp;
+	int err;
+
+	mutex_lock(&host1x->clients_lock);
+	list_add_tail(&client->list, &host1x->clients);
+	mutex_unlock(&host1x->clients_lock);
+
+	list_for_each_entry_safe(drm, tmp, &host1x->drm_clients, list)
+		if (drm->np == client->dev->of_node)
+			host1x_activate_drm_client(host1x, drm, client);
+
+	if (list_empty(&host1x->drm_clients)) {
+		struct platform_device *pdev = to_platform_device(host1x->dev);
+
+		err = drm_platform_init(&tegra_drm_driver, pdev);
+		if (err < 0) {
+			dev_err(host1x->dev, "drm_platform_init(): %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+int host1x_unregister_client(struct host1x *host1x,
+			     struct host1x_client *client)
+{
+	struct host1x_drm_client *drm, *tmp;
+	int err;
+
+	list_for_each_entry_safe(drm, tmp, &host1x->drm_active, list) {
+		if (drm->client == client) {
+			err = host1x_drm_exit(host1x);
+			if (err < 0) {
+				dev_err(host1x->dev, "host1x_drm_exit(): %d\n",
+					err);
+				return err;
+			}
+
+			host1x_remove_drm_client(host1x, drm);
+			break;
+		}
+	}
+
+	mutex_lock(&host1x->clients_lock);
+	list_del_init(&client->list);
+	mutex_unlock(&host1x->clients_lock);
+
+	return 0;
+}
+
 static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 {
-	struct device *dev = drm->dev;
+	struct platform_device *pdev = to_platform_device(drm->dev);
 	struct host1x *host1x;
 	int err;
 
-	host1x = dev_get_drvdata(dev);
+	host1x = host1x_get_drm_data(pdev);
 	drm->dev_private = host1x;
 	host1x->drm = drm;
 
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
index e68b4ac..e7101d5 100644
--- a/drivers/gpu/host1x/drm/drm.h
+++ b/drivers/gpu/host1x/drm/drm.h
@@ -208,9 +208,6 @@ extern int tegra_output_exit(struct tegra_output *output);
 extern int tegra_drm_fb_init(struct drm_device *drm);
 extern void tegra_drm_fb_exit(struct drm_device *drm);
 
-extern struct platform_driver tegra_host1x_driver;
-extern struct platform_driver tegra_hdmi_driver;
-extern struct platform_driver tegra_dc_driver;
 extern struct drm_driver tegra_drm_driver;
 
 #endif /* HOST1X_DRM_H */
diff --git a/drivers/gpu/host1x/drm/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c
index e060c7e..2f1e7b4 100644
--- a/drivers/gpu/host1x/drm/hdmi.c
+++ b/drivers/gpu/host1x/drm/hdmi.c
@@ -20,6 +20,7 @@
 #include "hdmi.h"
 #include "drm.h"
 #include "dc.h"
+#include "host1x_client.h"
 
 struct tegra_hdmi {
 	struct host1x_client client;
@@ -1198,7 +1199,8 @@ static const struct host1x_client_ops hdmi_client_ops = {
 
 static int tegra_hdmi_probe(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(pdev->dev.parent));
 	struct tegra_hdmi *hdmi;
 	struct resource *regs;
 	int err;
@@ -1287,7 +1289,8 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
 
 static int tegra_hdmi_remove(struct platform_device *pdev)
 {
-	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(pdev->dev.parent));
 	struct tegra_hdmi *hdmi = platform_get_drvdata(pdev);
 	int err;
 
diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
index fdd2920..938df7e 100644
--- a/drivers/gpu/host1x/host1x_client.h
+++ b/drivers/gpu/host1x/host1x_client.h
@@ -19,6 +19,15 @@
 
 struct platform_device;
 
+#ifdef CONFIG_DRM_TEGRA
+int host1x_drm_alloc(struct platform_device *pdev);
+#else
+static inline int host1x_drm_alloc(struct platform_device *pdev)
+{
+	return 0;
+}
+#endif
+
 void host1x_set_drm_data(struct platform_device *pdev, void *data);
 void *host1x_get_drm_data(struct platform_device *pdev);
 
-- 
1.7.9.5


* [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (5 preceding siblings ...)
  2013-01-15 11:44 ` [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver Terje Bergstrom
@ 2013-01-15 11:44 ` Terje Bergstrom
  2013-02-04 11:26   ` Thierry Reding
  2013-01-15 11:44 ` [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device Terje Bergstrom
  2013-01-22  9:03 ` [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergström
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:44 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add a driver alias, gr2d, for the Tegra 2D device, and assign a
duplicate of the 2D clock to that alias.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
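For readers unfamiliar with the clock duplicate tables, the sketch below
models how a (device name, connection id) pair might resolve to a base
clock. It is a user-space illustration only: struct clk_alias and
alias_lookup() are hypothetical names, the table mixes one entry from
each SoC file purely for demonstration, and none of this reproduces the
kernel's actual clkdev lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of one clock alias table entry. */
struct clk_alias {
	const char *name;	/* base clock, e.g. "2d" */
	const char *dev_id;	/* device name the driver binds as */
	const char *con_id;	/* connection id the driver asks for */
};

/* Illustrative table; only the first entry mirrors this patch. */
static const struct clk_alias aliases[] = {
	{ "2d", "gr2d", "gr2d" },
	{ "3d", "tegra_grhost", "gr3d" },
	{ "pll_d2_out0", "hdmi", "parent" },
};

/* Return the base clock name for (dev_id, con_id), or NULL if none. */
static const char *alias_lookup(const char *dev_id, const char *con_id)
{
	size_t i;

	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
		if (strcmp(aliases[i].dev_id, dev_id) == 0 &&
		    strcmp(aliases[i].con_id, con_id) == 0)
			return aliases[i].name;
	}

	return NULL;
}
```

With the device named "gr2d" through the auxdata entry, a request for
connection id "gr2d" resolves to the "2d" clock in this model, while the
old "tegra_grhost" device name no longer matches.
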
 arch/arm/mach-tegra/board-dt-tegra20.c    |    1 +
 arch/arm/mach-tegra/board-dt-tegra30.c    |    1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c |    2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c |    1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c b/arch/arm/mach-tegra/board-dt-tegra20.c
index 171ba3c..9fcc800 100644
--- a/arch/arm/mach-tegra/board-dt-tegra20.c
+++ b/arch/arm/mach-tegra/board-dt-tegra20.c
@@ -96,6 +96,7 @@ static struct of_dev_auxdata tegra20_auxdata_lookup[] __initdata = {
 	OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000D800, "spi_tegra.2", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000DA00, "spi_tegra.3", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-host1x", 0x50000000, "host1x", NULL),
+	OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x54140000, "gr2d", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54200000, "tegradc.0", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54240000, "tegradc.1", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra20-hdmi", 0x54280000, "hdmi", NULL),
diff --git a/arch/arm/mach-tegra/board-dt-tegra30.c b/arch/arm/mach-tegra/board-dt-tegra30.c
index cfe5fc0..0b4a1f0 100644
--- a/arch/arm/mach-tegra/board-dt-tegra30.c
+++ b/arch/arm/mach-tegra/board-dt-tegra30.c
@@ -59,6 +59,7 @@ static struct of_dev_auxdata tegra30_auxdata_lookup[] __initdata = {
 	OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DC00, "spi_tegra.4", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DE00, "spi_tegra.5", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-host1x", 0x50000000, "host1x", NULL),
+	OF_DEV_AUXDATA("nvidia,tegra30-gr2d", 0x54140000, "gr2d", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54200000, "tegradc.0", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54240000, "tegradc.1", NULL),
 	OF_DEV_AUXDATA("nvidia,tegra30-hdmi", 0x54280000, "hdmi", NULL),
diff --git a/arch/arm/mach-tegra/tegra20_clocks_data.c b/arch/arm/mach-tegra/tegra20_clocks_data.c
index a23a073..15d440a 100644
--- a/arch/arm/mach-tegra/tegra20_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra20_clocks_data.c
@@ -1041,7 +1041,7 @@ static struct clk_duplicate tegra_clk_duplicates[] = {
 	CLK_DUPLICATE("usbd",	"utmip-pad",	NULL),
 	CLK_DUPLICATE("usbd",	"tegra-ehci.0",	NULL),
 	CLK_DUPLICATE("usbd",	"tegra-otg",	NULL),
-	CLK_DUPLICATE("2d",	"tegra_grhost",	"gr2d"),
+	CLK_DUPLICATE("2d",	"gr2d",	"gr2d"),
 	CLK_DUPLICATE("3d",	"tegra_grhost",	"gr3d"),
 	CLK_DUPLICATE("epp",	"tegra_grhost",	"epp"),
 	CLK_DUPLICATE("mpe",	"tegra_grhost",	"mpe"),
diff --git a/arch/arm/mach-tegra/tegra30_clocks_data.c b/arch/arm/mach-tegra/tegra30_clocks_data.c
index 741d264..5c4b7b7 100644
--- a/arch/arm/mach-tegra/tegra30_clocks_data.c
+++ b/arch/arm/mach-tegra/tegra30_clocks_data.c
@@ -1338,6 +1338,7 @@ static struct clk_duplicate tegra_clk_duplicates[] = {
 	CLK_DUPLICATE("pll_p", "tegradc.0", "parent"),
 	CLK_DUPLICATE("pll_p", "tegradc.1", "parent"),
 	CLK_DUPLICATE("pll_d2_out0", "hdmi", "parent"),
+	CLK_DUPLICATE("2d", "gr2d", "gr2d"),
 };
 
 static struct clk *tegra_ptr_clks[] = {
-- 
1.7.9.5


* [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (6 preceding siblings ...)
  2013-01-15 11:44 ` [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks Terje Bergstrom
@ 2013-01-15 11:44 ` Terje Bergstrom
  2013-02-04 12:56   ` Thierry Reding
  2013-01-22  9:03 ` [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergström
  8 siblings, 1 reply; 49+ messages in thread
From: Terje Bergstrom @ 2013-01-15 11:44 UTC (permalink / raw)
  To: amerilainen, airlied, thierry.reding
  Cc: dri-devel, linux-tegra, linux-kernel, Terje Bergstrom

Add a client driver for the 2D device, and IOCTLs to pass work to the
host1x channel for 2D.

Also add functions that can be called to access sync points from DRM.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
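The dev.c change extends the existing register-then-unwind chain in
tegra_host1x_init() by one driver. As a rough user-space model of that
pattern (drv_register(), drv_unregister(), init_all() and the fail_at
knob are hypothetical stand-ins, not kernel APIs), a failure at any step
should undo every earlier registration in reverse order:

```c
/* Hypothetical stand-ins for the four platform drivers. */
enum drv_id { DRV_HOST1X, DRV_DC, DRV_HDMI, DRV_GR2D, DRV_COUNT };

static int registered[DRV_COUNT];
static int fail_at = -1;	/* id whose registration fails, -1 for none */

static int drv_register(enum drv_id id)
{
	if ((int)id == fail_at)
		return -1;
	registered[id] = 1;
	return 0;
}

static void drv_unregister(enum drv_id id)
{
	registered[id] = 0;
}

/* Register in order; on failure fall through the labels in reverse. */
static int init_all(void)
{
	int err;

	err = drv_register(DRV_HOST1X);
	if (err < 0)
		return err;

	err = drv_register(DRV_DC);
	if (err < 0)
		goto unregister_host1x;

	err = drv_register(DRV_HDMI);
	if (err < 0)
		goto unregister_dc;

	err = drv_register(DRV_GR2D);
	if (err < 0)
		goto unregister_hdmi;

	return 0;

unregister_hdmi:
	drv_unregister(DRV_HDMI);
unregister_dc:
	drv_unregister(DRV_DC);
unregister_host1x:
	drv_unregister(DRV_HOST1X);
	return err;
}
```

Each new driver needs both a new goto target and a matching line in the
exit path, which is why the hunk touches init and exit together.
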
 drivers/gpu/host1x/Makefile   |    1 +
 drivers/gpu/host1x/dev.c      |    7 +
 drivers/gpu/host1x/drm/drm.c  |  226 +++++++++++++++++++++++++++-
 drivers/gpu/host1x/drm/drm.h  |   28 ++++
 drivers/gpu/host1x/drm/gr2d.c |  325 +++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.c   |    5 +
 drivers/gpu/host1x/syncpt.h   |    3 +
 include/drm/tegra_drm.h       |  131 +++++++++++++++++
 8 files changed, 725 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/host1x/drm/gr2d.c
 create mode 100644 include/drm/tegra_drm.h
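
The channel IOCTLs below take an opaque context value from user space
and accept it only if it matches an entry on the calling file's own
context list. A minimal user-space sketch of that check follows; struct
ctx, struct file_ctx_list and ctx_lookup() are hypothetical names, and a
plain singly linked list stands in for the kernel's list_head.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-channel context, linked into a per-file list. */
struct ctx {
	struct ctx *next;
	int channel;
};

/* Hypothetical per-open-file state holding that file's contexts. */
struct file_ctx_list {
	struct ctx *contexts;
};

/*
 * Treat the user-supplied handle as valid only if it equals the
 * address of a context owned by this file; never dereference it
 * before the match is found.
 */
static struct ctx *ctx_lookup(struct file_ctx_list *fpriv, uint64_t handle)
{
	struct ctx *c;

	for (c = fpriv->contexts; c; c = c->next) {
		if ((uintptr_t)c == handle)
			return c;
	}

	return NULL;
}
```

Under this scheme a handle opened through one file descriptor is
rejected when presented on another, because the lookup only walks the
caller's list.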

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index c35ee19..c2120ad 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -18,4 +18,5 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
 host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o
 host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
+host1x-$(CONFIG_DRM_TEGRA) += drm/gr2d.o
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 17ee01c..40d9938 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -214,11 +214,17 @@ static int __init tegra_host1x_init(void)
 	err = platform_driver_register(&tegra_hdmi_driver);
 	if (err < 0)
 		goto unregister_dc;
+
+	err = platform_driver_register(&tegra_gr2d_driver);
+	if (err < 0)
+		goto unregister_hdmi;
 #endif
 
 	return 0;
 
 #ifdef CONFIG_TEGRA_DRM
+unregister_hdmi:
+	platform_driver_unregister(&tegra_hdmi_driver);
 unregister_dc:
 	platform_driver_unregister(&tegra_dc_driver);
 unregister_host1x:
@@ -231,6 +237,7 @@ module_init(tegra_host1x_init);
 static void __exit tegra_host1x_exit(void)
 {
 #ifdef CONFIG_TEGRA_DRM
+	platform_driver_unregister(&tegra_gr2d_driver);
 	platform_driver_unregister(&tegra_hdmi_driver);
 	platform_driver_unregister(&tegra_dc_driver);
 #endif
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
index bef9051..f8f8508 100644
--- a/drivers/gpu/host1x/drm/drm.c
+++ b/drivers/gpu/host1x/drm/drm.c
@@ -14,9 +14,11 @@
 #include <mach/clk.h>
 #include <linux/dma-mapping.h>
 #include <asm/dma-iommu.h>
+#include <drm/tegra_drm.h>
 
 #include "drm.h"
 #include "host1x_client.h"
+#include "syncpt.h"
 
 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
@@ -78,8 +80,10 @@ static int host1x_parse_dt(struct host1x *host1x)
 	static const char * const compat[] = {
 		"nvidia,tegra20-dc",
 		"nvidia,tegra20-hdmi",
+		"nvidia,tegra20-gr2d",
 		"nvidia,tegra30-dc",
 		"nvidia,tegra30-hdmi",
+		"nvidia,tegra30-gr2d",
 	};
 	unsigned int i;
 	int err;
@@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
 
 static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 {
-	return 0;
+	struct host1x_drm_fpriv *fpriv;
+	int err = 0;
+
+	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
+	if (!fpriv)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&fpriv->contexts);
+	filp->driver_priv = fpriv;
+
+	return err;
+}
+
+static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
+{
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
+	struct host1x_drm_context *context, *tmp;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		context->client->ops->close_channel(context);
+		kfree(context);
+	}
+	kfree(fpriv);
 }
 
 static void tegra_drm_lastclose(struct drm_device *drm)
@@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
 	drm_fbdev_cma_restore_mode(host1x->fbdev);
 }
 
+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct host1x *host1x = drm->dev_private;
+	struct tegra_drm_syncpt_read_args *args = data;
+	struct host1x_syncpt *sp =
+		host1x_syncpt_get_bydev(host1x->dev, args->id);
+
+	if (!sp)
+		return -EINVAL;
+
+	args->value = host1x_syncpt_read_min(sp);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_incr(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct host1x *host1x = drm->dev_private;
+	struct tegra_drm_syncpt_incr_args *args = data;
+	struct host1x_syncpt *sp =
+		host1x_syncpt_get_bydev(host1x->dev, args->id);
+
+	if (!sp)
+		return -EINVAL;
+
+	host1x_syncpt_incr(sp);
+	return 0;
+}
+
+static int
+tegra_drm_ioctl_syncpt_wait(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct host1x *host1x = drm->dev_private;
+	struct tegra_drm_syncpt_wait_args *args = data;
+	struct host1x_syncpt *sp =
+		host1x_syncpt_get_bydev(host1x->dev, args->id);
+
+	if (!sp)
+		return -EINVAL;
+
+	return host1x_syncpt_wait(sp, args->thresh,
+			args->timeout, &args->value);
+}
+
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct host1x_client *client;
+	struct host1x_drm_context *context;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	struct host1x *host1x = drm->dev_private;
+	int err = 0;
+
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	list_for_each_entry(client, &host1x->clients, list) {
+		if (client->class == args->class) {
+			context->client = client;
+			err = client->ops->open_channel(client, context);
+			if (err)
+				goto out;
+
+			list_add(&context->list, &fpriv->contexts);
+			args->context = (uintptr_t)context;
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	if (err)
+		kfree(context);
+
+	return err;
+}
+
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct host1x_drm_context *context, *tmp;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			context->client->ops->close_channel(context);
+			list_del(&context->list);
+			kfree(context);
+			goto out;
+		}
+	}
+	err = -EINVAL;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_get_channel_param_args *args = data;
+	struct host1x_drm_context *context;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			args->value =
+				context->client->ops->get_syncpoint(context,
+						args->param);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+}
+
+static int
+tegra_drm_ioctl_submit(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_drm_submit_args *args = data;
+	struct host1x_drm_context *context;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			err = context->client->ops->submit(context, args, drm,
+				file_priv);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+
+}
+
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
+			 struct drm_file *file_priv)
+{
+	struct tegra_gem_create *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(drm, args->size);
+	if (IS_ERR(cma_obj))
+		goto err_cma_create;
+
+	ret = drm_gem_handle_create(file_priv, &cma_obj->base, &args->handle);
+	if (ret)
+		goto err_handle_create;
+
+	args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT;
+
+	drm_gem_object_unreference(&cma_obj->base);
+
+	return 0;
+
+err_handle_create:
+	drm_gem_cma_free_object(&cma_obj->base);
+err_cma_create:
+	return -ENOMEM;
+}
+
 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
+			tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
+			tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
+			tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
+			tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
+			tegra_drm_ioctl_open_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL,
+			tegra_drm_ioctl_close_channel, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINT,
+			tegra_drm_ioctl_get_syncpoint, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(TEGRA_DRM_SUBMIT,
+			tegra_drm_ioctl_submit, DRM_UNLOCKED),
 };
 
 static const struct file_operations tegra_drm_fops = {
@@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = {
 	.load = tegra_drm_load,
 	.unload = tegra_drm_unload,
 	.open = tegra_drm_open,
+	.preclose = tegra_drm_close,
 	.lastclose = tegra_drm_lastclose,
 
 	.gem_free_object = drm_gem_cma_free_object,
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
index e7101d5..dc4c128 100644
--- a/drivers/gpu/host1x/drm/drm.h
+++ b/drivers/gpu/host1x/drm/drm.h
@@ -17,6 +17,7 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fixed.h>
+#include <drm/tegra_drm.h>
 
 struct tegra_framebuffer {
 	struct drm_framebuffer base;
@@ -49,17 +50,44 @@ struct host1x {
 
 struct host1x_client;
 
+struct host1x_drm_context {
+	struct host1x_client *client;
+	struct host1x_channel *channel;
+	struct list_head list;
+};
+
 struct host1x_client_ops {
 	int (*drm_init)(struct host1x_client *client, struct drm_device *drm);
 	int (*drm_exit)(struct host1x_client *client);
+	int (*open_channel)(struct host1x_client *,
+			struct host1x_drm_context *);
+	void (*close_channel)(struct host1x_drm_context *);
+	u32 (*get_syncpoint)(struct host1x_drm_context *, int index);
+	int (*submit)(struct host1x_drm_context *,
+			struct tegra_drm_submit_args *,
+			struct drm_device *,
+			struct drm_file *);
+};
+
+struct host1x_drm_fpriv {
+	struct list_head contexts;
 };
 
+static inline struct host1x_drm_fpriv *
+host1x_drm_fpriv(struct drm_file *file_priv)
+{
+	return file_priv ? file_priv->driver_priv : NULL;
+}
+
 struct host1x_client {
 	struct host1x *host1x;
 	struct device *dev;
 
 	const struct host1x_client_ops *ops;
 
+	u32 class;
+	struct host1x_channel *channel;
+
 	struct list_head list;
 };
 
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
new file mode 100644
index 0000000..dc7d6c6
--- /dev/null
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -0,0 +1,325 @@
+/*
+ * drivers/gpu/host1x/drm/gr2d.c
+ *
+ * Tegra Graphics 2D
+ *
+ * Copyright (c) 2012, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <drm/tegra_drm.h>
+#include "drm.h"
+#include "job.h"
+#include "channel.h"
+#include "host1x.h"
+#include "syncpt.h"
+#include "memmgr.h"
+#include "host1x_client.h"
+
+struct gr2d {
+	struct host1x_client client;
+	struct clk *clk;
+	struct host1x_syncpt *syncpt;
+	struct host1x_channel *channel;
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg);
+
+static int gr2d_client_init(struct host1x_client *client,
+		struct drm_device *drm)
+{
+	return 0;
+}
+
+static int gr2d_client_exit(struct host1x_client *client)
+{
+	return 0;
+}
+
+static int gr2d_open_channel(struct host1x_client *client,
+		struct host1x_drm_context *context)
+{
+	struct gr2d *gr2d = dev_get_drvdata(client->dev);
+	context->channel = host1x_channel_get(gr2d->channel);
+
+	if (!context->channel)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void gr2d_close_channel(struct host1x_drm_context *context)
+{
+	host1x_channel_put(context->channel);
+}
+
+static u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index)
+{
+	struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
+	if (index != 0)
+		return UINT_MAX;
+
+	return host1x_syncpt_id(gr2d->syncpt);
+}
+
+static u32 handle_cma_to_host1x(struct drm_device *drm,
+				struct drm_file *file_priv, u32 gem_handle)
+{
+	struct drm_gem_object *obj;
+	struct drm_gem_cma_object *cma_obj;
+	u32 host1x_handle;
+
+	obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
+	if (!obj)
+		return 0;
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+	host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
+	drm_gem_object_unreference(obj);
+
+	return host1x_handle;
+}
+
+static int gr2d_submit(struct host1x_drm_context *context,
+		struct tegra_drm_submit_args *args,
+		struct drm_device *drm,
+		struct drm_file *file_priv)
+{
+	struct host1x_job *job;
+	int num_cmdbufs = args->num_cmdbufs;
+	int num_relocs = args->num_relocs;
+	int num_waitchks = args->num_waitchks;
+	struct tegra_drm_cmdbuf __user *cmdbufs =
+		(void __user *)(uintptr_t)args->cmdbufs;
+	struct tegra_drm_reloc __user *relocs =
+		(void __user *)(uintptr_t)args->relocs;
+	struct tegra_drm_waitchk __user *waitchks =
+		(void __user *)(uintptr_t)args->waitchks;
+	struct tegra_drm_syncpt_incr syncpt_incr;
+	int err;
+
+	/* Only one syncpt_incr struct per submit is supported for now */
+	if (args->num_syncpt_incrs != 1)
+		return -EINVAL;
+
+	job = host1x_job_alloc(context->channel,
+			args->num_cmdbufs,
+			args->num_relocs,
+			args->num_waitchks);
+	if (!job)
+		return -ENOMEM;
+
+	job->num_relocs = args->num_relocs;
+	job->num_waitchk = args->num_waitchks;
+	job->clientid = (u32)args->context;
+	job->class = context->client->class;
+	job->serialize = true;
+
+	while (num_cmdbufs) {
+		struct tegra_drm_cmdbuf cmdbuf;
+		err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
+		if (err)
+			goto fail;
+
+		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
+		if (!cmdbuf.mem)
+			goto fail;
+
+		host1x_job_add_gather(job,
+				cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+		num_cmdbufs--;
+		cmdbufs++;
+	}
+
+	err = copy_from_user(job->relocarray,
+			relocs, sizeof(*relocs) * num_relocs);
+	if (err)
+		goto fail;
+
+	while (num_relocs--) {
+		job->relocarray[num_relocs].cmdbuf_mem =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].cmdbuf_mem);
+		job->relocarray[num_relocs].target =
+			handle_cma_to_host1x(drm, file_priv,
+			job->relocarray[num_relocs].target);
+
+		if (!job->relocarray[num_relocs].target ||
+			!job->relocarray[num_relocs].cmdbuf_mem)
+			goto fail;
+	}
+
+	err = copy_from_user(job->waitchk,
+			waitchks, sizeof(*waitchks) * num_waitchks);
+	if (err)
+		goto fail;
+
+	err = host1x_job_pin(job, to_platform_device(context->client->dev));
+	if (err)
+		goto fail;
+
+	err = copy_from_user(&syncpt_incr,
+			(void __user *)(uintptr_t)args->syncpt_incrs,
+			sizeof(syncpt_incr));
+	if (err)
+		goto fail;
+
+	job->syncpt_id = syncpt_incr.syncpt_id;
+	job->syncpt_incrs = syncpt_incr.syncpt_incrs;
+	job->timeout = 10000;
+	job->is_addr_reg = gr2d_is_addr_reg;
+	if (args->timeout && args->timeout < 10000)
+		job->timeout = args->timeout;
+
+	err = host1x_channel_submit(job);
+	if (err)
+		goto fail_submit;
+
+	args->fence = job->syncpt_end;
+
+	host1x_job_put(job);
+	return 0;
+
+fail_submit:
+	host1x_job_unpin(job);
+fail:
+	host1x_job_put(job);
+	return err;
+}
+
+static struct host1x_client_ops gr2d_client_ops = {
+	.drm_init = gr2d_client_init,
+	.drm_exit = gr2d_client_exit,
+	.open_channel = gr2d_open_channel,
+	.close_channel = gr2d_close_channel,
+	.get_syncpoint = gr2d_get_syncpoint,
+	.submit = gr2d_submit,
+};
+
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
+	int ret;
+
+	if (class == NV_HOST1X_CLASS_ID)
+		ret = reg == 0x2b;
+	else
+		switch (reg) {
+		case 0x1a:
+		case 0x1b:
+		case 0x26:
+		case 0x2b:
+		case 0x2c:
+		case 0x2d:
+		case 0x31:
+		case 0x32:
+		case 0x48:
+		case 0x49:
+		case 0x4a:
+		case 0x4b:
+		case 0x4c:
+			ret = 1;
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+
+	return ret;
+}
+
+static struct of_device_id gr2d_match[] = {
+	{ .compatible = "nvidia,tegra30-gr2d" },
+	{ .compatible = "nvidia,tegra20-gr2d" },
+	{ },
+};
+
+static int gr2d_probe(struct platform_device *dev)
+{
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(dev->dev.parent));
+	int err;
+	struct gr2d *gr2d = NULL;
+
+	gr2d = devm_kzalloc(&dev->dev, sizeof(*gr2d), GFP_KERNEL);
+	if (!gr2d)
+		return -ENOMEM;
+
+	gr2d->clk = devm_clk_get(&dev->dev, "gr2d");
+	if (IS_ERR(gr2d->clk)) {
+		dev_err(&dev->dev, "cannot get clock\n");
+		return PTR_ERR(gr2d->clk);
+	}
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err) {
+		dev_err(&dev->dev, "cannot turn on clock\n");
+		return err;
+	}
+
+	gr2d->channel = host1x_channel_alloc(dev);
+	if (!gr2d->channel)
+		return -ENOMEM;
+
+	gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
+	if (!gr2d->syncpt) {
+		host1x_channel_free(gr2d->channel);
+		return -ENOMEM;
+	}
+
+	gr2d->client.ops = &gr2d_client_ops;
+	gr2d->client.dev = &dev->dev;
+	gr2d->client.class = NV_GRAPHICS_2D_CLASS_ID;
+
+	err = host1x_register_client(host1x, &gr2d->client);
+	if (err < 0) {
+		dev_err(&dev->dev, "failed to register host1x client: %d\n",
+			err);
+		return err;
+	}
+
+	platform_set_drvdata(dev, gr2d);
+	return 0;
+}
+
+static int __exit gr2d_remove(struct platform_device *dev)
+{
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(dev->dev.parent));
+	struct gr2d *gr2d = platform_get_drvdata(dev);
+	int err;
+
+	err = host1x_unregister_client(host1x, &gr2d->client);
+	if (err < 0) {
+		dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
+			err);
+		return err;
+	}
+
+	host1x_syncpt_free(gr2d->syncpt);
+	return 0;
+}
+
+struct platform_driver tegra_gr2d_driver = {
+	.probe = gr2d_probe,
+	.remove = __exit_p(gr2d_remove),
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "gr2d",
+		.of_match_table = gr2d_match,
+	}
+};
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 191f65f..ccaaece 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -392,3 +392,8 @@ struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
 {
 	return dev->syncpt + id;
 }
+
+struct host1x_syncpt *host1x_syncpt_get_bydev(struct device *dev, u32 id)
+{
+	return host1x_syncpt_get(dev_get_drvdata(dev), id);
+}
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 255a3a3..d15d4c5 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -110,6 +110,9 @@ static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
 /* Return pointer to struct denoting sync point id. */
 struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
 
+/* Return pointer to struct denoting sync point id, when given client pdev. */
+struct host1x_syncpt *host1x_syncpt_get_bydev(struct device *dev, u32 id);
+
 /* Request incrementing a sync point. */
 void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
 
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
new file mode 100644
index 0000000..11fb019
--- /dev/null
+++ b/include/drm/tegra_drm.h
@@ -0,0 +1,131 @@
+/*
+ * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _TEGRA_DRM_H_
+#define _TEGRA_DRM_H_
+
+struct tegra_gem_create {
+	__u64 size;
+	unsigned int flags;
+	unsigned int handle;
+	unsigned int offset;
+};
+
+struct tegra_gem_invalidate {
+	unsigned int handle;
+};
+
+struct tegra_gem_flush {
+	unsigned int handle;
+};
+
+struct tegra_drm_syncpt_read_args {
+	__u32 id;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr_args {
+	__u32 id;
+	__u32 pad;
+};
+
+struct tegra_drm_syncpt_wait_args {
+	__u32 id;
+	__u32 thresh;
+	__s32 timeout;
+	__u32 value;
+};
+
+#define DRM_TEGRA_NO_TIMEOUT	(-1)
+
+struct tegra_drm_open_channel_args {
+	__u32 class;
+	__u32 pad;
+	__u64 context;
+};
+
+struct tegra_drm_get_channel_param_args {
+	__u64 context;
+	__u32 param;
+	__u32 value;
+};
+
+struct tegra_drm_syncpt_incr {
+	__u32 syncpt_id;
+	__u32 syncpt_incrs;
+};
+
+struct tegra_drm_cmdbuf {
+	__u32 mem;
+	__u32 offset;
+	__u32 words;
+	__u32 pad;
+};
+
+struct tegra_drm_reloc {
+	__u32 cmdbuf_mem;
+	__u32 cmdbuf_offset;
+	__u32 target;
+	__u32 target_offset;
+	__u32 shift;
+	__u32 pad;
+};
+
+struct tegra_drm_waitchk {
+	__u32 mem;
+	__u32 offset;
+	__u32 syncpt_id;
+	__u32 thresh;
+};
+
+struct tegra_drm_submit_args {
+	__u64 context;
+	__u32 num_syncpt_incrs;
+	__u32 num_cmdbufs;
+	__u32 num_relocs;
+	__u32 submit_version;
+	__u32 num_waitchks;
+	__u32 waitchk_mask;
+	__u32 timeout;
+	__u32 pad;
+	__u64 syncpt_incrs;
+	__u64 cmdbufs;
+	__u64 relocs;
+	__u64 waitchks;
+	__u32 fence;		/* Return value */
+
+	__u32 reserved[5];	/* future expansion */
+};
+
+#define DRM_TEGRA_GEM_CREATE		0x00
+#define DRM_TEGRA_DRM_SYNCPT_READ	0x01
+#define DRM_TEGRA_DRM_SYNCPT_INCR	0x02
+#define DRM_TEGRA_DRM_SYNCPT_WAIT	0x03
+#define DRM_TEGRA_DRM_OPEN_CHANNEL	0x04
+#define DRM_TEGRA_DRM_CLOSE_CHANNEL	0x05
+#define DRM_TEGRA_DRM_GET_SYNCPOINT	0x06
+#define DRM_TEGRA_DRM_SUBMIT		0x08
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct tegra_gem_create)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args)
+#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args)
+#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args)
+#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINT, struct tegra_drm_get_channel_param_args)
+#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args)
+
+#endif
-- 
1.7.9.5



* Re: [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware
  2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
                   ` (7 preceding siblings ...)
  2013-01-15 11:44 ` [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device Terje Bergstrom
@ 2013-01-22  9:03 ` Terje Bergström
  8 siblings, 0 replies; 49+ messages in thread
From: Terje Bergström @ 2013-01-22  9:03 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: Arto Merilainen, airlied, thierry.reding, dri-devel, linux-tegra,
	linux-kernel

On 15.01.2013 13:43, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next-20130114. The set was regenerated with
> git format-patch -M.

I have pushed both the kernel patches and libdrm changes to
git@gitorious.org:linux-host1x/linux-host1x.git and
git@gitorious.org:linux-host1x/libdrm-host1x.git.

They're not intended to compete with any other repository - I just
wanted to have one place where people can download the kernel patches,
libdrm changes and test suite. I'll remove them once they've served
their purpose.

I'd appreciate feedback on the patches. So far the only feedback has
been from Stephen about clock changes.

Terje


* Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver
  2013-01-15 11:43 ` [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver Terje Bergstrom
@ 2013-02-04  9:09   ` Thierry Reding
  2013-02-05  3:30     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-04  9:09 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 13526 bytes --]

On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
> Add host1x, the driver for host1x and its client unit 2D.

Maybe this could be a bit more verbose. Perhaps describe what host1x is.

> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
[...]
> @@ -0,0 +1,6 @@
> +config TEGRA_HOST1X
> +	tristate "Tegra host1x driver"
> +	help
> +	  Driver for the Tegra host1x hardware.

Maybe s/Tegra/NVIDIA Tegra/?

> +
> +	  Required for enabling tegradrm.

This should probably be dropped. Either encode such knowledge as
explicit dependencies or in this case just remove it altogether since we
will probably merge both drivers anyway.

> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
> +#include <linux/module.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/clk.h>
> +#include <linux/io.h>
> +#include "dev.h"

Maybe add a blank line between the previous two lines to visually
separate standard Linux includes from driver-specific ones.

> +#include "hw/host1x01.h"
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/host1x.h>
> +
> +#define DRIVER_NAME		"tegra-host1x"

You only ever use this once, so maybe it can just be dropped?

> +static struct host1x_device_info host1x_info = {

Perhaps this should be host1x01_info in order to match the hardware
revision? That'll avoid it having to be renamed later on when other
revisions start to appear.

> +static int host1x_probe(struct platform_device *dev)
> +{
[...]
> +	syncpt_irq = platform_get_irq(dev, 0);
> +	if (IS_ERR_VALUE(syncpt_irq)) {

This is somewhat unusual. It should be fine to just do:

	if (syncpt_irq < 0)

but IS_ERR_VALUE() should work fine too.

> +	memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));

Why not make the .info field a pointer to struct host1x_device_info
instead? That way you don't have to keep separate copies of the same
information.
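
A minimal sketch of the pointer-based variant, for illustration only.
The structures are trimmed-down stand-ins, host1x_set_info() is a
hypothetical helper, and the channel and syncpoint counts are
placeholder values rather than figures taken from the patch:

```c
#include <stddef.h>

/* Trimmed-down stand-in for the structure in the patch. */
struct host1x_device_info {
	int nb_channels;
	int nb_pts;
};

struct host1x {
	/* Shared, read-only table instead of a per-host copy. */
	const struct host1x_device_info *info;
};

/* Placeholder values, assumed for the example only. */
static const struct host1x_device_info host1x01_info = {
	.nb_channels = 8,
	.nb_pts = 32,
};

/* Hypothetical helper: point at the match data instead of memcpy(). */
static void host1x_set_info(struct host1x *host,
			    const struct host1x_device_info *info)
{
	host->info = info;
}
```

With a const pointer every host instance shares one table, and
accesses change from host->info.nb_pts to host->info->nb_pts.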

> +	/* set common host1x device data */
> +	platform_set_drvdata(dev, host);
> +
> +	host->regs = devm_request_and_ioremap(&dev->dev, regs);
> +	if (!host->regs) {
> +		dev_err(&dev->dev, "failed to remap host registers\n");
> +		return -ENXIO;
> +	}

This should probably be rewritten as:

	host->regs = devm_ioremap_resource(&dev->dev, regs);
	if (IS_ERR(host->regs))
		return PTR_ERR(host->regs);

Though that function will only be available in 3.9-rc1.

> +	err = host1x_syncpt_init(host);
> +	if (err)
> +		return err;
[...]
> +	host1x_syncpt_reset(host);

Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
it might be useful to have host1x_syncpt_reset() as a separate function
but couldn't it be called as part of host1x_syncpt_init()?

> +	dev_info(&dev->dev, "initialized\n");

I don't think this is very useful. We should make sure to tell people
when things fail. When everything goes as planned we don't need to brag
about it =)

> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
> +struct host1x_syncpt_ops {
[...]
> +	const char * (*name)(struct host1x_syncpt *);

Why do we need this? Could we not refer to the syncpt name directly
instead of going through this wrapper? I'd expect the name to be static.

> +struct host1x_device_info {

Maybe this should be called simply host1x_info? _device seems redundant.

> +	int	nb_channels;		/* host1x: num channels supported */
> +	int	nb_pts;			/* host1x: num syncpoints supported */
> +	int	nb_bases;		/* host1x: num syncpoints supported */
> +	int	nb_mlocks;		/* host1x: number of mlocks */
> +	int	(*init)(struct host1x *); /* initialize per SoC ops */
> +	int	sync_offset;
> +};

While this isn't public API, maybe it would still be useful to turn the
comments into proper kerneldoc? That's what people are used to.

> +struct host1x {
> +	void __iomem *regs;
> +	struct host1x_syncpt *syncpt;
> +	struct platform_device *dev;
> +	struct host1x_device_info info;
> +	struct clk *clk;
> +
> +	struct host1x_syncpt_ops syncpt_op;

Maybe make this a struct host1x_syncpt_ops * instead so you don't have
separate copies? While at it, maybe this should be const as well.

> +	struct dentry *debugfs;

This doesn't seem to be used anywhere.

> +static inline
> +struct host1x *host1x_get_host(struct platform_device *_dev)
> +{
> +	struct platform_device *pdev;
> +
> +	if (_dev->dev.parent) {
> +		pdev = to_platform_device(_dev->dev.parent);
> +		return platform_get_drvdata(pdev);
> +	} else
> +		return platform_get_drvdata(_dev);
> +}

There is a lot of needless casting in here. Why not pass in a struct
device * and use dev_get_drvdata() instead?
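
A rough sketch of what the struct device based helper could look like.
It is written against a userspace mock so that it compiles on its own:
struct device and dev_get_drvdata() below are simplified stand-ins for
the kernel definitions, and the parent fallback is carried over from
the quoted function:

```c
#include <stddef.h>

/* Userspace mock of the two kernel pieces the helper relies on. */
struct device {
	struct device *parent;
	void *driver_data;
};

static inline void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

struct host1x;

/* Clients hang off the host1x device; the host itself has no parent. */
static inline struct host1x *host1x_get_host(struct device *dev)
{
	return dev_get_drvdata(dev->parent ? dev->parent : dev);
}
```

Callers holding a platform device would pass &pdev->dev, so no
to_platform_device() round trip is needed.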

> diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
> +#include "hw/host1x01.h"
> +#include "dev.h"
> +#include "hw/host1x01_hardware.h"

The ordering here looks funny.

> +#include "hw/syncpt_hw.c"

Why include the source file here? Can't you compile it separately
instead?

> diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
[...]
> +int host1x01_init(struct host1x *);

For completeness you should probably name the parameter, even if this is
a prototype.

> diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include "hw_host1x01_sync.h"

Again, a blank line might help between the above two. I also assume that
this file will be filled with more content later on, so I guess it's not
worth the trouble to postpone its creation until a later point.

> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
[...]
> +static inline u32 host1x_sync_syncpt_0_r(void)
> +{
> +	return 0x400;
> +}
> +#define HOST1X_SYNC_SYNCPT_0 \
> +	host1x_sync_syncpt_0_r()
> +static inline u32 host1x_sync_syncpt_base_0_r(void)
> +{
> +	return 0x600;
> +}
> +#define HOST1X_SYNC_SYNCPT_BASE_0 \
> +	host1x_sync_syncpt_base_0_r()
> +static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
> +{
> +	return 0x700;
> +}

Perhaps it would be useful to modify these to take the syncpt ID as a
parameter? That way you don't have to remember to do the multiplication
every time you access the register.
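
For illustration, a parameterised accessor might look like the sketch
below. The 0x400 base comes from the quoted header and the 4-byte
stride from the "sp->id * 4" in syncpt_load_min(); the function and
macro names are only a suggestion:

```c
#include <stdint.h>

/* Offset of the value register for syncpoint 'id'. */
static inline uint32_t host1x_sync_syncpt_r(unsigned int id)
{
	return 0x400 + id * 4;
}
#define HOST1X_SYNC_SYNCPT(id) \
	host1x_sync_syncpt_r(id)
```

A read then becomes host1x_sync_readl(dev, HOST1X_SYNC_SYNCPT(sp->id))
and the stride lives in exactly one place.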

> diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
[...]
> +/*
> + * Updates the last value read from hardware.
> + * (was host1x_syncpt_load_min)

Can the comment in () not be dropped? Given that this is new code nobody
would know about the old name.

> + */
> +static u32 syncpt_load_min(struct host1x_syncpt *sp)
> +{
> +	struct host1x *dev = sp->dev;
> +	u32 old, live;
> +
> +	do {
> +		old = host1x_syncpt_read_min(sp);
> +		live = host1x_sync_readl(dev,
> +				HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
> +	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);

I think this warrants a comment.
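
One possible reading of the loop, sketched with userspace C11 atomics
instead of the kernel's atomic_cmpxchg(): sample the cached value, read
the hardware, and store the hardware value only if the cache still
holds what was sampled. If another updater got in between, the whole
sequence is retried. struct shadow, shadow_update() and fake_hw_read()
are hypothetical names used only for this sketch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the cached (shadow) syncpoint state. */
struct shadow {
	_Atomic uint32_t min_val;
};

/* Stand-in for the register read; always reports the value 7. */
static uint32_t fake_hw_read(void)
{
	return 7;
}

/* Publish a freshly read hardware value into the shadow copy. */
static uint32_t shadow_update(struct shadow *s, uint32_t (*read_hw)(void))
{
	uint32_t old, live;

	do {
		old = atomic_load(&s->min_val);
		live = read_hw();
		/* Retry if min_val changed after it was sampled. */
	} while (!atomic_compare_exchange_strong(&s->min_val, &old, live));

	return live;
}
```

If that reading is right, a comment along these lines above the loop
would answer the question without the reader having to work it out.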

> +	if (!host1x_syncpt_check_max(sp, live))
> +		dev_err(&dev->dev->dev,
> +				"%s failed: id=%u, min=%d, max=%d\n",
> +				__func__,
> +				sp->id,
> +				host1x_syncpt_read_min(sp),
> +				host1x_syncpt_read_max(sp));

You could probably make this fit into fewer lines.

> +/*
> + * Write a cpu syncpoint increment to the hardware, without touching
> + * the cache. Caller is responsible for host being powered.
> + */

The last part of this comment applies to every host1x function, right?
So maybe it should just be dropped.

> +static void syncpt_debug(struct host1x_syncpt *sp)
> +{
> +	u32 i;
> +	for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
> +		u32 max = host1x_syncpt_read_max(sp);
> +		u32 min = host1x_syncpt_load_min(sp);
> +		if (!max && !min)
> +			continue;
> +		dev_info(&sp->dev->dev->dev,
> +			"id %d (%s) min %d max %d\n",
> +			i, sp->name,
> +			min, max);
> +
> +	}

There's a gratuitous blank line above.

> +
> +	for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
> +		u32 base_val;
> +		host1x_syncpt_read_wait_base(sp);
> +		base_val = sp->base_val;
> +		if (base_val)
> +			dev_info(&sp->dev->dev->dev,
> +					"waitbase id %d val %d\n",
> +					i, base_val);
> +
> +	}

And another one.

> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/stat.h>

I don't think this is needed.

> +#include <linux/module.h>
> +#include "syncpt.h"
> +#include "dev.h"
> +#include <trace/events/host1x.h>

Again, some more spacing would be nice here. And the ordering is a bit
weird. Maybe put the trace include above syncpt.h and dev.h?

> +#define MAX_SYNCPT_LENGTH	5

This doesn't seem to be used anywhere.

> +static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
> +		struct platform_device *pdev,
> +		int client_managed);

Can't you move the actual implementation here? Also I'm not sure if
passing the platform_device is the best choice here. struct device
should work just as well.

> +/*
> + * Resets syncpoint and waitbase values to sw shadows
> + */
> +void host1x_syncpt_reset(struct host1x *dev)

Maybe host1x_syncpt_flush() would be a better name given the above
description? Reset does have this hardware reset connotation so my first
intuition had been that this would reset the syncpt value to 0.

If you decide to change the name, make sure to change it in the syncpt
ops as well.

> +/*
> + * Updates sw shadow state for client managed registers
> + */
> +void host1x_syncpt_save(struct host1x *dev)
> +{
> +	struct host1x_syncpt *sp_base = dev->syncpt;
> +	u32 i;
> +
> +	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
> +		if (host1x_syncpt_client_managed(sp_base + i))
> +			dev->syncpt_op.load_min(sp_base + i);
> +		else
> +			WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
> +	}
> +
> +	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
> +		dev->syncpt_op.read_wait_base(sp_base + i);
> +}

A similar comment applies here. Though I'm not so sure about a better
name. Perhaps host1x_syncpt_sync()?

I know that this must sound all pretty straightforward to you, but for
somebody who hasn't used these functions at all the names are quite
confusing. So instead of making people go read the documentation, I
tend to think that making the names as descriptive as possible is
essential here.

> +/*
> + * Updates the last value read from hardware.
> + */
> +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
> +{
> +	u32 val;
> +	val = sp->dev->syncpt_op.load_min(sp);
> +	trace_host1x_syncpt_load_min(sp->id, val);
> +
> +	return val;
> +}

I don't know if I understand what this means exactly. Does it read the
value that hardware last incremented? Perhaps this will become clearer
when you add a comment to the syncpt_load_min() implementation.

> +int host1x_syncpt_init(struct host1x *host)
> +{
> +	struct host1x_syncpt *syncpt, *sp;
> +	int i;
> +
> +	syncpt = sp = devm_kzalloc(&host->dev->dev,
> +			sizeof(struct host1x_syncpt) * host->info.nb_pts,

You can make this a bit shorter by using sizeof(*sp) instead.

> +	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
> +		sp->id = i;
> +		sp->dev = host;

Perhaps:

	syncpt[i].id = i;
	syncpt[i].dev = host;

To avoid the need to explicitly keep track of sp?

> +static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
> +		struct platform_device *pdev,
> +		int client_managed)
> +{
> +	int i;
> +	struct host1x_syncpt *sp = host->syncpt;
> +	char *name;
> +
> +	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
> +		;
> +	if (sp->pdev)
> +		return NULL;
> +
> +	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
> +			pdev ? dev_name(&pdev->dev) : NULL);
> +	if (!name)
> +		return NULL;
> +
> +	sp->pdev = pdev;
> +	sp->name = name;
> +	sp->client_managed = client_managed;
> +
> +	return sp;
> +}
> +
> +struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
> +		int client_managed)
> +{
> +	struct host1x *host = host1x_get_host(pdev);
> +	return _host1x_syncpt_alloc(host, pdev, client_managed);
> +}

I think it's enough to keep track of the struct device here instead of
the struct platform_device.

Also the syncpoint is not actually allocated here, so maybe
host1x_syncpt_request() would be a better name. As a nice side-effect it
makes the naming more similar to the IRQ API and might be easier to work
with.

> +struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
> +{
> +	return dev->syncpt + id;
> +}

Should this perhaps do some error checking? What if the specified syncpt
hasn't actually been requested before?
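
A hedged sketch of what such checking could look like. It assumes that
a non-NULL name marks a requested syncpoint, which is how
_host1x_syncpt_alloc() appears to use the field, and it flattens
host->info.nb_pts into host->nb_pts to stay short. Whether returning
NULL is the right policy is left open:

```c
#include <stddef.h>

/* Trimmed-down stand-ins for the driver structures. */
struct host1x_syncpt {
	unsigned int id;
	const char *name;	/* NULL until the syncpoint is requested */
};

struct host1x {
	struct host1x_syncpt *syncpt;
	unsigned int nb_pts;
};

static struct host1x_syncpt *host1x_syncpt_get(struct host1x *host,
					       unsigned int id)
{
	if (id >= host->nb_pts)
		return NULL;		/* out of range */

	if (!host->syncpt[id].name)
		return NULL;		/* never requested */

	return &host->syncpt[id];
}
```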

> diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
[...]
> +struct host1x_syncpt {
> +	int id;
> +	atomic_t min_val;
> +	atomic_t max_val;
> +	u32 base_val;
> +	const char *name;
> +	int client_managed;

Is this field actually ever used? Looking through the patches none of
the clients actually set this.

> +/*
> + * Returns true if syncpoint min == max, which means that there are no
> + * outstanding operations.
> + */
> +static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
> +{
> +	int min, max;
> +	smp_rmb();
> +	min = atomic_read(&sp->min_val);
> +	max = atomic_read(&sp->max_val);
> +	return (min == max);
> +}

Maybe call this host1x_syncpt_idle() or something similar instead?

> +{
> +	return sp->id != NVSYNCPT_INVALID &&
> +		sp->id < host1x_syncpt_nb_pts(sp->dev);
> +}

Is there really a need for NVSYNCPT_INVALID? Even if you want to keep
the special case you can drop the explicit check because -1 will be
larger than host1x_syncpt_nb_pts() anyway.
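
A small sketch of why the range check alone can be enough, assuming
NVSYNCPT_INVALID is -1 and that the comparison is done on unsigned
values. syncpt_id_is_valid() is a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * (uint32_t)-1 is 0xffffffff, which is never below a realistic
 * syncpoint count, so the range check also rejects the invalid ID.
 */
static bool syncpt_id_is_valid(uint32_t id, uint32_t nb_pts)
{
	return id < nb_pts;
}
```

Note that struct host1x_syncpt declares id as a plain int in this
patch. If both operands of the comparison end up signed, -1 compares
as smaller than nb_pts and would pass, so the simplification only
holds once the ID is unsigned.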

Thierry



* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-01-15 11:43 ` [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts Terje Bergstrom
@ 2013-02-04 10:30   ` Thierry Reding
  2013-02-05  4:29     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 10:30 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 10543 bytes --]

On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
> @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
>  
>  	/* set common host1x device data */
>  	platform_set_drvdata(dev, host);
> -
>  	host->regs = devm_request_and_ioremap(&dev->dev, regs);
>  	if (!host->regs) {
>  		dev_err(&dev->dev, "failed to remap host registers\n");

This seems an unrelated (and actually undesirable) change.

> @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
>  	}
>  
>  	err = host1x_syncpt_init(host);
> -	if (err)
> +	if (err) {
> +		dev_err(&dev->dev, "failed to init sync points");
>  		return err;
> +	}

This error message should probably have gone in the previous patch as
well.

> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> index d8f5979..8376092 100644
> --- a/drivers/gpu/host1x/dev.h
> +++ b/drivers/gpu/host1x/dev.h
> @@ -17,11 +17,12 @@
>  #ifndef HOST1X_DEV_H
>  #define HOST1X_DEV_H
>  
> +#include <linux/platform_device.h>
>  #include "syncpt.h"
> +#include "intr.h"
>  
>  struct host1x;
>  struct host1x_syncpt;
> -struct platform_device;

Why include platform_device.h here?

> @@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
>  	const char * (*name)(struct host1x_syncpt *);
>  };
>  
> +struct host1x_intr_ops {
> +	void (*init_host_sync)(struct host1x_intr *);
> +	void (*set_host_clocks_per_usec)(
> +		struct host1x_intr *, u32 clocks);

Could the above two not be combined? The only reason to keep them
separate would be if the host1x clock was dynamically changed, but I
don't think we support that, right?

> +	void (*set_syncpt_threshold)(
> +		struct host1x_intr *, u32 id, u32 thresh);
> +	void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
> +	void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
> +	void (*disable_all_syncpt_intrs)(struct host1x_intr *);

Can disable_all_syncpt_intrs() not be implemented generically using the
number of syncpoints as exposed by host1x_device_info and the
.disable_syncpt_intr() function?
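
Sketched generically, that could be a plain loop over the per-syncpoint
operation. The structures here are stand-ins, the ops pointer and
syncpoint count are passed in explicitly to keep the example
self-contained, and fake_disable_syncpt_intr() only models the hardware
as one enable bit per syncpoint:

```c
#include <stdint.h>

/* Stand-in: one enable bit per syncpoint. */
struct host1x_intr {
	uint32_t enabled;
};

struct host1x_intr_ops {
	void (*disable_syncpt_intr)(struct host1x_intr *intr, uint32_t id);
};

/* Stand-in for the hardware-specific operation. */
static void fake_disable_syncpt_intr(struct host1x_intr *intr, uint32_t id)
{
	intr->enabled &= ~(UINT32_C(1) << id);
}

/* Generic helper: disable every syncpoint interrupt one by one. */
static void host1x_intr_disable_all(struct host1x_intr *intr,
				    const struct host1x_intr_ops *ops,
				    uint32_t nb_pts)
{
	uint32_t id;

	for (id = 0; id < nb_pts; id++)
		ops->disable_syncpt_intr(intr, id);
}
```

The trade-off is one register write per syncpoint instead of one write
per 32 syncpoints, which probably does not matter on a path that only
runs when interrupts are stopped.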

> @@ -46,11 +59,13 @@ struct host1x_device_info {
>  struct host1x {
>  	void __iomem *regs;
>  	struct host1x_syncpt *syncpt;
> +	struct host1x_intr intr;
>  	struct platform_device *dev;
>  	struct host1x_device_info info;
>  	struct clk *clk;
>  
>  	struct host1x_syncpt_ops syncpt_op;
> +	struct host1x_intr_ops intr_op;

I think carrying a const pointer to the interrupt operations structure
is a better option here.

> diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
[...]
> +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);

Can we avoid this forward declaration?

> +static void syncpt_thresh_cascade_fn(struct work_struct *work)

syncpt_thresh_work()?

> +{
> +	struct host1x_intr_syncpt *sp =
> +		container_of(work, struct host1x_intr_syncpt, work);
> +	host1x_syncpt_thresh_fn(sp);

Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
Why do we need to go through an external function declaration?

> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> +{
> +	struct host1x *host1x = dev_id;
> +	struct host1x_intr *intr = &host1x->intr;
> +	unsigned long reg;
> +	int i, id;
> +
> +	for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
> +		reg = host1x_sync_readl(host1x,
> +				HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
> +				i * REGISTER_STRIDE);
> +		for_each_set_bit(id, &reg, BITS_PER_LONG) {
> +			struct host1x_intr_syncpt *sp =
> +				intr->syncpt + (i * BITS_PER_LONG + id);
> +			host1x_intr_syncpt_thresh_isr(sp);

Have you considered mimicking the IRQ API and name this something like
host1x_intr_syncpt_thresh_handle() and name the actual ISR just
syncpt_thresh_isr()? Not so important but it makes things a bit clearer
in my opinion.

> +			queue_work(intr->wq, &sp->work);

Should the call to queue_work() perhaps be moved into
host1x_intr_syncpt_thresh_isr()?

> +static void host1x_intr_init_host_sync(struct host1x_intr *intr)
> +{
> +	struct host1x *host1x = intr_to_host1x(intr);
> +	int i, err;
> +
> +	host1x_sync_writel(host1x, 0xffffffffUL,
> +		HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
> +	host1x_sync_writel(host1x, 0xffffffffUL,
> +		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
> +
> +	for (i = 0; i < host1x->info.nb_pts; i++)
> +		INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
> +
> +	err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
> +				syncpt_thresh_cascade_isr,
> +				IRQF_SHARED, "host1x_syncpt", host1x);
> +	WARN_ON(IS_ERR_VALUE(err));

Do we really want to continue in this case?

> +static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
> +	u32 id, u32 thresh)
> +{
> +	struct host1x *host1x = intr_to_host1x(intr);
> +	host1x_sync_writel(host1x, thresh,
> +		HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE);
> +}

Again, maybe defining the register stride as part of the register
definition might be better. I think HOST1X_SYNC_SYNCPT_INT_THRESH(id) is
easier to read.

> +static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id)
> +{
> +	struct host1x *host1x = intr_to_host1x(intr);
> +
> +	host1x_sync_writel(host1x, BIT_MASK(id),
> +			HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 +
> +			BIT_WORD(id) * REGISTER_STRIDE);
> +}

Same here.
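
For the bitmask style registers the helper has to split the ID into a
register index and a bit position. A sketch under two assumptions: a
4-byte REGISTER_STRIDE and 32 interrupt bits per register. The base
offset is passed in because the real offsets are not quoted here, and
both helper names are made up for the example:

```c
#include <stdint.h>

#define REGISTER_STRIDE	4u	/* assumed: 32-bit registers */

/* Offset of the register that holds the bit for syncpoint 'id'. */
static inline uint32_t syncpt_intr_reg(uint32_t base, uint32_t id)
{
	return base + (id / 32u) * REGISTER_STRIDE;
}

/* Bit within that register. */
static inline uint32_t syncpt_intr_bit(uint32_t id)
{
	return UINT32_C(1) << (id % 32u);
}
```

The kernel's BIT_WORD() and BIT_MASK() divide by BITS_PER_LONG, which
matches the 32-bit registers only on a 32-bit build. Hardcoding 32 in
the register helper avoids depending on that.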

> +static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id)
> +{
> +	struct host1x *host1x = intr_to_host1x(intr);
> +
> +	host1x_sync_writel(host1x, BIT_MASK(id),
> +			HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
> +			BIT_WORD(id) * REGISTER_STRIDE);
> +
> +	host1x_sync_writel(host1x, BIT_MASK(id),
> +		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
> +		BIT_WORD(id) * REGISTER_STRIDE);
> +}

And here.

> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
> +#include "intr.h"
> +#include <linux/interrupt.h>
> +#include <linux/slab.h>
> +#include <linux/irq.h>
> +#include "dev.h"

More funky ordering of includes.

> +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
> +			enum host1x_intr_action action, void *data,
> +			void *_waiter,
> +			void **ref)

Why do you pass in _waiter as void * and not struct host1x_waitlist *?

I think I've said this before. The interface doesn't seem optimal to me
here. Passing in an enumeration to choose which action to perform looks
difficult to work with (not to mention the symbols are rather long and
therefore result in ugly code).

Maybe doing this by passing around a pointer to a handler function would
be nicer. However since I haven't really used this yet, I can't really
tell. So maybe we should just merge the implementation as-is for now. We
can always clean it up later.
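
To make the alternative concrete, a rough sketch of a handler-pointer
interface follows. Everything in it is hypothetical: the typedef, the
fields of the waiter and example_wakeup() are invented to show the
shape, not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

struct host1x_waitlist;

/* Called once the syncpoint reaches the waiter's threshold. */
typedef void (*host1x_intr_handler_t)(struct host1x_waitlist *waiter);

struct host1x_waitlist {
	uint32_t thresh;
	host1x_intr_handler_t handler;
	void *data;
};

/* Dispatch is a single indirect call instead of a switch on an enum. */
static void host1x_waitlist_run(struct host1x_waitlist *waiter)
{
	if (waiter->handler)
		waiter->handler(waiter);
}

/* Example handler: flag a sleeper as woken. */
static void example_wakeup(struct host1x_waitlist *waiter)
{
	*(int *)waiter->data = 1;
}
```

New behaviours can then be added by clients without touching a central
enumeration, at the cost of losing the fixed, easily audited set of
actions.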

> +void *host1x_intr_alloc_waiter(void)
> +{
> +	return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
> +}

I'm not sure why this is separate from host1x_syncpt_wait() since it is
only used inside that function and the waiter returned never leaves the
scope of that function, so it might be better to allocate it directly
in host1x_syncpt_wait() instead.

Actually, it looks like the waiter doesn't ever leave scope, so you may
even want to allocate it on the stack.

> +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)

Here again, you pass in the waiter via a void *. Why's that?

> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)

Maybe you should keep the type of the irq_sync here so that it properly
propagates to the call to devm_request_irq().

> +{
> +	unsigned int id;
> +	struct host1x *host1x = intr_to_host1x(intr);
> +	u32 nb_pts = host1x_syncpt_nb_pts(host1x);
> +
> +	intr->syncpt = devm_kzalloc(&host1x->dev->dev,
> +			sizeof(struct host1x_intr_syncpt) *
> +			host1x->info.nb_pts,
> +			GFP_KERNEL);
> +
> +	if (!host1x->intr.syncpt)

The above blank line isn't necessary.

> +void host1x_intr_stop(struct host1x_intr *intr)
> +{
> +	unsigned int id;
> +	struct host1x *host1x = intr_to_host1x(intr);
> +	struct host1x_intr_syncpt *syncpt;
> +	u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
> +
> +	mutex_lock(&intr->mutex);
> +
> +	host1x->intr_op.disable_all_syncpt_intrs(intr);

I haven't commented on this everywhere, but I think this could benefit
from a wrapper that forwards this to the intr_op. The same goes for the
sync_op.

> +	for (id = 0, syncpt = intr->syncpt;
> +	     id < nb_pts;
> +	     ++id, ++syncpt) {

I don't think you need to explicitly keep track of syncpt within the for
statement. Instead you could either index intr->syncpt directly or
obtain a reference within the loop. It allows the for statement to be
written much more canonically.

> diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
[...]
> +#define intr_syncpt_to_intr(is) (is->intr)

This one doesn't buy you anything. It actually uses up more characters
so you can just drop it.

> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
> @@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp)
>  	host1x_syncpt_cpu_incr(sp);
>  }
>  
> +/*
> + * Updated sync point form hardware, and returns true if syncpoint is expired,
> + * false if we may need to wait
> + */
> +static bool syncpt_load_min_is_expired(
> +	struct host1x_syncpt *sp,
> +	u32 thresh)

This can all go on one line.

> +/*
> + * Main entrypoint for syncpoint value waits.
> + */
> +int host1x_syncpt_wait(struct host1x_syncpt *sp,
> +			u32 thresh, long timeout, u32 *value)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_syncpt_wait);

This doesn't only seem to be the main entrypoint, but it's basically the
only way to currently wait for syncpoints. One actual use-case where
this might turn out to be a problem is video capturing. The problem is
that using this API you can't very well asynchronously capture frames.
So eventually I think we need a way to allow a generic handler to be
attached to syncpoints so that you can have this handler continuously
invoked after each frame is captured and just pass the buffer back to
userspace.

> +bool host1x_syncpt_is_expired(
> +	struct host1x_syncpt *sp,
> +	u32 thresh)

This can go on one line.

Thierry



* Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-01-15 11:44 ` [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support Terje Bergstrom
@ 2013-02-04 11:03   ` Thierry Reding
  2013-02-05  4:41     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 11:03 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
[...]
> +static pid_t host1x_debug_null_kickoff_pid;
> +unsigned int host1x_debug_trace_cmdbuf;
> +
> +static pid_t host1x_debug_force_timeout_pid;
> +static u32 host1x_debug_force_timeout_val;
> +static u32 host1x_debug_force_timeout_channel;

Please group static and non-static variables.

> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
> +struct output {
> +	void (*fn)(void *ctx, const char *str, size_t len);
> +	void *ctx;
> +	char buf[256];
> +};

Do we really need this kind of abstraction? There really should be only
one location where debug information is obtained, so I don't see a need
for this.

> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
>  struct host1x_syncpt_ops {
>  	void (*reset)(struct host1x_syncpt *);
>  	void (*reset_wait_base)(struct host1x_syncpt *);
> @@ -117,6 +133,7 @@ struct host1x {
>  	struct host1x_channel_ops channel_op;
>  	struct host1x_cdma_ops cdma_op;
>  	struct host1x_pushbuffer_ops cdma_pb_op;
> +	struct host1x_debug_ops debug_op;
>  	struct host1x_syncpt_ops syncpt_op;
>  	struct host1x_intr_ops intr_op;

Again, better to pass in a const pointer to the ops structure.
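
As a minimal sketch of that idea (all names below are made up for
illustration and the channel count is arbitrary), the per-SoC code would
define one read-only table and the host structure would only point at
it:

```c
struct host_sketch;

struct debug_ops_sketch {
	int (*show_channel)(struct host_sketch *host, int chid);
};

struct host_sketch {
	const struct debug_ops_sketch *debug_ops;	/* shared, read-only */
	int nb_channels;
};

/* Stand-in implementation: only validates the channel index. */
static int v01_show_channel(struct host_sketch *host, int chid)
{
	return (chid >= 0 && chid < host->nb_channels) ? 0 : -1;
}

/* One table per hardware revision; it can live in read-only data. */
static const struct debug_ops_sketch v01_debug_ops = {
	.show_channel = v01_show_channel,
};

static void host_init_v01(struct host_sketch *host)
{
	host->debug_ops = &v01_debug_ops;
	host->nb_channels = 8;	/* arbitrary value for the sketch */
}
```

Nothing is copied at init time, and the compiler rejects accidental
writes through the pointer.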

> diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c

> +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
> +{
> +	unsigned mask;
> +	unsigned subop;
> +
> +	switch (val >> 28) {
> +	case 0x0:

These can easily be derived by looking at the debug output, but it may
still make sense to assign symbolic names to them.
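
Something along these lines, for instance. The enum and helper names are
invented here, and the values are my reading of the opcode field in bits
31:28 of a command word; they should be checked against the hardware
headers before being used.

```c
#include <stdint.h>

/* Illustrative names; values assumed, verify against the hardware headers. */
enum host1x_opcode_sketch {
	OPCODE_SETCLASS	= 0x0,
	OPCODE_INCR	= 0x1,
	OPCODE_NONINCR	= 0x2,
	OPCODE_MASK	= 0x3,
	OPCODE_IMM	= 0x4,
	OPCODE_GATHER	= 0x6,
};

/* The opcode lives in the top four bits of a command word. */
static unsigned int cmd_opcode(uint32_t val)
{
	return val >> 28;
}

static const char *cmd_opcode_name(uint32_t val)
{
	switch (cmd_opcode(val)) {
	case OPCODE_SETCLASS:	return "SETCLASS";
	case OPCODE_INCR:	return "INCR";
	case OPCODE_NONINCR:	return "NONINCR";
	case OPCODE_MASK:	return "MASK";
	case OPCODE_IMM:	return "IMM";
	case OPCODE_GATHER:	return "GATHER";
	default:		return "UNKNOWN";
	}
}
```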

> +static void show_channel_word(struct output *o, int *state, int *count,
> +		u32 addr, u32 val, struct host1x_cdma *cdma)
> +{
> +	static int start_count, dont_print;

What if two processes read debug information at the same time?
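
One way around that would be to keep the state in a structure owned by
the caller, so each reader gets its own copy. The sketch below is only a
guess at what the two statics track (the word count announced by the
current command and a flag to suppress oversized payloads); the
structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-dump state replacing the function-local statics. */
struct dump_state {
	int start_count;	/* payload words announced by the command */
	bool dont_print;	/* set when the payload is too large to show */
};

/* Start decoding a command that announces count payload words. */
static void dump_start(struct dump_state *st, int count, int max_words)
{
	st->start_count = count;
	st->dont_print = count > max_words;
}

/* True if payload words of the current command should be printed. */
static bool dump_should_print(const struct dump_state *st)
{
	return !st->dont_print;
}
```

With the state on the caller's stack (or in the seq_file private data),
two concurrent readers can no longer corrupt each other's output.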

> +static void do_show_channel_gather(struct output *o,
> +		phys_addr_t phys_addr,
> +		u32 words, struct host1x_cdma *cdma,
> +		phys_addr_t pin_addr, u32 *map_addr)
> +{
> +	/* Map dmaget cursor to corresponding mem handle */
> +	u32 offset;
> +	int state, count, i;
> +
> +	offset = phys_addr - pin_addr;
> +	/*
> +	 * Sometimes we're given different hardware address to the same
> +	 * page - in these cases the offset will get an invalid number and
> +	 * we just have to bail out.
> +	 */

Why's that?

> +	map_addr = host1x_memmgr_mmap(mem);
> +	if (!map_addr) {
> +		host1x_debug_output(o, "[could not mmap]\n");
> +		return;
> +	}
> +
> +	/* Get base address from mem */
> +	sgt = host1x_memmgr_pin(mem);
> +	if (IS_ERR(sgt)) {
> +		host1x_debug_output(o, "[couldn't pin]\n");
> +		host1x_memmgr_munmap(mem, map_addr);
> +		return;
> +	}

Maybe you should stick with one of "could not" or "couldn't". Makes it
easier to search for.

> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
> +{
> +	struct host1x_job *job;
> +
> +	list_for_each_entry(job, &cdma->sync_queue, list) {
> +		int i;
> +		host1x_debug_output(o,
> +				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
> +				" first_get=%08x, timeout=%d"
> +				" num_slots=%d, num_handles=%d\n",
> +				job,
> +				job->syncpt_id,
> +				job->syncpt_end,
> +				job->first_get,
> +				job->timeout,
> +				job->num_slots,
> +				job->num_unpins);

This could go on fewer lines.

> +static void host1x_debug_show_channel_cdma(struct host1x *m,
> +	struct host1x_channel *ch, struct output *o, int chid)
> +{
[...]
> +	switch (cbstat) {
> +	case 0x00010008:

Again, symbolic names would be nice.

Thierry

* Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x
  2013-01-15 11:44 ` [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x Terje Bergstrom
@ 2013-02-04 11:08   ` Thierry Reding
  2013-02-05  4:45     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 11:08 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
> index 697d49a..ffc8bf1 100644
> --- a/drivers/gpu/host1x/Makefile
> +++ b/drivers/gpu/host1x/Makefile
> @@ -12,4 +12,10 @@ host1x-y = \
>  	hw/host1x01.o
>  
>  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
> +
> +ccflags-y += -Iinclude/drm
> +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
> +
> +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
> +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
>  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o

Can this be moved into a separate Makefile in the drm subdirectory?

> diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
[...]
> new file mode 100644
> index 0000000..fdd2920
> --- /dev/null
> +++ b/drivers/gpu/host1x/host1x_client.h
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (c) 2013, NVIDIA Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HOST1X_CLIENT_H
> +#define HOST1X_CLIENT_H
> +
> +struct platform_device;
> +
> +void host1x_set_drm_data(struct platform_device *pdev, void *data);
> +void *host1x_get_drm_data(struct platform_device *pdev);
> +
> +#endif

These aren't defined or used yet.

Thierry

* Re: [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver
  2013-01-15 11:44 ` [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver Terje Bergstrom
@ 2013-02-04 11:23   ` Thierry Reding
  0 siblings, 0 replies; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 11:23 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Jan 15, 2013 at 01:44:02PM +0200, Terje Bergstrom wrote:
[...]
> +void host1x_set_drm_data(struct platform_device *pdev, void *data)
> +{
> +	struct host1x *host1x = platform_get_drvdata(pdev);
> +	host1x->drm_data = data;
> +}
> +
> +void *host1x_get_drm_data(struct platform_device *pdev)
> +{
> +	struct host1x *host1x = platform_get_drvdata(pdev);
> +	return host1x->drm_data;
> +}

Passing around struct device * should be enough and avoids the need for
the explicit cast to struct platform_device.

It is a bit unfortunate that we now have two structures called
host1x, but I think we can live with it for now. We can clean that up
once the code has been merged.

Thierry

* Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks
  2013-01-15 11:44 ` [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks Terje Bergstrom
@ 2013-02-04 11:26   ` Thierry Reding
  2013-02-04 17:06     ` Stephen Warren
  2013-02-05  4:47     ` Terje Bergström
  0 siblings, 2 replies; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 11:26 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
> of 2D clock to that driver alias.
> 
> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
> ---
>  arch/arm/mach-tegra/board-dt-tegra20.c    |    1 +
>  arch/arm/mach-tegra/board-dt-tegra30.c    |    1 +
>  arch/arm/mach-tegra/tegra20_clocks_data.c |    2 +-
>  arch/arm/mach-tegra/tegra30_clocks_data.c |    1 +
>  4 files changed, 4 insertions(+), 1 deletion(-)

With Prashant's clock rework patches now merged this patch can be
dropped.

Thierry

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-01-15 11:44 ` [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device Terje Bergstrom
@ 2013-02-04 12:56   ` Thierry Reding
  2013-02-05  5:17     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-04 12:56 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
> @@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
>  
>  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>  {
> -	return 0;
> +	struct host1x_drm_fpriv *fpriv;
> +	int err = 0;

Can be dropped.

> +
> +	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
> +	if (!fpriv)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&fpriv->contexts);
> +	filp->driver_priv = fpriv;
> +
> +	return err;

return 0;

> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
> +{
> +	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
> +	struct host1x_drm_context *context, *tmp;
> +
> +	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
> +		context->client->ops->close_channel(context);
> +		kfree(context);
> +	}
> +	kfree(fpriv);
>  }

Maybe you should add host1x_drm_context_free() to wrap the loop
contents?
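
Roughly like the following userspace sketch, where free() stands in for
kfree() and every name is a placeholder rather than something from the
patch:

```c
#include <stdlib.h>

struct context_sketch;

struct client_ops_sketch {
	void (*close_channel)(struct context_sketch *context);
};

struct client_sketch {
	const struct client_ops_sketch *ops;
	int open_channels;
};

struct context_sketch {
	struct client_sketch *client;
};

/* Stand-in for a client's close_channel() implementation. */
static void sketch_close_channel(struct context_sketch *context)
{
	context->client->open_channels--;
}

/* Close the channel, then release the context itself. */
static void context_free_sketch(struct context_sketch *context)
{
	context->client->ops->close_channel(context);
	free(context);
}
```

The close path and the close-channel IOCTL could then share the same
teardown helper.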

> @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
>  	drm_fbdev_cma_restore_mode(host1x->fbdev);
>  }
>  
> +static int
> +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
> +			 struct drm_file *file_priv)

static int and function name on one line, please.

> +{
> +	struct host1x *host1x = drm->dev_private;
> +	struct tegra_drm_syncpt_read_args *args = data;
> +	struct host1x_syncpt *sp =
> +		host1x_syncpt_get_bydev(host1x->dev, args->id);

I don't know if we need this, except maybe to work around the problem
that we have two different structures named host1x. The _bydev() suffix
is misleading because all you really do here is obtain the syncpt from
the host1x.

> +static int
> +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
> +			 struct drm_file *file_priv)
> +{
> +	struct tegra_drm_open_channel_args *args = data;
> +	struct host1x_client *client;
> +	struct host1x_drm_context *context;
> +	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
> +	struct host1x *host1x = drm->dev_private;
> +	int err = 0;

err = -ENODEV; (see below)

> +
> +	context = kzalloc(sizeof(*context), GFP_KERNEL);
> +	if (!context)
> +		return -ENOMEM;
> +
> +	list_for_each_entry(client, &host1x->clients, list) {
> +		if (client->class == args->class) {
> +			context->client = client;
> +			err = client->ops->open_channel(client, context);
> +			if (err)
> +				goto out;
> +
> +			list_add(&context->list, &fpriv->contexts);
> +			args->context = (uintptr_t)context;

Perhaps cast this to __u64 directly instead? There's little sense in
taking the detour via uintptr_t.

> +			goto out;

return 0;

> +		}
> +	}
> +	err = -ENODEV;
> +
> +out:
> +	if (err)
> +		kfree(context);
> +
> +	return err;
> +}

Then this simply becomes:

	kfree(context);
	return err;
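
Put together, the restructured function might look roughly like this
userspace sketch. An array stands in for the client list,
calloc()/free() for kzalloc()/kfree(), and every name ending in _sketch
is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct client_sketch {
	unsigned int class_id;
	int fail_open;		/* lets the sketch simulate a failing open */
};

struct context_sketch {
	struct client_sketch *client;
};

/* Stand-in for client->ops->open_channel(). */
static int open_channel_sketch(struct client_sketch *client,
			       struct context_sketch *context)
{
	(void)context;
	return client->fail_open ? -EBUSY : 0;
}

static int open_context_sketch(struct client_sketch *clients, size_t num,
			       unsigned int class_id,
			       struct context_sketch **out)
{
	struct context_sketch *context;
	int err = -ENODEV;	/* result if no client matches */
	size_t i;

	context = calloc(1, sizeof(*context));
	if (!context)
		return -ENOMEM;

	for (i = 0; i < num; i++) {
		if (clients[i].class_id != class_id)
			continue;

		context->client = &clients[i];
		err = open_channel_sketch(&clients[i], context);
		if (err)
			break;

		*out = context;
		return 0;	/* success leaves the function right here */
	}

	/* Only failures reach this point, so the free is unconditional. */
	free(context);
	return err;
}
```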

> +static int
> +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
> +			 struct drm_file *file_priv)
> +{
> +	struct tegra_drm_open_channel_args *args = data;
> +	struct host1x_drm_context *context, *tmp;
> +	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
> +	int err = 0;
> +
> +	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
> +		if ((uintptr_t)context == args->context) {
> +			context->client->ops->close_channel(context);
> +			list_del(&context->list);
> +			kfree(context);
> +			goto out;
> +		}
> +	}
> +	err = -EINVAL;
> +
> +out:
> +	return err;
> +}

Same comments as for tegra_drm_ioctl_open_channel().

> +static int
> +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
> +			 struct drm_file *file_priv)
> +{
> +	struct tegra_drm_get_channel_param_args *args = data;
> +	struct host1x_drm_context *context;
> +	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
> +	int err = 0;
> +
> +	list_for_each_entry(context, &fpriv->contexts, list) {
> +		if ((uintptr_t)context == args->context) {
> +			args->value =
> +				context->client->ops->get_syncpoint(context,
> +						args->param);
> +			goto out;
> +		}
> +	}
> +	err = -ENODEV;
> +
> +out:
> +	return err;
> +}

Same comments as well. Also you may want to factor out the context
lookup into a separate function so you don't have to repeat the same
code over and over again.
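
A lookup helper could be as small as the sketch below. It uses a plain
singly linked list instead of struct list_head so that it builds in
userspace, and the names are placeholders.

```c
#include <stddef.h>
#include <stdint.h>

struct context_sketch {
	struct context_sketch *next;
};

struct file_sketch {
	struct context_sketch *contexts;	/* contexts opened via this file */
};

/*
 * Return the context matching the handle passed in from userspace, or
 * NULL if this file does not own such a context.
 */
static struct context_sketch *file_find_context(struct file_sketch *file,
						uint64_t handle)
{
	struct context_sketch *context;

	for (context = file->contexts; context; context = context->next)
		if ((uint64_t)(uintptr_t)context == handle)
			return context;

	return NULL;
}
```

Each IOCTL would then start with a single lookup and return an error if
the result is NULL, which also removes the goto-based exit paths.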

I wonder if we shouldn't remove .get_syncpoint() from the client ops and
replace it by a simple array instead. The only use-case for this is if a
client wants more than a single syncpoint, right? In that case just keep
an array of syncpoints and the number of syncpoints per client.
Otherwise each client will have to rewrite the same function.
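
To illustrate the array idea, here is a small sketch with made-up names.
Returning a negative error code for a bad index is a choice made for
this sketch only.

```c
#include <errno.h>
#include <stdint.h>

struct client_sketch {
	const uint32_t *syncpts;	/* syncpoint IDs owned by the client */
	unsigned int num_syncpts;
};

/* Generic lookup shared by all clients; no per-client callback needed. */
static int client_get_syncpt(const struct client_sketch *client,
			     unsigned int index)
{
	if (index >= client->num_syncpts)
		return -EINVAL;

	return (int)client->syncpts[index];
}
```

A client with a single syncpoint would simply register an array of
length one.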

Also, how useful is it to create a context? Looking at the gr2d
implementation for .open_channel(), it will return the same channel to
whichever userspace process requests them. Can you explain why it is
necessary at all? From the name I would have expected some kind of
context switching to take place when different applications submit
requests to the same client, but that doesn't seem to be the case.

> +static int
> +tegra_drm_create_ioctl(struct drm_device *drm, void *data,
> +			 struct drm_file *file_priv)

tegra_drm_gem_create_ioctl() please.

>  static struct drm_ioctl_desc tegra_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
> +			tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),

TEGRA_DRM_GEM_CREATE

>  static const struct file_operations tegra_drm_fops = {
> @@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = {
>  	.load = tegra_drm_load,
>  	.unload = tegra_drm_unload,
>  	.open = tegra_drm_open,
> +	.preclose = tegra_drm_close,

I think it'd make sense to name the function tegra_drm_preclose() to
match the name in struct drm_driver.

> diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
[...]
> +struct host1x_drm_fpriv {
> +	struct list_head contexts;
>  };

Maybe name this host1x_drm_file. fpriv isn't very specific.

> +static inline struct host1x_drm_fpriv *
> +host1x_drm_fpriv(struct drm_file *file_priv)
> +{
> +	return file_priv ? file_priv->driver_priv : NULL;
> +}

I think it's fine to just directly do filp->driver_priv instead of going
through this wrapper.

>  struct host1x_client {
>  	struct host1x *host1x;
>  	struct device *dev;
>  
>  	const struct host1x_client_ops *ops;
>  
> +	u32 class;

Should this perhaps be an enum?

> diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
[...]
> +static u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index)
> +{
> +	struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
> +	if (index != 0)
> +		return UINT_MAX;
> +
> +	return host1x_syncpt_id(gr2d->syncpt);
> +}

Maybe get_syncpoint() should return int and negative error codes on
failure. That still leaves room for 2^31 possible syncpoints.

> +static u32 handle_cma_to_host1x(struct drm_device *drm,
> +				struct drm_file *file_priv, u32 gem_handle)
> +{
> +	struct drm_gem_object *obj;
> +	struct drm_gem_cma_object *cma_obj;
> +	u32 host1x_handle;
> +
> +	obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
> +	if (!obj)
> +		return 0;
> +
> +	cma_obj = to_drm_gem_cma_obj(obj);
> +	host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
> +	drm_gem_object_unreference(obj);
> +
> +	return host1x_handle;
> +}

I thought we had settled in previous reviews on having only a single
allocator and not doing the conversion between various types?

> +static int gr2d_submit(struct host1x_drm_context *context,
> +		struct tegra_drm_submit_args *args,
> +		struct drm_device *drm,
> +		struct drm_file *file_priv)
> +{
> +	struct host1x_job *job;
> +	int num_cmdbufs = args->num_cmdbufs;
> +	int num_relocs = args->num_relocs;
> +	int num_waitchks = args->num_waitchks;
> +	struct tegra_drm_cmdbuf __user *cmdbufs =
> +		(void * __user)(uintptr_t)args->cmdbufs;
> +	struct tegra_drm_reloc __user *relocs =
> +		(void * __user)(uintptr_t)args->relocs;
> +	struct tegra_drm_waitchk __user *waitchks =
> +		(void * __user)(uintptr_t)args->waitchks;

No need for all the uintptr_t casts.

> +	struct tegra_drm_syncpt_incr syncpt_incr;
> +	int err;
> +
> +	/* We don't yet support other than one syncpt_incr struct per submit */
> +	if (args->num_syncpt_incrs != 1)
> +		return -EINVAL;
> +
> +	job = host1x_job_alloc(context->channel,
> +			args->num_cmdbufs,
> +			args->num_relocs,
> +			args->num_waitchks);
> +	if (!job)
> +		return -ENOMEM;
> +
> +	job->num_relocs = args->num_relocs;
> +	job->num_waitchk = args->num_waitchks;
> +	job->clientid = (u32)args->context;
> +	job->class = context->client->class;
> +	job->serialize = true;
> +
> +	while (num_cmdbufs) {
> +		struct tegra_drm_cmdbuf cmdbuf;
> +		err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
> +		if (err)
> +			goto fail;
> +
> +		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
> +		if (!cmdbuf.mem)
> +			goto fail;
> +
> +		host1x_job_add_gather(job,
> +				cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
> +		num_cmdbufs--;
> +		cmdbufs++;
> +	}
> +
> +	err = copy_from_user(job->relocarray,
> +			relocs, sizeof(*relocs) * num_relocs);
> +	if (err)
> +		goto fail;
> +
> +	while (num_relocs--) {
> +		job->relocarray[num_relocs].cmdbuf_mem =
> +			handle_cma_to_host1x(drm, file_priv,
> +			job->relocarray[num_relocs].cmdbuf_mem);
> +		job->relocarray[num_relocs].target =
> +			handle_cma_to_host1x(drm, file_priv,
> +			job->relocarray[num_relocs].target);
> +
> +		if (!job->relocarray[num_relocs].target ||
> +			!job->relocarray[num_relocs].cmdbuf_mem)
> +			goto fail;
> +	}
> +
> +	err = copy_from_user(job->waitchk,
> +			waitchks, sizeof(*waitchks) * num_waitchks);
> +	if (err)
> +		goto fail;
> +
> +	err = host1x_job_pin(job, to_platform_device(context->client->dev));
> +	if (err)
> +		goto fail;
> +
> +	err = copy_from_user(&syncpt_incr,
> +			(void * __user)(uintptr_t)args->syncpt_incrs,
> +			sizeof(syncpt_incr));
> +	if (err)
> +		goto fail;
> +
> +	job->syncpt_id = syncpt_incr.syncpt_id;
> +	job->syncpt_incrs = syncpt_incr.syncpt_incrs;
> +	job->timeout = 10000;
> +	job->is_addr_reg = gr2d_is_addr_reg;
> +	if (args->timeout && args->timeout < 10000)
> +		job->timeout = args->timeout;
> +
> +	err = host1x_channel_submit(job);
> +	if (err)
> +		goto fail_submit;
> +
> +	args->fence = job->syncpt_end;
> +
> +	host1x_job_put(job);
> +	return 0;
> +
> +fail_submit:
> +	host1x_job_unpin(job);
> +fail:
> +	host1x_job_put(job);
> +	return err;
> +}

Most of this looks very generic. Can't it be split out into separate
functions and reused in other (gr3d) modules?

> +static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
> +{
> +	int ret;
> +
> +	if (class == NV_HOST1X_CLASS_ID)
> +		ret = reg == 0x2b;
> +	else
> +		switch (reg) {
> +		case 0x1a:
> +		case 0x1b:
> +		case 0x26:
> +		case 0x2b:
> +		case 0x2c:
> +		case 0x2d:
> +		case 0x31:
> +		case 0x32:
> +		case 0x48:
> +		case 0x49:
> +		case 0x4a:
> +		case 0x4b:
> +		case 0x4c:
> +			ret = 1;
> +			break;
> +		default:
> +			ret = 0;
> +			break;
> +		}
> +
> +	return ret;
> +}

I should probably bite the bullet and read through the (still) huge
patch 3 to understand exactly why this is needed.

> +static struct of_device_id gr2d_match[] = {

static const please.

> +static int __exit gr2d_remove(struct platform_device *dev)
> +{
> +	struct host1x *host1x =
> +		host1x_get_drm_data(to_platform_device(dev->dev.parent));
> +	struct gr2d *gr2d = platform_get_drvdata(dev);
> +	int err;
> +
> +	err = host1x_unregister_client(host1x, &gr2d->client);
> +	if (err < 0) {
> +		dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
> +			err);
> +		return err;
> +	}
> +
> +	host1x_syncpt_free(gr2d->syncpt);
> +	return 0;
> +}

Isn't this missing a host1x_channel_put() or host1x_free_channel()?

> diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
[...]
> +struct tegra_gem_create {
> +	__u64 size;
> +	unsigned int flags;
> +	unsigned int handle;
> +	unsigned int offset;
> +};

I think it's better to consistently use the explicitly sized types here.
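
For illustration, a fixed-width version could look like the sketch
below. It uses the <stdint.h> types so that it builds in userspace; the
kernel header would use __u64 and __u32. The trailing pad field is an
addition that is not in the patch: without it the structure size can
differ between 32-bit and 64-bit ABIs because of the alignment of the
64-bit member.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the IOCTL argument structure. */
struct gem_create_sketch {
	uint64_t size;
	uint32_t flags;
	uint32_t handle;
	uint32_t offset;
	uint32_t pad;	/* explicit padding, not part of the patch */
};

/* The layout should be the same regardless of the ABI. */
_Static_assert(sizeof(struct gem_create_sketch) == 24,
	       "IOCTL structure layout must not depend on the ABI");
```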

> +struct tegra_gem_invalidate {
> +	unsigned int handle;
> +};
> +
> +struct tegra_gem_flush {
> +	unsigned int handle;
> +};

Where are these used?

> +struct tegra_drm_syncpt_wait_args {
> +	__u32 id;
> +	__u32 thresh;
> +	__s32 timeout;
> +	__u32 value;
> +};
> +
> +#define DRM_TEGRA_NO_TIMEOUT	(-1)

Is this the only reason why timeout is signed? If so maybe a better
choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.

> +struct tegra_drm_get_channel_param_args {
> +	__u64 context;
> +	__u32 param;
> +	__u32 value;
> +};

What's the reason for not calling this tegra_drm_get_syncpoint?

> +struct tegra_drm_syncpt_incr {
> +	__u32 syncpt_id;
> +	__u32 syncpt_incrs;
> +};

Maybe the fields would be better named id and incrs. Though I also
notice that incrs is never used. I guess that's supposed to be used in
the future to allow increments by more than a single value. If so,
perhaps value would be a better name.

Now on to the dreaded patch 3...

Thierry

* Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks
  2013-02-04 11:26   ` Thierry Reding
@ 2013-02-04 17:06     ` Stephen Warren
  2013-02-05  4:47     ` Terje Bergström
  1 sibling, 0 replies; 49+ messages in thread
From: Stephen Warren @ 2013-02-04 17:06 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Terje Bergstrom, amerilainen, airlied, dri-devel, linux-tegra,
	linux-kernel

On 02/04/2013 04:26 AM, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a
>> duplicate of 2D clock to that driver alias.
>> 
>> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> --- 
>> arch/arm/mach-tegra/board-dt-tegra20.c    |    1 + 
>> arch/arm/mach-tegra/board-dt-tegra30.c    |    1 + 
>> arch/arm/mach-tegra/tegra20_clocks_data.c |    2 +- 
>> arch/arm/mach-tegra/tegra30_clocks_data.c |    1 + 4 files
>> changed, 4 insertions(+), 1 deletion(-)
> 
> With Prashant's clock rework patches now merged this patch can be 
> dropped.

Assuming this series is applied for 3.10 and not earlier, yes. I'd
certainly recommend applying for 3.10 not 3.9; the dependencies to
apply this for 3.9 given the AUXDATA/... requirements would be too
painful.

* Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver
  2013-02-04  9:09   ` Thierry Reding
@ 2013-02-05  3:30     ` Terje Bergström
  2013-02-05  7:43       ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  3:30 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 01:09, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
>> Add host1x, the driver for host1x and its client unit 2D.
> 
> Maybe this could be a bit more verbose. Perhaps describe what host1x is.

Sure. I could just steal the paragraph from Stephen:

The Tegra host1x module is the DMA engine for register access to Tegra's
graphics- and multimedia-related modules. The modules served by host1x
are referred to as clients. host1x includes some other functionality,
such as synchronization.

>> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
> [...]
>> @@ -0,0 +1,6 @@
>> +config TEGRA_HOST1X
>> +     tristate "Tegra host1x driver"
>> +     help
>> +       Driver for the Tegra host1x hardware.
> 
> Maybe s/Tegra/NVIDIA Tegra/?

Sounds good.

>> +
>> +       Required for enabling tegradrm.
> 
> This should probably be dropped. Either encode such knowledge as
> explicit dependencies or in this case just remove it altogether since we
> will probably merge both drivers anyway.

I think this was left from previous versions. Now it just doesn't make
sense. I'll just drop it.

> 
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> +#include <linux/module.h>
>> +#include <linux/list.h>
>> +#include <linux/slab.h>
>> +#include <linux/of.h>
>> +#include <linux/of_device.h>
>> +#include <linux/clk.h>
>> +#include <linux/io.h>
>> +#include "dev.h"
> 
> Maybe add a blank line between the previous two lines to visually
> separate standard Linux includes from driver-specific ones.

Ok. You commented in quite a few places in a similar way. I'll fix all of
them to first include system includes, then driver's own includes, and
add a blank line in between.

> 
>> +#include "hw/host1x01.h"
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/host1x.h>
>> +
>> +#define DRIVER_NAME          "tegra-host1x"
> 
> You only ever use this once, so maybe it can just be dropped?

Yes.

> 
>> +static struct host1x_device_info host1x_info = {
> 
> Perhaps this should be host1x01_info in order to match the hardware
> revision? That'll avoid it having to be renamed later on when other
> revisions start to appear.

Ok, will do. I thought it'd be awkward being alone until the second
version appears, but I'll add it.

> 
>> +static int host1x_probe(struct platform_device *dev)
>> +{
> [...]
>> +     syncpt_irq = platform_get_irq(dev, 0);
>> +     if (IS_ERR_VALUE(syncpt_irq)) {
> 
> This is somewhat unusual. It should be fine to just do:
> 
>         if (syncpt_irq < 0)
> 
> but IS_ERR_VALUE() should work fine too.

I'll use the simpler version.

> 
>> +     memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
> 
> Why not make the .info field a pointer to struct host1x_device_info
> instead? That way you don't have to keep separate copies of the same
> information.

This had something to do with __init data and non-init data. But, we're
not really putting this data into __init, so we should be able to use
just a pointer.

> 
>> +     /* set common host1x device data */
>> +     platform_set_drvdata(dev, host);
>> +
>> +     host->regs = devm_request_and_ioremap(&dev->dev, regs);
>> +     if (!host->regs) {
>> +             dev_err(&dev->dev, "failed to remap host registers\n");
>> +             return -ENXIO;
>> +     }
> 
> This should probably be rewritten as:
> 
>         host->regs = devm_ioremap_resource(&dev->dev, regs);
>         if (IS_ERR(host->regs))
>                 return PTR_ERR(host->regs);
> 
> Though that function will only be available in 3.9-rc1.

Ok, 3.9-rc1 is fine as a basis.

>> +     err = host1x_syncpt_init(host);
>> +     if (err)
>> +             return err;
> [...]
>> +     host1x_syncpt_reset(host);
> 
> Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
> it might be useful to have host1x_syncpt_reset() as a separate function
> but couldn't it be called as part of host1x_syncpt_init()?

host1x_syncpt_init() is used for initializing the syncpt structures, and
is called in probe. host1x_syncpt_reset() should be called whenever we
think hardware state is lost, for example if VDD_CORE was rail gated due
to system suspend.

> 
>> +     dev_info(&dev->dev, "initialized\n");
> 
> I don't think this is very useful. We should make sure to tell people
> when things fail. When everything goes as planned we don't need to brag
> about it =)

True. I wish other kernel drivers followed that same philosophy. :-)
I'll remove that. It's mainly useful as debug help, but it's just as
easy to check the state from sysfs.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>> +struct host1x_syncpt_ops {
> [...]
>> +     const char * (*name)(struct host1x_syncpt *);
> 
> Why do we need this? Could we not refer to the syncpt name directly
> instead of going through this wrapper? I'd expect the name to be static.

This must be a relic. I'll remove the wrapper.

> 
>> +struct host1x_device_info {
> 
> Maybe this should be called simply host1x_info? _device seems redundant.

Sure.

> 
>> +     int     nb_channels;            /* host1x: num channels supported */
>> +     int     nb_pts;                 /* host1x: num syncpoints supported */
>> +     int     nb_bases;               /* host1x: num syncpoints supported */
>> +     int     nb_mlocks;              /* host1x: number of mlocks */
>> +     int     (*init)(struct host1x *); /* initialize per SoC ops */
>> +     int     sync_offset;
>> +};
> 
> While this isn't public API, maybe it would still be useful to turn the
> comments into proper kerneldoc? That's what people are used to.

Ok.

> 
>> +struct host1x {
>> +     void __iomem *regs;
>> +     struct host1x_syncpt *syncpt;
>> +     struct platform_device *dev;
>> +     struct host1x_device_info info;
>> +     struct clk *clk;
>> +
>> +     struct host1x_syncpt_ops syncpt_op;
> 
> Maybe make this a struct host1x_syncpt_ops * instead so you don't have
> separate copies? While at it, maybe this should be const as well.

Sounds good. I guess there are other areas in need of a const, too.

> 
>> +     struct dentry *debugfs;
> 
> This doesn't seem to be used anywhere.

It's a failed split - it's used in the debug patch (4/8).

> 
>> +static inline
>> +struct host1x *host1x_get_host(struct platform_device *_dev)
>> +{
>> +     struct platform_device *pdev;
>> +
>> +     if (_dev->dev.parent) {
>> +             pdev = to_platform_device(_dev->dev.parent);
>> +             return platform_get_drvdata(pdev);
>> +     } else
>> +             return platform_get_drvdata(_dev);
>> +}
> 
> There is a lot of needless casting in here. Why not pass in a struct
> device * and use dev_get_drvdata() instead?

Hmm, true, this should fit into smaller space.
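
Something like the following, perhaps. This is a userspace mock: struct
device_sketch only models the parent pointer and driver data of a struct
device, and the function name is a placeholder.

```c
#include <stddef.h>

/* Minimal stand-in for struct device: parent link plus driver data. */
struct device_sketch {
	struct device_sketch *parent;
	void *drvdata;
};

/*
 * Clients hang off the host1x device, so their driver data is found on
 * the parent; the host1x device itself carries it directly.
 */
static void *host_from_dev(struct device_sketch *dev)
{
	if (dev->parent)
		dev = dev->parent;

	return dev->drvdata;
}
```

In the driver the last line would be a dev_get_drvdata() call, and no
to_platform_device() conversion would be needed.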

> 
>> diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
> [...]
>> +#include "hw/host1x01.h"
>> +#include "dev.h"
>> +#include "hw/host1x01_hardware.h"
> 
> The ordering here looks funny.

I'll make it more alphabetic.

> 
>> +#include "hw/syncpt_hw.c"
> 
> Why include the source file here? Can't you compile it separately
> instead?

It's because we need to compile with the hardware headers of that host1x
version, because we haven't been good at keeping compatibility. So
host1x01.c #includes version 01 headers, and syncpt_hw.c in this
compilation unit gets compiled with that. 02 would include 02 headers,
and syncpt_hw.c would get compiled with its register definitions etc.

> 
>> diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
> [...]
>> +int host1x01_init(struct host1x *);
> 
> For completeness you should probably name the parameter, even if this is
> a prototype.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
> [...]
>> +#include <linux/types.h>
>> +#include <linux/bitops.h>
>> +#include "hw_host1x01_sync.h"
> 
> Again, a blank line might help between the above two. I also assume that
> this file will be filled with more content later on, so I guess it's not
> worth the trouble to postpone its creation until a later point.

Yeah, most of the content gets added by the dreaded patch 3.

> 
>> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> [...]
>> +static inline u32 host1x_sync_syncpt_0_r(void)
>> +{
>> +     return 0x400;
>> +}
>> +#define HOST1X_SYNC_SYNCPT_0 \
>> +     host1x_sync_syncpt_0_r()
>> +static inline u32 host1x_sync_syncpt_base_0_r(void)
>> +{
>> +     return 0x600;
>> +}
>> +#define HOST1X_SYNC_SYNCPT_BASE_0 \
>> +     host1x_sync_syncpt_base_0_r()
>> +static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
>> +{
>> +     return 0x700;
>> +}
> 
> Perhaps it would be useful to modify these to take the syncpt ID as a
> parameter? That way you don't have to remember to do the multiplication
> every time you access the register?

Yeah, sounds good.
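
A sketch of the parameterized form, using the 0x400 base and the 4-byte spacing from the patch above. The names host1x_sync_syncpt_r() and HOST1X_SYNC_SYNCPT() are only a suggestion:

```c
#include <stdint.h>

typedef uint32_t u32;

/* Offset of the SYNCPT register for sync point 'id' (base 0x400, 4 bytes apart). */
static inline u32 host1x_sync_syncpt_r(unsigned int id)
{
	return 0x400 + id * 4;
}
#define HOST1X_SYNC_SYNCPT(id) \
	host1x_sync_syncpt_r(id)
```

A caller would then write HOST1X_SYNC_SYNCPT(sp->id) instead of HOST1X_SYNC_SYNCPT_0 + sp->id * 4.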

> 
>> diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
> [...]
>> +/*
>> + * Updates the last value read from hardware.
>> + * (was host1x_syncpt_load_min)
> 
> Can the comment in () not be dropped? Given that this is new code nobody
> would know about the old name.

Yes, it should be dropped.

> 
>> + */
>> +static u32 syncpt_load_min(struct host1x_syncpt *sp)
>> +{
>> +     struct host1x *dev = sp->dev;
>> +     u32 old, live;
>> +
>> +     do {
>> +             old = host1x_syncpt_read_min(sp);
>> +             live = host1x_sync_readl(dev,
>> +                             HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
>> +     } while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
> 
> I think this warrants a comment.

Sure. It just loops in case there's a race writing to min_val.
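
The retry pattern can be shown in userspace with C11 atomics. This is a sketch, not the driver code: struct syncpt is cut down to two fields and read_hw() is a hypothetical stand-in for the register read:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t u32;

struct syncpt {
	_Atomic u32 min_val;	/* software shadow of the hardware value */
	u32 hw_value;		/* stand-in for the hardware register */
};

static u32 read_hw(struct syncpt *sp)
{
	return sp->hw_value;
}

static u32 syncpt_load_min(struct syncpt *sp)
{
	u32 old, live;

	do {
		old = atomic_load(&sp->min_val);
		live = read_hw(sp);
		/*
		 * The exchange fails if someone else updated min_val after
		 * we loaded it; in that case read the hardware again.
		 */
	} while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

	return live;
}
```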

> 
>> +     if (!host1x_syncpt_check_max(sp, live))
>> +             dev_err(&dev->dev->dev,
>> +                             "%s failed: id=%u, min=%d, max=%d\n",
>> +                             __func__,
>> +                             sp->id,
>> +                             host1x_syncpt_read_min(sp),
>> +                             host1x_syncpt_read_max(sp));
> 
> You could probably make this fit into fewer lines.

Yes, definitely. Will do.

> 
>> +/*
>> + * Write a cpu syncpoint increment to the hardware, without touching
>> + * the cache. Caller is responsible for host being powered.
>> + */
> 
> The last part of this comment applies to every host1x function, right?
> So maybe it should just be dropped.

Yeah, we don't really have runtime PM, so host1x is always turned on anyway. In
downstream, with dynamic power management, some functions require caller
to ensure power is on, some functions turn on power themselves.

I'll remove these comments, as they do not apply until we have runtime PM.

> 
>> +static void syncpt_debug(struct host1x_syncpt *sp)
>> +{
>> +     u32 i;
>> +     for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
>> +             u32 max = host1x_syncpt_read_max(sp);
>> +             u32 min = host1x_syncpt_load_min(sp);
>> +             if (!max && !min)
>> +                     continue;
>> +             dev_info(&sp->dev->dev->dev,
>> +                     "id %d (%s) min %d max %d\n",
>> +                     i, sp->name,
>> +                     min, max);
>> +
>> +     }
> 
> There's a gratuitous blank line above.

Will remove.

> 
>> +
>> +     for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
>> +             u32 base_val;
>> +             host1x_syncpt_read_wait_base(sp);
>> +             base_val = sp->base_val;
>> +             if (base_val)
>> +                     dev_info(&sp->dev->dev->dev,
>> +                                     "waitbase id %d val %d\n",
>> +                                     i, base_val);
>> +
>> +     }
> 
> And another one.

Consider it gone.

> 
>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
> [...]
>> +#include <linux/platform_device.h>
>> +#include <linux/slab.h>
>> +#include <linux/stat.h>
> 
> I don't think this is needed.

Yup, gone.

> 
>> +#include <linux/module.h>
>> +#include "syncpt.h"
>> +#include "dev.h"
>> +#include <trace/events/host1x.h>
> 
> Again, some more spacing would be nice here. And the ordering is a bit
> weird. Maybe put the trace include above syncpt.h and dev.h?

Will do.

> 
>> +#define MAX_SYNCPT_LENGTH    5
> 
> This doesn't seem to be used anywhere.

Yeah, it was an old restriction for length of syncpt name, but as we
moved to dynamic allocation, it doesn't apply.

> 
>> +static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
>> +             struct platform_device *pdev,
>> +             int client_managed);
> 
> Can't you move the actual implementation here? Also I'm not sure if
> passing the platform_device is the best choice here. struct device
> should work just as well.

True, and sp->pdev needs to become struct device *, too.

> 
>> +/*
>> + * Resets syncpoint and waitbase values to sw shadows
>> + */
>> +void host1x_syncpt_reset(struct host1x *dev)
> 
> Maybe host1x_syncpt_flush() would be a better name given the above
> description? Reset does have this hardware reset connotation so my first
> intuition had been that this would reset the syncpt value to 0.

Right, it actually reloads values stored in shadow registers back to
host1x. Flush doesn't feel like it's conveying the meaning. Would
host1x_syncpt_restore() work? That'd match with host1x_syncpt_save(),
which just updates all shadow registers from hardware and is used just
before host1x loses power.
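
To make the pairing concrete, here is a simplified userspace model of save and restore. It assumes a hypothetical fixed count NB_PTS, treats hardware registers as a plain array, and ignores the client-managed distinction and the wait bases:

```c
#include <stdint.h>

typedef uint32_t u32;

#define NB_PTS 4	/* hypothetical sync point count */

struct host {
	u32 hw[NB_PTS];		/* stand-in for hardware registers */
	u32 shadow[NB_PTS];	/* software shadow copies */
};

/* Before power-off: copy hardware values into the shadows. */
static void syncpt_save(struct host *h)
{
	for (unsigned int i = 0; i < NB_PTS; i++)
		h->shadow[i] = h->hw[i];
}

/* After power-on: write the shadows back to hardware. */
static void syncpt_restore(struct host *h)
{
	for (unsigned int i = 0; i < NB_PTS; i++)
		h->hw[i] = h->shadow[i];
}
```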

> 
> If you decide to change the name, make sure to change it in the syncpt
> ops as well.

Sure.

> 
>> +/*
>> + * Updates sw shadow state for client managed registers
>> + */
>> +void host1x_syncpt_save(struct host1x *dev)
>> +{
>> +     struct host1x_syncpt *sp_base = dev->syncpt;
>> +     u32 i;
>> +
>> +     for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
>> +             if (host1x_syncpt_client_managed(sp_base + i))
>> +                     dev->syncpt_op.load_min(sp_base + i);
>> +             else
>> +                     WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
>> +     }
>> +
>> +     for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
>> +             dev->syncpt_op.read_wait_base(sp_base + i);
>> +}
> 
> A similar comment applies here. Though I'm not so sure about a better
> name. Perhaps host1x_syncpt_sync()?
> 
> I know that this must sound all pretty straightforward to you, but for
> somebody who hasn't used these functions at all the names are quite
> confusing. So instead of making people go read the documentation I tend to
> think that making the names as descriptive as possible is essential
> here.

I definitely agree that naming should be descriptive. This is used when
saving host1x state before it loses power, so that's why it's called
host1x_syncpt_save().

But I'm open to changing the naming, if something else would feel more
descriptive.

> 
>> +/*
>> + * Updates the last value read from hardware.
>> + */
>> +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
>> +{
>> +     u32 val;
>> +     val = sp->dev->syncpt_op.load_min(sp);
>> +     trace_host1x_syncpt_load_min(sp->id, val);
>> +
>> +     return val;
>> +}
> 
> I'm not sure I understand what this means exactly. Does it read the
> value that hardware last incremented? Perhaps this will become clearer
> when you add a comment to the syncpt_load_min() implementation.

It just loads the current syncpt value to shadow register. The shadow
register is called min, because host1x tracks the range of sync point
increments that hardware is still going to do, so min is the lower
boundary of the range.

max tells what the sync point is expected to reach for hardware to be
considered idle.

host1x will f.ex. nop out waits for sync point values outside the range,
because hardware isn't good at handling syncpt value wrapping.
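
One way such a range check could be written so that it survives 32-bit wrapping. This is an assumption about the shape of the check, not a copy of the driver's code, and syncpt_thresh_in_range() is a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

/*
 * True if 'thresh' lies in the pending window [min, max]. Unsigned
 * subtraction is modulo 2^32, so the distance from min stays correct
 * even when max has already wrapped past zero.
 */
static bool syncpt_thresh_in_range(u32 min, u32 max, u32 thresh)
{
	return (thresh - min) <= (max - min);
}
```

A wait whose threshold falls outside the window has either already completed or can never complete, which is why it can be turned into a nop.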

> 
>> +int host1x_syncpt_init(struct host1x *host)
>> +{
>> +     struct host1x_syncpt *syncpt, *sp;
>> +     int i;
>> +
>> +     syncpt = sp = devm_kzalloc(&host->dev->dev,
>> +                     sizeof(struct host1x_syncpt) * host->info.nb_pts,
> 
> You can make this a bit shorter by using sizeof(*sp) instead.

Will do.

> 
>> +     for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
>> +             sp->id = i;
>> +             sp->dev = host;
> 
> Perhaps:
> 
>         syncpt[i].id = i;
>         syncpt[i].dev = host;
> 
> To avoid the need to explicitly keep track of sp?

Sounds good. I usually prefer indexing, so I'm happy with this.

> 
>> +static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
>> +             struct platform_device *pdev,
>> +             int client_managed)
>> +{
>> +     int i;
>> +     struct host1x_syncpt *sp = host->syncpt;
>> +     char *name;
>> +
>> +     for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
>> +             ;
>> +     if (sp->pdev)
>> +             return NULL;
>> +
>> +     name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
>> +                     pdev ? dev_name(&pdev->dev) : NULL);
>> +     if (!name)
>> +             return NULL;
>> +
>> +     sp->pdev = pdev;
>> +     sp->name = name;
>> +     sp->client_managed = client_managed;
>> +
>> +     return sp;
>> +}
>> +
>> +struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
>> +             int client_managed)
>> +{
>> +     struct host1x *host = host1x_get_host(pdev);
>> +     return _host1x_syncpt_alloc(host, pdev, client_managed);
>> +}
> 
> I think it's enough to keep track of the struct device here instead of
> the struct platform_device.

Yes, I actually managed to comment the same thing earlier.

> 
> Also the syncpoint is not actually allocated here, so maybe
> host1x_syncpt_request() would be a better name. As a nice side-effect it
> makes the naming more similar to the IRQ API and might be easier to work
> with.

I'm not entirely sure about the difference, but isn't the number to be
allocated usually passed to a function ending in _request? Allocate
would just allocate the next available - as host1x_syncpt_alloc() does.

> 
>> +struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
>> +{
>> +     return dev->syncpt + id;
>> +}
> 
> Should this perhaps do some error checking? What if the specified syncpt
> hasn't actually been requested before?

I'll need to check the use of host1x_syncpt_get(). It might be called
for un-allocated (or requested, if we choose that) syncpoints. An error
check would make sense at least to check that id is smaller than nb_pts.
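
The minimal bounds check could look like the sketch below. The structs are trimmed userspace stand-ins, and nb_pts is placed directly in struct host1x here for brevity:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

struct host1x_syncpt {
	u32 id;
	const char *name;
};

struct host1x {
	struct host1x_syncpt *syncpt;
	u32 nb_pts;
};

/* Returns NULL for an out-of-range id instead of a pointer past the array. */
static struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id)
{
	if (id >= host->nb_pts)
		return NULL;

	return host->syncpt + id;
}
```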

> 
>> diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
> [...]
>> +struct host1x_syncpt {
>> +     int id;
>> +     atomic_t min_val;
>> +     atomic_t max_val;
>> +     u32 base_val;
>> +     const char *name;
>> +     int client_managed;
> 
> Is this field actually ever used? Looking through the patches none of
> the clients actually set this.

VBLANK should be set client_managed, so a follow-up patch would add a
call from dc.c to here, with client_managed=true.

> 
>> +/*
>> + * Returns true if syncpoint min == max, which means that there are no
>> + * outstanding operations.
>> + */
>> +static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
>> +{
>> +     int min, max;
>> +     smp_rmb();
>> +     min = atomic_read(&sp->min_val);
>> +     max = atomic_read(&sp->max_val);
>> +     return (min == max);
>> +}
> 
> Maybe call this host1x_syncpt_idle() or something similar instead?

Sounds fine - although the syncpt itself isn't idle, but the
corresponding client.

> 
>> +{
>> +     return sp->id != NVSYNCPT_INVALID &&
>> +             sp->id < host1x_syncpt_nb_pts(sp->dev);
>> +}
> 
> Is there really a need for NVSYNCPT_INVALID? Even if you want to keep
> the special case you can drop the explicit check because -1 will be
> larger than host1x_syncpt_nb_pts() anyway.

No, it's not really needed, so I'll remove it.
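
The point about -1 can be checked with a few lines of userspace C. syncpt_id_is_valid() is a hypothetical helper; the cast just makes explicit the conversion that the comparison with an unsigned count performs anyway:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

static bool syncpt_id_is_valid(int id, u32 nb_pts)
{
	/* As a u32, -1 becomes 0xffffffff, which is never below nb_pts. */
	return (u32)id < nb_pts;
}
```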

Terje

* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-02-04 10:30   ` Thierry Reding
@ 2013-02-05  4:29     ` Terje Bergström
  2013-02-05  8:42       ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  4:29 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 02:30, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
>>
>>       /* set common host1x device data */
>>       platform_set_drvdata(dev, host);
>> -
>>       host->regs = devm_request_and_ioremap(&dev->dev, regs);
>>       if (!host->regs) {
>>               dev_err(&dev->dev, "failed to remap host registers\n");
> 
> This seems an unrelated (and actually undesirable) change.
> 
>> @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
>>       }
>>
>>       err = host1x_syncpt_init(host);
>> -     if (err)
>> +     if (err) {
>> +             dev_err(&dev->dev, "failed to init sync points");
>>               return err;
>> +     }
> 
> This error message should probably have gone in the previous patch as
> well.

Oops, will move these to previous patch. I'm pretty sure I already fixed
this once. :-(

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
>> index d8f5979..8376092 100644
>> --- a/drivers/gpu/host1x/dev.h
>> +++ b/drivers/gpu/host1x/dev.h
>> @@ -17,11 +17,12 @@
>>  #ifndef HOST1X_DEV_H
>>  #define HOST1X_DEV_H
>>
>> +#include <linux/platform_device.h>
>>  #include "syncpt.h"
>> +#include "intr.h"
>>
>>  struct host1x;
>>  struct host1x_syncpt;
>> -struct platform_device;
> 
> Why include platform_device.h here?

host1x_get_host() actually needs that, so this #include should've also
been in previous patch.

> 
>> @@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
>>       const char * (*name)(struct host1x_syncpt *);
>>  };
>>
>> +struct host1x_intr_ops {
>> +     void (*init_host_sync)(struct host1x_intr *);
>> +     void (*set_host_clocks_per_usec)(
>> +             struct host1x_intr *, u32 clocks);
> 
> Could the above two not be combined? The only reason to keep them
> separate would be if the host1x clock was dynamically changed, but I
> don't think we support that, right?

I've left this as a placeholder to at some point start supporting host1x
clock scaling. But I don't think we're going to do that for a while, so
I could merge them.

> 
>> +     void (*set_syncpt_threshold)(
>> +             struct host1x_intr *, u32 id, u32 thresh);
>> +     void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
>> +     void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
>> +     void (*disable_all_syncpt_intrs)(struct host1x_intr *);
> 
> Can disable_all_syncpt_intrs() not be implemented generically using the
> number of syncpoints as exposed by host1x_device_info and the
> .disable_syncpt_intr() function?

disable_all_syncpt_intrs() disables all interrupts in one write (or one
per 32 sync points), so it's more efficient.
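
The difference in register traffic can be modelled in userspace. Everything below is a stand-in: NB_PTS is a hypothetical count, and reg_write() only counts writes to an array that plays the role of the disable registers:

```c
#include <stdint.h>

typedef uint32_t u32;

#define NB_PTS		64	/* hypothetical sync point count */
#define NB_WORDS	(NB_PTS / 32)

struct regs {
	u32 int_disable[NB_WORDS];	/* one disable bit per sync point */
	unsigned int writes;		/* number of register writes issued */
};

static void reg_write(struct regs *r, unsigned int word, u32 val)
{
	r->int_disable[word] |= val;
	r->writes++;
}

static void disable_syncpt_intr(struct regs *r, u32 id)
{
	reg_write(r, id / 32, 1u << (id % 32));
}

/* Generic version: one write per sync point. */
static void disable_all_generic(struct regs *r)
{
	for (u32 id = 0; id < NB_PTS; id++)
		disable_syncpt_intr(r, id);
}

/* Bulk version: one write per 32 sync points. */
static void disable_all_bulk(struct regs *r)
{
	for (unsigned int w = 0; w < NB_WORDS; w++)
		reg_write(r, w, 0xffffffffu);
}
```

Both variants end with the same bits set; the bulk one needs NB_WORDS writes instead of NB_PTS.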

> 
>> @@ -46,11 +59,13 @@ struct host1x_device_info {
>>  struct host1x {
>>       void __iomem *regs;
>>       struct host1x_syncpt *syncpt;
>> +     struct host1x_intr intr;
>>       struct platform_device *dev;
>>       struct host1x_device_info info;
>>       struct clk *clk;
>>
>>       struct host1x_syncpt_ops syncpt_op;
>> +     struct host1x_intr_ops intr_op;
> 
> I think carrying a const pointer to the interrupt operations structure
> is a better option here.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
> [...]
>> +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
> 
> Can we avoid this forward declaration?

I think we can, if I just move the isr to top of file.

> 
>> +static void syncpt_thresh_cascade_fn(struct work_struct *work)
> 
> syncpt_thresh_work()?

Sounds good.

> 
>> +{
>> +     struct host1x_intr_syncpt *sp =
>> +             container_of(work, struct host1x_intr_syncpt, work);
>> +     host1x_syncpt_thresh_fn(sp);
> 
> Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
> Why do we need to go through an external function declaration?

If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
that. That'd simplify the interrupt path.

> 
>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> +     struct host1x *host1x = dev_id;
>> +     struct host1x_intr *intr = &host1x->intr;
>> +     unsigned long reg;
>> +     int i, id;
>> +
>> +     for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
>> +             reg = host1x_sync_readl(host1x,
>> +                             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
>> +                             i * REGISTER_STRIDE);
>> +             for_each_set_bit(id, &reg, BITS_PER_LONG) {
>> +                     struct host1x_intr_syncpt *sp =
>> +                             intr->syncpt + (i * BITS_PER_LONG + id);
>> +                     host1x_intr_syncpt_thresh_isr(sp);
> 
> Have you considered mimicking the IRQ API and name this something like
> host1x_intr_syncpt_thresh_handle() and name the actual ISR just
> syncpt_thresh_isr()? Not so important but it makes things a bit clearer
> in my opinion.

This gets a bit confusing, because we have an ISR that calls a function
that is also called ISR. I've kept "isr" in names of both to emphasize
that this is running in interrupt context. I'm open to renaming these to
make it clearer.

Did you refer to chained IRQ handler in linux/irq.h when you mentioned
IRQ API as reference for naming?

> 
>> +                     queue_work(intr->wq, &sp->work);
> 
> Should the call to queue_work() perhaps be moved into
> host1x_intr_syncpt_thresh_isr()?

I'm not sure, either way would be ok to me. The current structure allows
host1x_intr_syncpt_thresh_isr() to only take one parameter
(host1x_intr_syncpt). If we move queue_work, we'd also need to pass
host1x_intr.

> 
>> +static void host1x_intr_init_host_sync(struct host1x_intr *intr)
>> +{
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +     int i, err;
>> +
>> +     host1x_sync_writel(host1x, 0xffffffffUL,
>> +             HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
>> +     host1x_sync_writel(host1x, 0xffffffffUL,
>> +             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
>> +
>> +     for (i = 0; i < host1x->info.nb_pts; i++)
>> +             INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
>> +
>> +     err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
>> +                             syncpt_thresh_cascade_isr,
>> +                             IRQF_SHARED, "host1x_syncpt", host1x);
>> +     WARN_ON(IS_ERR_VALUE(err));
> 
> Do we really want to continue in this case?

Hmm, we'd need to actually return an error code. There's not much the
driver can do without syncpt interrupts.

> 
>> +static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
>> +     u32 id, u32 thresh)
>> +{
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +     host1x_sync_writel(host1x, thresh,
>> +             HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE);
>> +}
> 
> Again, maybe defining the register stride as part of the register
> definition might be better. I think HOST1X_SYNC_SYNCPT_INT_THRESH(id) is
> easier to read.

Sounds good.

> 
>> +static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id)
>> +{
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +
>> +     host1x_sync_writel(host1x, BIT_MASK(id),
>> +                     HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 +
>> +                     BIT_WORD(id) * REGISTER_STRIDE);
>> +}
> 
> Same here.

Yep.

> 
>> +static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id)
>> +{
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +
>> +     host1x_sync_writel(host1x, BIT_MASK(id),
>> +                     HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
>> +                     BIT_WORD(id) * REGISTER_STRIDE);
>> +
>> +     host1x_sync_writel(host1x, BIT_MASK(id),
>> +             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
>> +             BIT_WORD(id) * REGISTER_STRIDE);
>> +}
> 
> And here.

Yep.
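
For the threshold and enable registers discussed in the last few comments, folding the stride into the definitions might look like this. The base offsets are made-up placeholders, not the real host1x values, the stride of 4 is an assumption, and the sketch uses a fixed 32 bits per register where the patch uses BIT_WORD()/BIT_MASK():

```c
#define REGISTER_STRIDE		4	/* assumed: 32-bit registers */
#define SYNCPT_INT_THRESH_BASE	0x1000	/* hypothetical base offset */
#define SYNCPT_INT_ENABLE_BASE	0x2000	/* hypothetical base offset */

/* One threshold register per sync point. */
#define HOST1X_SYNC_SYNCPT_INT_THRESH(id) \
	(SYNCPT_INT_THRESH_BASE + (id) * REGISTER_STRIDE)

/* One enable bit per sync point, 32 sync points per register. */
#define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0(id) \
	(SYNCPT_INT_ENABLE_BASE + ((id) / 32) * REGISTER_STRIDE)

/* Bit within that register for sync point 'id'. */
#define SYNCPT_INT_BIT(id)	(1u << ((id) % 32))
```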

> 
>> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
> [...]
>> +#include "intr.h"
>> +#include <linux/interrupt.h>
>> +#include <linux/slab.h>
>> +#include <linux/irq.h>
>> +#include "dev.h"
> 
> More funky ordering of includes.

Will fix.

> 
>> +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
>> +                     enum host1x_intr_action action, void *data,
>> +                     void *_waiter,
>> +                     void **ref)
> 
> Why do you pass in _waiter as void * and not struct host1x_waitlist *?

struct host1x_waitlist is defined inside intr.c, so I've chosen to pass
void *. I could naturally just forward declare host1x_waitlist in intr.h
and change the allocation and add_action to use that.

> 
> I think I've said this before. The interface doesn't seem optimal to me
> here. Passing in an enumeration to choose which action to perform looks
> difficult to work with (not to mention the symbols are rather long and
> therefore result in ugly code).
> 
> Maybe doing this by passing around a pointer to a handler function would
> be nicer. However since I haven't really used this yet, I can't really
> tell. So maybe we should just merge the implementation as-is for now. We
> can always clean it up later.

We're using the enum also to index into arrays. We do it so that we can
remove all the completed waiters from the wait_head, and insert them
into lists per action type. This way we can run all actions in priority
order: first action_submit_complete, then action_wakeup, and then
action_wakeup_interruptible.

Now, we've recently noticed that the priority order is actually wrong.
The first priority should be to wake up non-interruptible tasks, then
interruptible tasks. Cleaning up memory of completed submits should be
lower priority.

I've considered this part as something private to host1x driver and it's
not really meant to be called f.ex. from DRM. But, as you seem to have a
need to have an asynchronous wait for a fence, we'd need to figure
something out for that.
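
A simplified userspace model of the enum-indexed lists. All names are hypothetical, the waiter is reduced to a singly linked list node, and the enum is declared in the corrected priority order described above (wakeups first, submit cleanup last):

```c
#include <stddef.h>

enum action {	/* declaration order is the run order */
	ACTION_WAKEUP,
	ACTION_WAKEUP_INTERRUPTIBLE,
	ACTION_SUBMIT_COMPLETE,
	ACTION_COUNT
};

struct waiter {
	enum action action;
	int done;		/* non-zero once the threshold was reached */
	struct waiter *next;
};

/* Move completed waiters from 'head' into per-action lists indexed by the enum. */
static void bucket_completed(struct waiter **head,
			     struct waiter *done[ACTION_COUNT])
{
	struct waiter **pp = head;

	while (*pp) {
		struct waiter *w = *pp;

		if (w->done) {
			*pp = w->next;			/* unlink from wait list */
			w->next = done[w->action];	/* push on action list */
			done[w->action] = w;
		} else {
			pp = &w->next;
		}
	}
}

/* Run all completed waiters, one action type after another. */
static void run_actions(struct waiter *done[ACTION_COUNT],
			void (*run)(struct waiter *))
{
	for (int a = 0; a < ACTION_COUNT; a++)
		for (struct waiter *w = done[a]; w; w = w->next)
			run(w);
}

/* Example callback: records the order in which actions were run. */
static enum action order_log[8];
static int order_len;

static void record_action(struct waiter *w)
{
	if (order_len < 8)
		order_log[order_len++] = w->action;
}
```

Passing a handler function pointer per waiter, as suggested in the review, would replace the enum as a dispatch mechanism but would still need some notion of priority to keep this ordering.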

> 
>> +void *host1x_intr_alloc_waiter(void)
>> +{
>> +     return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
>> +}
> 
> I'm not sure why this is separate from host1x_syncpt_wait() since it is
> only used inside that function and the waiter returned never leaves the
> scope of that function, so it might be better to allocate it directly
> in host1x_syncpt_wait() instead.
> 
> Actually, it looks like the waiter doesn't ever leave scope, so you may
> even want to allocate it on the stack.

In patch 3, at submit time we first allocate waiter, then take
submit_lock, write submit to channel, and add the waiter while having
the lock. I did this so that host1x_intr_add_action() can always
succeed. Otherwise I'd need to write another code path to handle the
case where we wrote a job to channel, but we're not able to add a
submit_complete action to it.

> 
>> +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
> 
> Here again, you pass in the waiter via a void *. Why's that?

host1x_waitlist is hidden inside intr.c.

> 
>> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
> 
> Maybe you should keep the type of the irq_sync here so that it properly
> propagates to the call to devm_request_irq().

I'm not sure what you mean. Do you mean that I should use unsigned int,
as that's the type used in devm_request_irq()?

> 
>> +{
>> +     unsigned int id;
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +     u32 nb_pts = host1x_syncpt_nb_pts(host1x);
>> +
>> +     intr->syncpt = devm_kzalloc(&host1x->dev->dev,
>> +                     sizeof(struct host1x_intr_syncpt) *
>> +                     host1x->info.nb_pts,
>> +                     GFP_KERNEL);
>> +
>> +     if (!host1x->intr.syncpt)
> 
> The above blank line isn't necessary.

Will remove.

> 
>> +void host1x_intr_stop(struct host1x_intr *intr)
>> +{
>> +     unsigned int id;
>> +     struct host1x *host1x = intr_to_host1x(intr);
>> +     struct host1x_intr_syncpt *syncpt;
>> +     u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
>> +
>> +     mutex_lock(&intr->mutex);
>> +
>> +     host1x->intr_op.disable_all_syncpt_intrs(intr);
> 
> I haven't commented on this everywhere, but I think this could benefit
> from a wrapper that forwards this to the intr_op. The same goes for the
> sync_op.

You mean something like "host1x_disable_all_syncpt_intrs"?

> 
>> +     for (id = 0, syncpt = intr->syncpt;
>> +          id < nb_pts;
>> +          ++id, ++syncpt) {
> 
> I don't think you need to explicitly keep track of syncpt within the for
> statement. Instead you could either index intr->syncpt directly or
> obtain a reference within the loop. It allows the for statement to be
> written much more canonically.

Yep, will do.

> 
>> diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
> [...]
>> +#define intr_syncpt_to_intr(is) (is->intr)
> 
> This one doesn't buy you anything. It actually uses up more characters
> so you can just drop it.

True, it's useless. I'll remove.

> 
>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
> [...]
>> @@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp)
>>       host1x_syncpt_cpu_incr(sp);
>>  }
>>
>> +/*
>> + * Updates sync point from hardware, and returns true if syncpoint is expired,
>> + * false if we may need to wait
>> + */
>> +static bool syncpt_load_min_is_expired(
>> +     struct host1x_syncpt *sp,
>> +     u32 thresh)
> 
> This can all go on one line.

Ok.

> 
>> +/*
>> + * Main entrypoint for syncpoint value waits.
>> + */
>> +int host1x_syncpt_wait(struct host1x_syncpt *sp,
>> +                     u32 thresh, long timeout, u32 *value)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_syncpt_wait);
> 
> This doesn't only seem to be the main entrypoint, but it's basically the
> only way to currently wait for syncpoints. One actual use-case where
> this might turn out to be a problem is video capturing. The problem is
> that using this API you can't very well asynchronously capture frames.
> So eventually I think we need a way to allow a generic handler to be
> attached to syncpoints so that you can have this handler continuously
> invoked after each frame is captured and just pass the buffer back to
> userspace.

Yep, so far all asynchronous waits have been done in user space. We
would probably allow attaching a handler to a syncpt value, so that we'd
call that handler once a value is reached. In effect, similar to a
wake_up event that is now added via host1x_intr_add_action, but simpler.
That'd mean that the handler needs to be re-added after each frame.

We could also add the handler as persistent if re-adding would be a
problem. That'd require some new wiring and I'll have to think how to
implement that.
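
To make the two options concrete, here is a userspace sketch of a one-shot versus a persistent handler. None of this exists in the patch set: struct syncpt_notifier, syncpt_notify() and the "re-arm at thresh + 1" policy are all assumptions made for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

typedef void (*syncpt_handler_t)(u32 value, void *data);

struct syncpt_notifier {
	u32 thresh;		/* value that triggers the handler */
	bool persistent;	/* false: fire once; true: re-arm at thresh + 1 */
	bool armed;
	syncpt_handler_t fn;
	void *data;
};

/* Would be called from the threshold interrupt path with the current value. */
static void syncpt_notify(struct syncpt_notifier *n, u32 value)
{
	/* The signed difference keeps the comparison correct across wraparound. */
	while (n->armed && (int32_t)(value - n->thresh) >= 0) {
		n->fn(n->thresh, n->data);
		if (n->persistent)
			n->thresh++;
		else
			n->armed = false;
	}
}

/* Example handler: counts captured frames. */
static void count_frames(u32 value, void *data)
{
	(void)value;
	++*(int *)data;
}
```

With the one-shot form the capture driver has to register again after every frame; the persistent form moves that bookkeeping into host1x.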

> 
>> +bool host1x_syncpt_is_expired(
>> +     struct host1x_syncpt *sp,
>> +     u32 thresh)
> 
> This can go on one line.

Will join.

Terje

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-02-04 11:03   ` Thierry Reding
@ 2013-02-05  4:41     ` Terje Bergström
  2013-02-05  9:15       ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  4:41 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 03:03, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> [...]
>> +static pid_t host1x_debug_null_kickoff_pid;
>> +unsigned int host1x_debug_trace_cmdbuf;
>> +
>> +static pid_t host1x_debug_force_timeout_pid;
>> +static u32 host1x_debug_force_timeout_val;
>> +static u32 host1x_debug_force_timeout_channel;
> 
> Please group static and non-static variables.

Will do.

> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> [...]
>> +struct output {
>> +	void (*fn)(void *ctx, const char *str, size_t len);
>> +	void *ctx;
>> +	char buf[256];
>> +};
> 
> Do we really need this kind of abstraction? There really should be only
> one location where debug information is obtained, so I don't see a need
> for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.
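
A userspace sketch of why the indirection is useful: the same formatting code feeds two different sinks. struct output is taken from the patch; debug_output(), write_to_stream() and write_to_strbuf() are hypothetical stand-ins for the driver's output helper, the printk path and the debugfs path:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct output {
	void (*fn)(void *ctx, const char *str, size_t len);
	void *ctx;
	char buf[256];
};

/* Format into o->buf, then hand the text to whichever sink is attached. */
static void debug_output(struct output *o, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
	va_end(args);

	if (len < 0)
		return;
	if ((size_t)len >= sizeof(o->buf))
		len = (int)sizeof(o->buf) - 1;	/* text was truncated */

	o->fn(o->ctx, o->buf, (size_t)len);
}

/* Sink 1: write to a stdio stream (stand-in for printk). */
static void write_to_stream(void *ctx, const char *str, size_t len)
{
	fwrite(str, 1, len, (FILE *)ctx);
}

/* Sink 2: append to a caller-owned buffer (stand-in for a debugfs file). */
struct strbuf {
	char data[512];
	size_t used;
};

static void write_to_strbuf(void *ctx, const char *str, size_t len)
{
	struct strbuf *sb = ctx;
	size_t room = sizeof(sb->data) - 1 - sb->used;

	if (len > room)
		len = room;
	memcpy(sb->data + sb->used, str, len);
	sb->used += len;
	sb->data[sb->used] = '\0';
}
```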

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>>  struct host1x_syncpt_ops {
>>  	void (*reset)(struct host1x_syncpt *);
>>  	void (*reset_wait_base)(struct host1x_syncpt *);
>> @@ -117,6 +133,7 @@ struct host1x {
>>  	struct host1x_channel_ops channel_op;
>>  	struct host1x_cdma_ops cdma_op;
>>  	struct host1x_pushbuffer_ops cdma_pb_op;
>> +	struct host1x_debug_ops debug_op;
>>  	struct host1x_syncpt_ops syncpt_op;
>>  	struct host1x_intr_ops intr_op;
> 
> Again, better to pass in a const pointer to the ops structure.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
> 
>> +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
>> +{
>> +	unsigned mask;
>> +	unsigned subop;
>> +
>> +	switch (val >> 28) {
>> +	case 0x0:
> 
> These can easily be derived by looking at the debug output, but it may
> still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bunch of code
from the kernel. It also makes the debug output more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.
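
The hex-only output could be as small as the sketch below. dump_words_hex() is a hypothetical helper and the space-separated, 8-digit format is an assumption; the downstream code may format the words differently:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write 'count' words into 'out' as space-separated 8-digit hex.
 * Returns the number of characters written, or -1 if 'out' is too small.
 */
static int dump_words_hex(char *out, size_t size,
			  const uint32_t *words, size_t count)
{
	size_t pos = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';

	for (size_t i = 0; i < count; i++) {
		int n = snprintf(out + pos, size - pos, "%s%08x",
				 i ? " " : "", (unsigned int)words[i]);

		if (n < 0 || (size_t)n >= size - pos)
			return -1;
		pos += (size_t)n;
	}

	return (int)pos;
}
```

A userspace decoder would then parse these words back into opcodes, which keeps the opcode tables out of the kernel.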

> 
>> +static void show_channel_word(struct output *o, int *state, int *count,
>> +		u32 addr, u32 val, struct host1x_cdma *cdma)
>> +{
>> +	static int start_count, dont_print;
> 
> What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

> 
>> +static void do_show_channel_gather(struct output *o,
>> +		phys_addr_t phys_addr,
>> +		u32 words, struct host1x_cdma *cdma,
>> +		phys_addr_t pin_addr, u32 *map_addr)
>> +{
>> +	/* Map dmaget cursor to corresponding mem handle */
>> +	u32 offset;
>> +	int state, count, i;
>> +
>> +	offset = phys_addr - pin_addr;
>> +	/*
>> +	 * Sometimes we're given different hardware address to the same
>> +	 * page - in these cases the offset will get an invalid number and
>> +	 * we just have to bail out.
>> +	 */
> 
> Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

> 
>> +	map_addr = host1x_memmgr_mmap(mem);
>> +	if (!map_addr) {
>> +		host1x_debug_output(o, "[could not mmap]\n");
>> +		return;
>> +	}
>> +
>> +	/* Get base address from mem */
>> +	sgt = host1x_memmgr_pin(mem);
>> +	if (IS_ERR(sgt)) {
>> +		host1x_debug_output(o, "[couldn't pin]\n");
>> +		host1x_memmgr_munmap(mem, map_addr);
>> +		return;
>> +	}
> 
> Maybe you should stick with one of "could not" or "couldn't". Makes it
> easier to search for.

I prefer "could not", so I'll use that.

> 
>> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
>> +{
>> +	struct host1x_job *job;
>> +
>> +	list_for_each_entry(job, &cdma->sync_queue, list) {
>> +		int i;
>> +		host1x_debug_output(o,
>> +				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
>> +				" first_get=%08x, timeout=%d"
>> +				" num_slots=%d, num_handles=%d\n",
>> +				job,
>> +				job->syncpt_id,
>> +				job->syncpt_end,
>> +				job->first_get,
>> +				job->timeout,
>> +				job->num_slots,
>> +				job->num_unpins);
> 
> This could go on fewer lines.

Yes, will merge.

> 
>> +static void host1x_debug_show_channel_cdma(struct host1x *m,
>> +	struct host1x_channel *ch, struct output *o, int chid)
>> +{
> [...]
>> +	switch (cbstat) {
>> +	case 0x00010008:
> 
> Again, symbolic names would be nice.

I propose I remove the decoding from kernel, and save 200 lines.

Terje

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x
  2013-02-04 11:08   ` Thierry Reding
@ 2013-02-05  4:45     ` Terje Bergström
  2013-02-05  9:26       ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  4:45 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 03:08, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
>> index 697d49a..ffc8bf1 100644
>> --- a/drivers/gpu/host1x/Makefile
>> +++ b/drivers/gpu/host1x/Makefile
>> @@ -12,4 +12,10 @@ host1x-y = \
>>  	hw/host1x01.o
>>  
>>  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
>> +
>> +ccflags-y += -Iinclude/drm
>> +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>> +
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
>>  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
> 
> Can this be moved into a separate Makefile in the drm subdirectory?

I tried, and the kernel build helpfully created two .ko files. As having
cyclic dependencies between two modules isn't nice, I merged them into the
same module, and that seemed to force merging the Makefiles.

If anybody has an idea on how to do it otherwise, I'd be happy to keep
the Makefiles separate.

> 
>> diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
> [...]
>> new file mode 100644
>> index 0000000..fdd2920
>> --- /dev/null
>> +++ b/drivers/gpu/host1x/host1x_client.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Copyright (c) 2013, NVIDIA Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HOST1X_CLIENT_H
>> +#define HOST1X_CLIENT_H
>> +
>> +struct platform_device;
>> +
>> +void host1x_set_drm_data(struct platform_device *pdev, void *data);
>> +void *host1x_get_drm_data(struct platform_device *pdev);
>> +
>> +#endif
> 
> These aren't defined or used yet.

Hmm, right, they would go to patch 7.

Terje

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks
  2013-02-04 11:26   ` Thierry Reding
  2013-02-04 17:06     ` Stephen Warren
@ 2013-02-05  4:47     ` Terje Bergström
  1 sibling, 0 replies; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  4:47 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 03:26, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
>> of 2D clock to that driver alias.
>>
>> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
>> ---
>>  arch/arm/mach-tegra/board-dt-tegra20.c    |    1 +
>>  arch/arm/mach-tegra/board-dt-tegra30.c    |    1 +
>>  arch/arm/mach-tegra/tegra20_clocks_data.c |    2 +-
>>  arch/arm/mach-tegra/tegra30_clocks_data.c |    1 +
>>  4 files changed, 4 insertions(+), 1 deletion(-)
> 
> With Prashant's clock rework patches now merged this patch can be
> dropped.

Yes, and I'll also need to start requesting the 2D clock with a NULL name
in gr2d.c.

Terje

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-04 12:56   ` Thierry Reding
@ 2013-02-05  5:17     ` Terje Bergström
  2013-02-05  9:54       ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-05  5:17 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 04:56, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
>> @@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
>>
>>  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>  {
>> -     return 0;
>> +     struct host1x_drm_fpriv *fpriv;
>> +     int err = 0;
> 
> Can be dropped.
> 
>> +
>> +     fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> +     if (!fpriv)
>> +             return -ENOMEM;
>> +
>> +     INIT_LIST_HEAD(&fpriv->contexts);
>> +     filp->driver_priv = fpriv;
>> +
>> +     return err;
> 
> return 0;

Ok.

> 
>> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
>> +{
>> +     struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
>> +     struct host1x_drm_context *context, *tmp;
>> +
>> +     list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
>> +             context->client->ops->close_channel(context);
>> +             kfree(context);
>> +     }
>> +     kfree(fpriv);
>>  }
> 
> Maybe you should add host1x_drm_context_free() to wrap the loop
> contents?

Makes sense. Will do.

> 
>> @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
>>       drm_fbdev_cma_restore_mode(host1x->fbdev);
>>  }
>>
>> +static int
>> +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
>> +                      struct drm_file *file_priv)
> 
> static int and function name on one line, please.

Ok, will re-split the lines.

> 
>> +{
>> +     struct host1x *host1x = drm->dev_private;
>> +     struct tegra_drm_syncpt_read_args *args = data;
>> +     struct host1x_syncpt *sp =
>> +             host1x_syncpt_get_bydev(host1x->dev, args->id);
> 
> I don't know if we need this, except maybe to work around the problem
> that we have two different structures named host1x. The _bydev() suffix
> is misleading because all you really do here is obtain the syncpt from
> the host1x.

Yeah, it's actually working around the host1x duplicate naming.
host1x_syncpt_get takes struct host1x as parameter, but that's a different
host1x than the one in this code.

> 
>> +static int
>> +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
>> +                      struct drm_file *file_priv)
>> +{
>> +     struct tegra_drm_open_channel_args *args = data;
>> +     struct host1x_client *client;
>> +     struct host1x_drm_context *context;
>> +     struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> +     struct host1x *host1x = drm->dev_private;
>> +     int err = 0;
> 
> err = -ENODEV; (see below)

Ok, makes sense.

> 
>> +
>> +     context = kzalloc(sizeof(*context), GFP_KERNEL);
>> +     if (!context)
>> +             return -ENOMEM;
>> +
>> +     list_for_each_entry(client, &host1x->clients, list) {
>> +             if (client->class == args->class) {
>> +                     context->client = client;
>> +                     err = client->ops->open_channel(client, context);
>> +                     if (err)
>> +                             goto out;
>> +
>> +                     list_add(&context->list, &fpriv->contexts);
>> +                     args->context = (uintptr_t)context;
> 
> Perhaps cast this to __u64 directly instead? There's little sense in
> taking the detour via uintptr_t.

I think compiler complained about a direct cast to __u64, but I'll try
again.

> 
>> +                     goto out;
> 
> return 0;
> 
>> +             }
>> +     }
>> +     err = -ENODEV;
>> +
>> +out:
>> +     if (err)
>> +             kfree(context);
>> +
>> +     return err;
>> +}
> 
> Then this simply becomes:
> 
>         kfree(context);
>         return err;

Sounds good.
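
The control flow suggested in the review (err starts as -ENODEV, success
returns 0 directly, everything else falls through to the free) could be
sketched as below. This is a user-space model under stated assumptions:
struct client_model, struct context_model and the two stand-in callbacks
are hypothetical, and calloc()/free() replace kzalloc()/kfree().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct context_model;

struct client_model {
	uint32_t class;
	int (*open_channel)(struct client_model *client,
			    struct context_model *context);
};

struct context_model {
	struct client_model *client;
};

/* Stand-in callbacks so the sketch can be exercised. */
static int open_ok(struct client_model *client, struct context_model *context)
{
	(void)client;
	(void)context;
	return 0;
}

static int open_busy(struct client_model *client, struct context_model *context)
{
	(void)client;
	(void)context;
	return -EBUSY;
}

static int open_channel(struct client_model *clients, size_t num_clients,
			uint32_t class, struct context_model **out)
{
	struct context_model *context;
	int err = -ENODEV;	/* result if no client matches the class */
	size_t i;

	context = calloc(1, sizeof(*context));
	if (!context)
		return -ENOMEM;

	for (i = 0; i < num_clients; i++) {
		if (clients[i].class != class)
			continue;

		context->client = &clients[i];
		err = clients[i].open_channel(&clients[i], context);
		if (err)
			break;

		*out = context;
		return 0;	/* success keeps the context alive */
	}

	/* Only failure paths reach this point. */
	free(context);
	return err;
}
```

With the early return on success, the cleanup at the end no longer needs
the "if (err)" test.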

> 
>> +static int
>> +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
>> +                      struct drm_file *file_priv)
>> +{
>> +     struct tegra_drm_open_channel_args *args = data;
>> +     struct host1x_drm_context *context, *tmp;
>> +     struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> +     int err = 0;
>> +
>> +     list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
>> +             if ((uintptr_t)context == args->context) {
>> +                     context->client->ops->close_channel(context);
>> +                     list_del(&context->list);
>> +                     kfree(context);
>> +                     goto out;
>> +             }
>> +     }
>> +     err = -EINVAL;
>> +
>> +out:
>> +     return err;
>> +}
> 
> Same comments as for tegra_drm_ioctl_open_channel().

Ok, will apply.

> 
>> +static int
>> +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
>> +                      struct drm_file *file_priv)
>> +{
>> +     struct tegra_drm_get_channel_param_args *args = data;
>> +     struct host1x_drm_context *context;
>> +     struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> +     int err = 0;
>> +
>> +     list_for_each_entry(context, &fpriv->contexts, list) {
>> +             if ((uintptr_t)context == args->context) {
>> +                     args->value =
>> +                             context->client->ops->get_syncpoint(context,
>> +                                             args->param);
>> +                     goto out;
>> +             }
>> +     }
>> +     err = -ENODEV;
>> +
>> +out:
>> +     return err;
>> +}
> 
> Same comments as well. Also you may want to factor out the context
> lookup into a separate function so you don't have to repeat the same
> code over and over again.

Will do.
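
A minimal sketch of such a lookup helper, written as a user-space model.
The names context_lookup(), struct context_model and struct fpriv_model
are hypothetical, and a plain singly linked list stands in for the
kernel's list_head.

```c
#include <stddef.h>
#include <stdint.h>

struct context_model {
	struct context_model *next;
	int id;
};

struct fpriv_model {
	struct context_model *contexts;	/* head of this file's context list */
};

/*
 * Return the context whose address matches the user-supplied handle,
 * or NULL. The handle is only compared, never dereferenced, so a bogus
 * value from user space cannot reach memory outside the list.
 */
static struct context_model *context_lookup(struct fpriv_model *fpriv,
					    uint64_t handle)
{
	struct context_model *context;

	for (context = fpriv->contexts; context; context = context->next)
		if ((uint64_t)(uintptr_t)context == handle)
			return context;

	return NULL;
}
```

Each ioctl would then reduce to one lookup, a NULL check, and the
operation itself.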

> 
> I wonder if we shouldn't remove .get_syncpoint() from the client ops and
> replace it by a simple array instead. The only use-case for this is if a
> client wants more than a single syncpoint, right? In that case just keep
> an array of syncpoints and the number of syncpoints per client.
> Otherwise each client will have to rewrite the same function.

That makes sense. Will do.

> Also, how useful is it to create a context? Looking at the gr2d
> implementation for .open_channel(), it will return the same channel to
> whichever userspace process requests them. Can you explain why it is
> necessary at all? From the name I would have expected some kind of
> context switching to take place when different applications submit
> requests to the same client, but that doesn't seem to be the case.

Hardware context switching will come in a later submission, and it'll
actually create a new structure. Hardware context might live longer than
the process that created it, so they'll need to be separate.

We've used the context as a place for storing flags and the reference to
hardware context. It'd allow also opening channels to multiple devices,
and context would be used in submit to find out the target device. But
as hardware context switching is not implemented in this patch set, and
neither is support for anything but 2D, it's difficult to justify it.

Perhaps the justification is that this way we can keep the kernel API
stable even when we add support for hardware contexts and other clients.

> 
>> +static int
>> +tegra_drm_create_ioctl(struct drm_device *drm, void *data,
>> +                      struct drm_file *file_priv)
> 
> tegra_drm_gem_create_ioctl() please.

Sure.

> 
>>  static struct drm_ioctl_desc tegra_drm_ioctls[] = {
>> +     DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
>> +                     tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
> 
> TEGRA_DRM_GEM_CREATE

Will change.

> 
>>  static const struct file_operations tegra_drm_fops = {
>> @@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = {
>>       .load = tegra_drm_load,
>>       .unload = tegra_drm_unload,
>>       .open = tegra_drm_open,
>> +     .preclose = tegra_drm_close,
> 
> I think it'd make sense to name the function tegra_drm_preclose() to
> match the name in struct drm_driver.

Yes, and I think you added preclose in your vblank patch set, so I'll
need to rebase.

> 
>> diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
> [...]
>> +struct host1x_drm_fpriv {
>> +     struct list_head contexts;
>>  };
> 
> Maybe name this host1x_drm_file. fpriv isn't very specific.

host1x_drm_file sounds a bit odd, because it's not really a file, but a
private data pointer stored in driver_priv.

> 
>> +static inline struct host1x_drm_fpriv *
>> +host1x_drm_fpriv(struct drm_file *file_priv)
>> +{
>> +     return file_priv ? file_priv->driver_priv : NULL;
>> +}
> 
> I think it's fine to just directly do filp->driver_priv instead of going
> through this wrapper.

Ok.

> 
>>  struct host1x_client {
>>       struct host1x *host1x;
>>       struct device *dev;
>>
>>       const struct host1x_client_ops *ops;
>>
>> +     u32 class;
> 
> Should this perhaps be an enum?

That would make sense. I've kept it u32, because the type of class in
hardware is u32, but the two don't need to match.

> 
>> diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
> [...]
>> +static u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index)
>> +{
>> +     struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
>> +     if (index != 0)
>> +             return UINT_MAX;
>> +
>> +     return host1x_syncpt_id(gr2d->syncpt);
>> +}
> 
> Maybe get_syncpoint() should return int and negative error codes on
> failure. That still leaves room for 2^31 possible syncpoints.

That'd be enough. Will do. :-)
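
The two suggestions (a per-client syncpoint array instead of a
.get_syncpoint() op, and an int return with negative error codes) could
be combined roughly as follows. This is only a sketch: struct
client_model and its fields are made up for illustration, and -EINVAL is
one plausible choice of error code, not something agreed on here.

```c
#include <errno.h>
#include <stddef.h>

struct client_model {
	const unsigned int *syncpts;	/* hypothetical: ids owned by the client */
	size_t num_syncpts;
};

/* Return the syncpoint id at 'index', or a negative error code. */
static int client_get_syncpoint(const struct client_model *client,
				size_t index)
{
	if (index >= client->num_syncpts)
		return -EINVAL;

	return (int)client->syncpts[index];
}
```

The lookup then lives in one generic place, and a client that wants
several syncpoints only has to provide a longer array.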

> 
>> +static u32 handle_cma_to_host1x(struct drm_device *drm,
>> +                             struct drm_file *file_priv, u32 gem_handle)
>> +{
>> +     struct drm_gem_object *obj;
>> +     struct drm_gem_cma_object *cma_obj;
>> +     u32 host1x_handle;
>> +
>> +     obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
>> +     if (!obj)
>> +             return 0;
>> +
>> +     cma_obj = to_drm_gem_cma_obj(obj);
>> +     host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
>> +     drm_gem_object_unreference(obj);
>> +
>> +     return host1x_handle;
>> +}
> 
> I thought we had settled in previous reviews on only having a single
> allocator and not do the conversion between various types?

I'll need to agree with Lucas on how to handle this. He intended to make
a patch to fix this, but he hasn't had time to do that.

But, I'd still like to keep the possibility open to add dma_buf as
memory handle type, and fit that into the same API, so there's still a
need to have the mem_mgr_type abstraction.

> 
>> +static int gr2d_submit(struct host1x_drm_context *context,
>> +             struct tegra_drm_submit_args *args,
>> +             struct drm_device *drm,
>> +             struct drm_file *file_priv)
>> +{
>> +     struct host1x_job *job;
>> +     int num_cmdbufs = args->num_cmdbufs;
>> +     int num_relocs = args->num_relocs;
>> +     int num_waitchks = args->num_waitchks;
>> +     struct tegra_drm_cmdbuf __user *cmdbufs =
>> +             (void * __user)(uintptr_t)args->cmdbufs;
>> +     struct tegra_drm_reloc __user *relocs =
>> +             (void * __user)(uintptr_t)args->relocs;
>> +     struct tegra_drm_waitchk __user *waitchks =
>> +             (void * __user)(uintptr_t)args->waitchks;
> 
> No need for all the uintptr_t casts.

Will try to remove - but I do remember getting compiler warnings without
them.

(...)
> Most of this looks very generic. Can't it be split out into separate
> functions and reused in other (gr3d) modules?

That's actually how most of this is downstream. I thought to make
everything really simple and make it all 2D specific in the first patch
set, and split into generic when we add support for another device.

> 
>> +static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
>> +{
>> +     int ret;
>> +
>> +     if (class == NV_HOST1X_CLASS_ID)
>> +             ret = reg == 0x2b;
>> +     else
>> +             switch (reg) {
>> +             case 0x1a:
>> +             case 0x1b:
>> +             case 0x26:
>> +             case 0x2b:
>> +             case 0x2c:
>> +             case 0x2d:
>> +             case 0x31:
>> +             case 0x32:
>> +             case 0x48:
>> +             case 0x49:
>> +             case 0x4a:
>> +             case 0x4b:
>> +             case 0x4c:
>> +                     ret = 1;
>> +                     break;
>> +             default:
>> +                     ret = 0;
>> +                     break;
>> +             }
>> +
>> +     return ret;
>> +}
> 
> I should probably bite the bullet and read through the (still) huge
> patch 3 to understand exactly why this is needed.

That's the security firewall. It walks through each submit, and ensures
that each register write that writes an address goes through the host1x
reloc mechanism. This way user space cannot ask 2D to write to arbitrary
memory locations.
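
A simplified model of that check, for illustration only. The real
firewall is in patch 3 and is not shown in this mail; struct reg_write
and firewall_check() are hypothetical, and the sketch assumes each
register write has already been annotated with whether a reloc covers
it. The register list is copied from the quoted gr2d_is_addr_reg().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct reg_write {
	uint32_t reg;		/* register offset being written */
	bool has_reloc;		/* true if a reloc patches this word */
};

/* 2D class registers that take a memory address. */
static bool is_addr_reg(uint32_t reg)
{
	switch (reg) {
	case 0x1a: case 0x1b: case 0x26:
	case 0x2b: case 0x2c: case 0x2d:
	case 0x31: case 0x32:
	case 0x48: case 0x49: case 0x4a: case 0x4b: case 0x4c:
		return true;
	default:
		return false;
	}
}

/* Reject a submit in which an address register is written without a reloc. */
static int firewall_check(const struct reg_write *writes, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (is_addr_reg(writes[i].reg) && !writes[i].has_reloc)
			return -EINVAL;

	return 0;
}
```

Writes to non-address registers pass through untouched; only raw
addresses supplied by user space are refused.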

> 
>> +static struct of_device_id gr2d_match[] = {
> 
> static const please.

Ok.

> 
>> +static int __exit gr2d_remove(struct platform_device *dev)
>> +{
>> +     struct host1x *host1x =
>> +             host1x_get_drm_data(to_platform_device(dev->dev.parent));
>> +     struct gr2d *gr2d = platform_get_drvdata(dev);
>> +     int err;
>> +
>> +     err = host1x_unregister_client(host1x, &gr2d->client);
>> +     if (err < 0) {
>> +             dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
>> +                     err);
>> +             return err;
>> +     }
>> +
>> +     host1x_syncpt_free(gr2d->syncpt);
>> +     return 0;
>> +}
> 
> Isn't this missing a host1x_channel_put() or host1x_free_channel()?

All references should be handled in gr2d_open_channel() and
gr2d_close_channel(). I think we'd need to ensure all contexts are
closed at this point.

> 
>> diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
> [...]
>> +struct tegra_gem_create {
>> +     __u64 size;
>> +     unsigned int flags;
>> +     unsigned int handle;
>> +     unsigned int offset;
>> +};
> 
> I think it's better to consistently use the explicitly sized types here.
> 
>> +struct tegra_gem_invalidate {
>> +     unsigned int handle;
>> +};
>> +
>> +struct tegra_gem_flush {
>> +     unsigned int handle;
>> +};
> 
> Where are these used?

Arto, please go through these.

> 
>> +struct tegra_drm_syncpt_wait_args {
>> +     __u32 id;
>> +     __u32 thresh;
>> +     __s32 timeout;
>> +     __u32 value;
>> +};
>> +
>> +#define DRM_TEGRA_NO_TIMEOUT (-1)
> 
> Is this the only reason why timeout is signed? If so maybe a better
> choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.

I believe it is so. In fact we'd need to rename it to something like
INFINITE_TIMEOUT, because we also have a case of timeout=0, which
returns immediately, i.e. doesn't have a timeout either.
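
The three cases could be told apart as in the sketch below, assuming the
timeout field becomes __u32. TIMEOUT_INFINITE and classify_timeout() are
hypothetical names, not part of the proposed API.

```c
#include <stdint.h>

#define TIMEOUT_INFINITE 0xffffffffu	/* hypothetical replacement name */

enum wait_mode {
	WAIT_POLL,	/* timeout == 0: check once and return immediately */
	WAIT_BOUNDED,	/* give up after 'timeout' units */
	WAIT_FOREVER,	/* block until the threshold is reached */
};

static enum wait_mode classify_timeout(uint32_t timeout)
{
	if (timeout == 0)
		return WAIT_POLL;
	if (timeout == TIMEOUT_INFINITE)
		return WAIT_FOREVER;
	return WAIT_BOUNDED;
}
```

Since (uint32_t)-1 is 0xffffffff, a caller that still passes -1 ends up
with the same bit pattern in the field.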

> 
>> +struct tegra_drm_get_channel_param_args {
>> +     __u64 context;
>> +     __u32 param;
>> +     __u32 value;
>> +};
> 
> What's the reason for not calling this tegra_drm_get_syncpoint?

I wanted to use the same struct for other parameters, too: wait bases,
mutexes. But it doesn't really optimize anything, so I can make them
each specific structs.

> 
>> +struct tegra_drm_syncpt_incr {
>> +     __u32 syncpt_id;
>> +     __u32 syncpt_incrs;
>> +};
> 
> Maybe the fields would be better named id and incrs. Though I also
> notice that incrs is never used. I guess that's supposed to be used in
> the future to allow increments by more than a single value. If so,
> perhaps value would be a better name.

It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.

> Now on to the dreaded patch 3...

Enjoy. :-)

Terje


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver
  2013-02-05  3:30     ` Terje Bergström
@ 2013-02-05  7:43       ` Thierry Reding
  2013-02-06 20:13         ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-05  7:43 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5861 bytes --]

On Mon, Feb 04, 2013 at 07:30:08PM -0800, Terje Bergström wrote:
> On 04.02.2013 01:09, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
> >> Add host1x, the driver for host1x and its client unit 2D.
> > 
> > Maybe this could be a bit more verbose. Perhaps describe what host1x is.
> 
> Sure. I could just steal the paragraph from Stephen:
> 
> The Tegra host1x module is the DMA engine for register access to Tegra's
> graphics- and multimedia-related modules. The modules served by host1x
> are referred to as clients. host1x includes some other functionality,
> such as synchronization.

Yes, that sounds good.

> >> +     err = host1x_syncpt_init(host);
> >> +     if (err)
> >> +             return err;
> > [...]
> >> +     host1x_syncpt_reset(host);
> > 
> > Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
> > it might be useful to have host1x_syncpt_reset() as a separate function
> > but couldn't it be called as part of host1x_syncpt_init()?
> 
> host1x_syncpt_init() is used for initializing the syncpt structures, and
> is called in probe. host1x_syncpt_reset() should be called whenever we
> think hardware state is lost, for example if VDD_CORE was rail gated due
> to system suspend.

My point was that you could include the call to host1x_syncpt_reset()
within host1x_syncpt_init(). That will keep unneeded code out of the
host1x_probe() function. Also you don't want to use the syncpoints
uninitialized, right?

> >> +#include "hw/syncpt_hw.c"
> > 
> > Why include the source file here? Can't you compile it separately
> > instead?
> 
> It's because we need to compile with the hardware headers of that host1x
> version, because we haven't been good at keeping compatibility. So
> host1x01.c #includes version 01 headers, and syncpt_hw.c in this
> compilation unit gets compiled with that. 02 would include 02 headers,
> and syncpt_hw.c would get compiled with its register definitions etc.

Okay, fair enough.

> >> + */
> >> +static u32 syncpt_load_min(struct host1x_syncpt *sp)
> >> +{
> >> +     struct host1x *dev = sp->dev;
> >> +     u32 old, live;
> >> +
> >> +     do {
> >> +             old = host1x_syncpt_read_min(sp);
> >> +             live = host1x_sync_readl(dev,
> >> +                             HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
> >> +     } while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
> > 
> > I think this warrants a comment.
> 
> Sure. It just loops in case there's a race writing to min_val.

Oh, I see. That'd make a good comment. Is the cast to (u32) really
necessary?
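
For reference, the same retry pattern can be written in user space with
C11 atomics. This is a sketch, not the driver code: struct syncpt_model
is hypothetical, a plain volatile pointer stands in for the
host1x_sync_readl() register read, and atomic_compare_exchange_strong()
replaces the kernel's atomic_cmpxchg().

```c
#include <stdatomic.h>
#include <stdint.h>

struct syncpt_model {
	_Atomic uint32_t min_val;		/* cached value */
	const volatile uint32_t *hw_reg;	/* stand-in for the register */
};

/*
 * Copy the live hardware value into the cache. If another thread
 * changed min_val between our read of 'old' and the exchange, the
 * exchange fails and the loop re-reads both values and retries.
 */
static uint32_t syncpt_load_min(struct syncpt_model *sp)
{
	uint32_t old, live;

	do {
		old = atomic_load(&sp->min_val);
		live = *sp->hw_reg;
	} while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

	return live;
}
```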

> >> +/*
> >> + * Resets syncpoint and waitbase values to sw shadows
> >> + */
> >> +void host1x_syncpt_reset(struct host1x *dev)
> > 
> > Maybe host1x_syncpt_flush() would be a better name given the above
> > description? Reset does have this hardware reset connotation so my first
> > intuition had been that this would reset the syncpt value to 0.
> 
> Right, it actually reloads values stored in shadow registers back to
> host1x. Flush doesn't feel like it's conveying the meaning. Would
> host1x_syncpt_restore() work? That'd match with host1x_syncpt_save(),
> which just updates all shadow registers from hardware and is used just
> before host1x loses power.

Save/restore has the disadvantage of the direction not being implicit.
Save could mean save to hardware or save to software. The same is true
for restore. However if the direction is clearly defined, save and
restore work for me.

Maybe the comment could be changed to be more explicit. Something like:

	/*
	 * Write cached syncpoint and waitbase values to hardware.
	 */

And for host1x_syncpt_save():

	/*
	 * For client-managed registers, update the cached syncpoint and
	 * waitbase values by reading from the registers.
	 */
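
To make the direction concrete, here is a small user-space model of the
two operations as described in those comments. struct host_model and the
array sizes are made up, and waitbases are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_PTS 4	/* arbitrary size for the model */

struct host_model {
	uint32_t hw[NB_PTS];		/* stand-in for hardware registers */
	uint32_t cached[NB_PTS];	/* software copy of the values */
};

/* Write cached syncpoint values to hardware (after state was lost). */
static void syncpt_restore(struct host_model *host)
{
	size_t i;

	for (i = 0; i < NB_PTS; i++)
		host->hw[i] = host->cached[i];
}

/* Update the cached values by reading hardware (before power-down). */
static void syncpt_save(struct host_model *host)
{
	size_t i;

	for (i = 0; i < NB_PTS; i++)
		host->cached[i] = host->hw[i];
}
```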

> >> +/*
> >> + * Updates the last value read from hardware.
> >> + */
> >> +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
> >> +{
> >> +     u32 val;
> >> +     val = sp->dev->syncpt_op.load_min(sp);
> >> +     trace_host1x_syncpt_load_min(sp->id, val);
> >> +
> >> +     return val;
> >> +}
> > 
> > I don't know that I understand what this means exactly. Does it read the
> > value that hardware last incremented? Perhaps this will become clearer
> > when you add a comment to the syncpt_load_min() implementation.
> 
> It just loads the current syncpt value to shadow register. The shadow
> register is called min, because host1x tracks the range of sync point
> increments that hardware is still going to do, so min is the lower
> boundary of the range.
> 
> max tells what the sync point is expected to reach for hardware to be
> considered idle.
> 
> host1x will f.ex. nop out waits for sync point values outside the range,
> because hardware isn't good at handling syncpt value wrapping.

Maybe the function should be called host1x_syncpt_load() if there is no
equivalent way to load the maximum value (since there is no register to
read from).
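
The min/max range from the quoted explanation can be tested in a way
that survives 32-bit wrapping by comparing distances from min instead of
absolute values. The helper below is a sketch of that idea;
thresh_in_range() is a hypothetical name, and whether the driver's check
has exactly this form is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if 'thresh' lies within [min, max], treating the 32-bit
 * syncpoint counter as circular. Unsigned subtraction wraps modulo
 * 2^32, so the comparison stays valid when max has wrapped past zero.
 */
static bool thresh_in_range(uint32_t min, uint32_t max, uint32_t thresh)
{
	return (uint32_t)(thresh - min) <= (uint32_t)(max - min);
}
```

A wait whose threshold falls outside the range would be the kind that
gets turned into a no-op.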

> > Also the syncpoint is not actually allocated here, so maybe
> > host1x_syncpt_request() would be a better name. As a nice side-effect it
> > makes the naming more similar to the IRQ API and might be easier to work
> > with.
> 
> I'm not entirely sure about the difference, but isn't the number to be
> allocated usually passed to a function ending in _request? Allocate
> would just allocate the next available - as host1x_syncpt_allocate does.

That's certainly true for interrupts. However, if you look at the DMA
subsystem for example, you can also request an unnamed resource.

The difference is sufficiently subtle that host1x_syncpt_allocate()
would work for me too, though. I just have a slight preference for
host1x_syncpt_request().

Thierry

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-02-05  4:29     ` Terje Bergström
@ 2013-02-05  8:42       ` Thierry Reding
  2013-02-06 20:29         ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-05  8:42 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 12187 bytes --]

On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
> On 04.02.2013 02:30, Thierry Reding wrote:
> >> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> >> index d8f5979..8376092 100644
> >> --- a/drivers/gpu/host1x/dev.h
> >> +++ b/drivers/gpu/host1x/dev.h
> >> @@ -17,11 +17,12 @@
> >>  #ifndef HOST1X_DEV_H
> >>  #define HOST1X_DEV_H
> >>
> >> +#include <linux/platform_device.h>
> >>  #include "syncpt.h"
> >> +#include "intr.h"
> >>
> >>  struct host1x;
> >>  struct host1x_syncpt;
> >> -struct platform_device;
> > 
> > Why include platform_device.h here?
> 
> host1x_get_host() actually needs that, so this #include should've also
> been in previous patch.

No need to if you pass struct device * instead. You might need
linux/device.h instead, though.

> >> +     void (*set_syncpt_threshold)(
> >> +             struct host1x_intr *, u32 id, u32 thresh);
> >> +     void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
> >> +     void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
> >> +     void (*disable_all_syncpt_intrs)(struct host1x_intr *);
> > 
> > Can disable_all_syncpt_intrs() not be implemented generically using the
> > number of syncpoints as exposed by host1x_device_info and the
> > .disable_syncpt_intr() function?
> 
> disable_all_syncpt_intrs() disables all interrupts in one write (or one
> per 32 sync points), so it's more efficient.

Yes, I noticed that and failed to remove this comment.

> >> +{
> >> +     struct host1x_intr_syncpt *sp =
> >> +             container_of(work, struct host1x_intr_syncpt, work);
> >> +     host1x_syncpt_thresh_fn(sp);
> > 
> > Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
> > Why do we need to go through an external function declaration?
> 
> If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
> that. That'd simplify the interrupt path.

I like simplification. =)

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> +     struct host1x *host1x = dev_id;
> >> +     struct host1x_intr *intr = &host1x->intr;
> >> +     unsigned long reg;
> >> +     int i, id;
> >> +
> >> +     for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
> >> +             reg = host1x_sync_readl(host1x,
> >> +                             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
> >> +                             i * REGISTER_STRIDE);
> >> +             for_each_set_bit(id, &reg, BITS_PER_LONG) {
> >> +                     struct host1x_intr_syncpt *sp =
> >> +                             intr->syncpt + (i * BITS_PER_LONG + id);
> >> +                     host1x_intr_syncpt_thresh_isr(sp);
> > 
> > Have you considered mimicking the IRQ API and name this something like
> > host1x_intr_syncpt_thresh_handle() and name the actual ISR just
> > syncpt_thresh_isr()? Not so important but it makes things a bit clearer
> > in my opinion.
> 
> This gets a bit confusing, because we have an ISR that calls a function
> that is also called ISR. I've kept "isr" in names of both to emphasize
> that this is running in interrupt context. I'm open to renaming these to
> make it clearer.
> 
> Did you refer to chained IRQ handler in linux/irq.h when you mentioned
> IRQ API as reference for naming?

What I had in mind was more along the lines of kernel/irq/chip.c, which
has a bunch of handlers for various types of interrupts, such as
handle_nested_irq() or handle_simple_irq().

Hence my proposal to rename host1x_intr_syncpt_thresh_isr() to
host1x_intr_syncpt_handle() because it handles the interrupt from a
single syncpoint and syncpt_thresh_cascade_isr() to syncpt_thresh_isr()
to keep it shorter.

Another variant would be host1x_syncpt_irq() for the top-level handler
and something like host1x_handle_syncpt() to handle individual syncpoints. I
like this one best, but this is pure bike-shedding and there's nothing
technically wrong with the names you chose, so I can't really object if
you want to stick to them.
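
Whatever the names end up being, the split is a top-level handler that
turns status bits into syncpoint ids, plus a per-syncpoint handler. The
first half can be modelled in user space as below; pending_syncpts() is
a hypothetical helper, and 32-bit status words are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Translate interrupt status words into syncpoint ids: bit 'b' of word
 * 'i' corresponds to syncpoint i * 32 + b. Stores at most 'max_ids' ids
 * and returns the number stored.
 */
static size_t pending_syncpts(const uint32_t *status, size_t num_words,
			      unsigned int *ids, size_t max_ids)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < num_words; i++) {
		uint32_t reg = status[i];
		unsigned int bit;

		/* Shift the word right until no set bits remain. */
		for (bit = 0; reg != 0 && n < max_ids; bit++, reg >>= 1)
			if (reg & 1)
				ids[n++] = (unsigned int)(i * 32 + bit);
	}

	return n;
}
```

The top-level handler would then call the per-syncpoint handler once for
each id returned.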

> >> +                     queue_work(intr->wq, &sp->work);
> > 
> > Should the call to queue_work() perhaps be moved into
> > > host1x_intr_syncpt_thresh_isr()?
> 
> I'm not sure, either way would be ok to me. The current structure allows
> host1x_intr_syncpt_thresh_isr() to only take one parameter
> (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
> host1x_intr.

I think I'd still prefer to have all the code in one function because it
makes subsequent modifications easier and less error-prone.
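
To illustrate the structure being discussed, here is a minimal sketch in
plain userspace C. The names host1x_syncpt_irq() and host1x_handle_syncpt()
are taken from the naming proposal above; the mock structures, the queued
flag standing in for queue_work(), and the single 32-bit status word are
assumptions made only for this illustration and are not the patch's code.

```c
#include <stdint.h>

#define MOCK_NB_SYNCPTS 32

/* Hypothetical stand-in for struct host1x_intr_syncpt. */
struct mock_syncpt {
	int handled;	/* set by the per-syncpoint handler */
	int queued;	/* stands in for queue_work(intr->wq, &sp->work) */
};

/* Hypothetical stand-in for struct host1x_intr. */
struct mock_intr {
	struct mock_syncpt syncpt[MOCK_NB_SYNCPTS];
};

/* Handle one syncpoint; queuing the work lives here as well. */
static void host1x_handle_syncpt(struct mock_intr *intr, struct mock_syncpt *sp)
{
	(void)intr;	/* real code would need intr->wq for queue_work() */
	sp->handled = 1;
	sp->queued = 1;
}

/* Top-level handler: walk the set bits of one status word. */
static int host1x_syncpt_irq(struct mock_intr *intr, uint32_t status)
{
	unsigned int id;
	int count = 0;

	for (id = 0; id < MOCK_NB_SYNCPTS; id++) {
		if (!(status & (UINT32_C(1) << id)))
			continue;
		host1x_handle_syncpt(intr, &intr->syncpt[id]);
		count++;
	}

	return count;
}
```

The cost mentioned in the quoted reply is visible in the signature: once
queuing moves into the per-syncpoint handler, that handler takes the
interrupt state as a second parameter.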

> >> +static void host1x_intr_init_host_sync(struct host1x_intr *intr)
> >> +{
> >> +     struct host1x *host1x = intr_to_host1x(intr);
> >> +     int i, err;
> >> +
> >> +     host1x_sync_writel(host1x, 0xffffffffUL,
> >> +             HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
> >> +     host1x_sync_writel(host1x, 0xffffffffUL,
> >> +             HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
> >> +
> >> +     for (i = 0; i < host1x->info.nb_pts; i++)
> >> +             INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
> >> +
> >> +     err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
> >> +                             syncpt_thresh_cascade_isr,
> >> +                             IRQF_SHARED, "host1x_syncpt", host1x);
> >> +     WARN_ON(IS_ERR_VALUE(err));
> > 
> > Do we really want to continue in this case?
> 
> Hmm, we'd need to actually return an error code. There's not much the
> driver can do without syncpt interrupts.

Yeah, in that case I think we should bail out. It's not like we're
expecting any failures. If the interrupt cannot be requested, something
must seriously be wrong and we should tell users about it so that it can
be fixed. Trying to continue on a best effort basis isn't useful here, I
think.
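
A minimal sketch of bailing out instead of warning and continuing, written
as userspace C. mock_request_irq() is a hypothetical stand-in for
devm_request_irq(), and the structure is reduced to what the example needs;
only the idea of returning the error to the caller comes from the
discussion above.

```c
#include <errno.h>

/* Hypothetical stand-in for devm_request_irq(); fails for irq 0. */
static int mock_request_irq(unsigned int irq)
{
	return irq ? 0 : -EINVAL;
}

/* Hypothetical, reduced interrupt state. */
struct mock_intr_state {
	unsigned int syncpt_irq;
	int initialized;
};

/* Return the error code rather than WARN_ON() and carrying on. */
static int mock_intr_init_host_sync(struct mock_intr_state *intr)
{
	int err;

	err = mock_request_irq(intr->syncpt_irq);
	if (err < 0)
		return err;

	intr->initialized = 1;
	return 0;
}
```

The caller would then propagate the error in the same way, so that a
failed request ends up failing the probe.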

> >> +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
> >> +                     enum host1x_intr_action action, void *data,
> >> +                     void *_waiter,
> >> +                     void **ref)
> > 
> > Why do you pass in _waiter as void * and not struct host1x_waitlist *?
> 
> struct host1x_waitlist is defined inside intr.c, so I've chosen to pass
> void *. I could naturally just forward declare host1x_waitlist in intr.h
> and change the allocation and add_action to use that.

Yes, that's definitely better.
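
A sketch of the forward-declaration approach, assuming a simplified
signature for host1x_intr_add_action() and a hypothetical accessor
host1x_waitlist_thresh() that is not part of the patch. The point is only
that callers get a typed pointer while the layout stays private.

```c
#include <stdlib.h>

/* What intr.h would carry: a forward declaration, no layout. */
struct host1x_waitlist;

struct host1x_waitlist *host1x_intr_alloc_waiter(void);
int host1x_intr_add_action(struct host1x_waitlist *waiter, unsigned int thresh);
unsigned int host1x_waitlist_thresh(const struct host1x_waitlist *waiter);

/* What intr.c would carry: the full definition stays private. */
struct host1x_waitlist {
	unsigned int thresh;	/* illustrative field only */
};

struct host1x_waitlist *host1x_intr_alloc_waiter(void)
{
	return calloc(1, sizeof(struct host1x_waitlist));
}

/* Simplified: the quoted patch takes more parameters than this. */
int host1x_intr_add_action(struct host1x_waitlist *waiter, unsigned int thresh)
{
	waiter->thresh = thresh;
	return 0;
}

/* Hypothetical accessor, present only so the example can be checked. */
unsigned int host1x_waitlist_thresh(const struct host1x_waitlist *waiter)
{
	return waiter->thresh;
}
```

With this shape, passing any other pointer type to
host1x_intr_add_action() is diagnosed by the compiler, which a void *
parameter would silently accept.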

> > I think I've said this before. The interface doesn't seem optimal to me
> > here. Passing in an enumeration to choose which action to perform looks
> > difficult to work with (not to mention the symbols are rather long and
> > therefore result in ugly code).
> > 
> > Maybe doing this by passing around a pointer to a handler function would
> > be nicer. However since I haven't really used this yet, I can't really
> > tell. So maybe we should just merge the implementation as-is for now. We
> > can always clean it up later.
> 
> We're using the enum also to index into arrays. We do it so that we can
> remove all the completed waiters from the wait_head, and insert them
> into lists per action type. This way we can run all actions in priority
> order: first action_submit_complete, then action_wakeup, and then
> action_wakeup_interruptible.
> 
> Now, we've recently noticed that the priority order is actually wrong.
> The first priority should be to wake up non-interruptible tasks, then
> interruptible tasks. Cleaning up memory of completed submits should be
> lower priority.
> 
> I've considered this part as something private to host1x driver and it's
> not really meant to be called f.ex. from DRM. But, as you seem to have a
> need to have an asynchronous wait for a fence, we'd need to figure
> something out for that.

Okay, let's keep it as-is for now and see how it can be improved later
when we have an actual use-case for using it externally.
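
The enum-indexed scheme described in the quoted text could look roughly
like the following userspace sketch. All names are hypothetical, the enum
order follows the corrected priority mentioned above (wake-ups first,
submit cleanup last), and the threshold comparison ignores counter
wrap-around for brevity.

```c
#include <stddef.h>

/* Enum order doubles as priority order. */
enum mock_action {
	MOCK_ACTION_WAKEUP,
	MOCK_ACTION_WAKEUP_INTERRUPTIBLE,
	MOCK_ACTION_SUBMIT_COMPLETE,
	MOCK_ACTION_COUNT,
};

struct mock_waiter {
	enum mock_action action;
	unsigned int thresh;
	struct mock_waiter *next;
};

/*
 * Move every waiter whose threshold has been reached from *wait_head
 * onto the per-action list it belongs to. Returns the number moved.
 */
static int mock_remove_completed(struct mock_waiter **wait_head,
				 unsigned int value,
				 struct mock_waiter *completed[MOCK_ACTION_COUNT])
{
	struct mock_waiter **link = wait_head;
	int moved = 0;

	while (*link) {
		struct mock_waiter *w = *link;

		if (w->thresh > value) {
			link = &w->next;
			continue;
		}
		*link = w->next;
		w->next = completed[w->action];
		completed[w->action] = w;
		moved++;
	}

	return moved;
}

/* Walk the per-action lists in priority order, recording the order. */
static int mock_run_handlers(struct mock_waiter *completed[MOCK_ACTION_COUNT],
			     enum mock_action *order, int max)
{
	int n = 0;
	int a;

	for (a = 0; a < MOCK_ACTION_COUNT; a++) {
		struct mock_waiter *w;

		for (w = completed[a]; w && n < max; w = w->next)
			order[n++] = w->action;
	}

	return n;
}
```

Because the enum value is the array index, changing the priority order
amounts to reordering the enumerators.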

> >> +void *host1x_intr_alloc_waiter(void)
> >> +{
> >> +     return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
> >> +}
> > 
> > I'm not sure why this is separate from host1x_syncpt_wait() since it is
> > only used inside that function and the waiter returned never leaves the
> > scope of that function, so it might be better to allocate it directly
> > in host1x_syncpt_wait() instead.
> > 
> > Actually, it looks like the waiter doesn't ever leave scope, so you may
> > even want to allocate it on the stack.
> 
> In patch 3, at submit time we first allocate waiter, then take
> submit_lock, write submit to channel, and add the waiter while having
> the lock. I did this so that host1x_intr_add_action() can always
> succeed. Otherwise I'd need to write another code path to handle the
> case where we wrote a job to channel, but we're not able to add a
> submit_complete action to it.

Okay. In that case why not allocate it on the stack in the first place
so you don't have to bother with allocations (and potential failure) at
all? The variable doesn't leave the function scope, so there shouldn't
be any issues, right?

Or if that doesn't work it would still be preferable to allocate memory
in host1x_syncpt_wait() directly instead of going through the wrapper.

> >> +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
> > 
> > Here again, you pass in the waiter via a void *. Why's that?
> 
> host1x_waitlist is hidden inside intr.c.

I don't think that's necessary here. I'd rather have the compiler check
for types rather than hide the structure.

> >> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
> > 
> > Maybe you should keep the type of the irq_sync here so that it properly
> > propagates to the call to devm_request_irq().
> 
> I'm not sure what you mean. Do you mean that I should use unsigned int,
> as that's the type used in devm_request_irq()?

Yes.

> >> +void host1x_intr_stop(struct host1x_intr *intr)
> >> +{
> >> +     unsigned int id;
> >> +     struct host1x *host1x = intr_to_host1x(intr);
> >> +     struct host1x_intr_syncpt *syncpt;
> >> +     u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
> >> +
> >> +     mutex_lock(&intr->mutex);
> >> +
> >> +     host1x->intr_op.disable_all_syncpt_intrs(intr);
> > 
> > I haven't commented on this everywhere, but I think this could benefit
> > from a wrapper that forwards this to the intr_op. The same goes for the
> > sync_op.
> 
> You mean something like "host1x_disable_all_syncpt_intrs"?

Yes. I think that'd be useful for each of the op functions. Perhaps you
could even pass in a struct host1x * to make calls more uniform.
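
A sketch of such a wrapper, assuming trimmed-down structures and an op
that takes struct host1x * as suggested. The wrapper name comes from the
quoted question; the mock hardware implementation and the
syncpt_intrs_enabled field are hypothetical.

```c
/* Hypothetical, trimmed-down versions of the structures in the patch. */
struct host1x;

struct host1x_intr_ops {
	void (*disable_all_syncpt_intrs)(struct host1x *host);
};

struct host1x {
	struct host1x_intr_ops intr_op;
	int syncpt_intrs_enabled;	/* illustrative state only */
};

/* Wrapper so that callers never dereference intr_op directly. */
static inline void host1x_disable_all_syncpt_intrs(struct host1x *host)
{
	host->intr_op.disable_all_syncpt_intrs(host);
}

/* Example hardware-specific implementation (illustrative only). */
static void mock_hw_disable_all_syncpt_intrs(struct host1x *host)
{
	host->syncpt_intrs_enabled = 0;
}
```

Call sites then read host1x_disable_all_syncpt_intrs(host1x) and do not
need to know which ops table the function lives in.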

> >> +/*
> >> + * Main entrypoint for syncpoint value waits.
> >> + */
> >> +int host1x_syncpt_wait(struct host1x_syncpt *sp,
> >> +                     u32 thresh, long timeout, u32 *value)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_syncpt_wait);
> > 
> > This doesn't only seem to be the main entrypoint, but it's basically the
> > only way to currently wait for syncpoints. One actual use-case where
> > this might turn out to be a problem is video capturing. The problem is
> > that using this API you can't very well asynchronously capture frames.
> > So eventually I think we need a way to allow a generic handler to be
> > attached to syncpoints so that you can have this handler continuously
> > invoked after each frame is captured and just pass the buffer back to
> > userspace.
> 
> Yep, so far all asynchronous waits have been done in user space. We
> would probably allow attaching a handler to a syncpt value, so that we'd
> call that handler once a value is reached. In effect, similar to a
> wake_up event that is now added via host1x_intr_add_action, but simpler.
> That'd mean that the handler needs to be re-added after each frame.
> 
> We could also add the handler as persistent if re-adding would be a
> problem. That'd require some new wiring and I'll have to think how to
> implement that.

Yes, that sounds like what I had in mind. Again, no need to worry about
it now. We can cross that bridge when we come to it.

Thierry

* Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-02-05  4:41     ` Terje Bergström
@ 2013-02-05  9:15       ` Thierry Reding
  2013-02-06 20:58         ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-05  9:15 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
> On 04.02.2013 03:03, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
> >> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> > [...]
> >> +struct output {
> >> +	void (*fn)(void *ctx, const char *str, size_t len);
> >> +	void *ctx;
> >> +	char buf[256];
> >> +};
> > 
> > Do we really need this kind of abstraction? There really should be only
> > one location where debug information is obtained, so I don't see a need
> > for this.
> 
> This is used by debugfs code to direct to debugfs, and
> nvhost_debug_dump() to send via printk.

Yes, that was precisely my point. Why bother providing the same data via
several output methods. debugfs is good for showing large amounts of
data such as register dumps or a tabular representation of syncpoints
for instance.

If, however, you want to interactively show debug information using
printk the same format isn't very useful and something more reduced is
often better.

> >> diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
> > 
> >> +static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
> >> +{
> >> +	unsigned mask;
> >> +	unsigned subop;
> >> +
> >> +	switch (val >> 28) {
> >> +	case 0x0:
> > 
> > These can easily be derived by looking at the debug output, but it may
> > still make sense to assign symbolic names to them.
> 
> I have another suggestion. In downstream I removed the decoding part and
> I just print out a string of hex. That removes quite a bunch of code
> from kernel. It makes the debug output also more compact.
> 
> It's much easier to write a user space program to decode than maintain
> it in kernel.

I don't know. I think if you use in-kernel debugging facilities such as
debugfs or printk, then the output should be self-explanatory. However I
do see the usefulness of having a binary dump that can be decoded in
userspace. But I think if we want to go that way we should make that a
separate interface. USB provides something like that, which can then be
fed to libpcap or wireshark to capture and analyze USB traffic. If done
properly you get replay functionality for free. I don't know what
infrastructure exists to help with implementing something similar.

So I think having debugfs output some data about syncpoints or the state
of channels might be useful to quickly diagnose a certain set of
problems but for anything more involved maybe a complete binary dump may
be better.

I'm not sure whether doing this would be acceptable though. Maybe Dave
or somebody else on the lists can answer that. An alternative way to
achieve the same would be to hook ioctl() from userspace and not do any
of it in kernel space.

> >> +static void show_channel_word(struct output *o, int *state, int *count,
> >> +		u32 addr, u32 val, struct host1x_cdma *cdma)
> >> +{
> >> +	static int start_count, dont_print;
> > 
> > What if two processes read debug information at the same time?
> 
> show_channels() acquires cdma.lock, so that shouldn't happen.

Okay. Another solution would be to pass around a debug context which
keeps track of the variables. But if we opt for a more involved dump
interface as discussed above this will no longer be relevant.

> >> +static void do_show_channel_gather(struct output *o,
> >> +		phys_addr_t phys_addr,
> >> +		u32 words, struct host1x_cdma *cdma,
> >> +		phys_addr_t pin_addr, u32 *map_addr)
> >> +{
> >> +	/* Map dmaget cursor to corresponding mem handle */
> >> +	u32 offset;
> >> +	int state, count, i;
> >> +
> >> +	offset = phys_addr - pin_addr;
> >> +	/*
> >> +	 * Sometimes we're given different hardware address to the same
> >> +	 * page - in these cases the offset will get an invalid number and
> >> +	 * we just have to bail out.
> >> +	 */
> > 
> > Why's that?
> 
> Because of a race - memory might've been unpinned and unmapped from
> IOMMU and when we re-map (pin), we are given a new address.
> 
> But, I think this comment is a bit stale - we used to dump also old
> gathers. The latest code only dumps jobs in sync queue, so the race
> shouldn't happen.

Okay. In the context of a channel dump interface this may not be
relevant anymore. Can you think of any issue that wouldn't be detectable
or debuggable by analyzing a binary dump of the data within a channel?
I'm asking because at that point we wouldn't be able to access any of
the in-kernel data structures but would have to rely on the data itself
for diagnostics. IOMMU virtual addresses won't be available and so on.

> >> +static void host1x_debug_show_channel_cdma(struct host1x *m,
> >> +	struct host1x_channel *ch, struct output *o, int chid)
> >> +{
> > [...]
> >> +	switch (cbstat) {
> >> +	case 0x00010008:
> > 
> > Again, symbolic names would be nice.
> 
> I propose I remove the decoding from kernel, and save 200 lines.

I think it could be more than 200 lines. If all we provide in the kernel
is some statistics about syncpoint usage or channel state that should be
a lot less code than we have now.

However that would make it necessary to provide userspace tools that can
provide the same quality of diagnostics, so I'm not sure if it's doable
without access to the in-kernel data structures.

Thierry

* Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x
  2013-02-05  4:45     ` Terje Bergström
@ 2013-02-05  9:26       ` Thierry Reding
  0 siblings, 0 replies; 49+ messages in thread
From: Thierry Reding @ 2013-02-05  9:26 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Mon, Feb 04, 2013 at 08:45:36PM -0800, Terje Bergström wrote:
> On 04.02.2013 03:08, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
> > [...]
> >> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
> >> index 697d49a..ffc8bf1 100644
> >> --- a/drivers/gpu/host1x/Makefile
> >> +++ b/drivers/gpu/host1x/Makefile
> >> @@ -12,4 +12,10 @@ host1x-y = \
> >>  	hw/host1x01.o
> >>  
> >>  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
> >> +
> >> +ccflags-y += -Iinclude/drm
> >> +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
> >> +
> >> +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
> >> +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
> >>  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
> > 
> > Can this be moved into a separate Makefile in the drm subdirectory?
> 
> I tried, and kernel build helpfully created two .ko files. As having
> cyclic dependencies between two modules isn't nice, I merged them to
> same module and that seemed to force merging Makefile.
> 
> If anybody has an idea on how to do it otherwise, I'd be happy to keep
> the Makefiles separate.

Okay, I'll take a look.

Thierry

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-05  5:17     ` Terje Bergström
@ 2013-02-05  9:54       ` Thierry Reding
  2013-02-06 21:23         ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-05  9:54 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
> On 04.02.2013 04:56, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
> >> +{
> >> +     struct host1x *host1x = drm->dev_private;
> >> +     struct tegra_drm_syncpt_read_args *args = data;
> >> +     struct host1x_syncpt *sp =
> >> +             host1x_syncpt_get_bydev(host1x->dev, args->id);
> > 
> > I don't know if we need this, except maybe to work around the problem
> > that we have two different structures named host1x. The _bydev() suffix
> > is misleading because all you really do here is obtain the syncpt from
> > the host1x.
> 
> Yeah, it's actually working around the host1x duplicate naming.
> host1x_syncpt_get takes struct host1x as parameter, but that's different
> host1x than in this code.

So maybe a better way would be to rename the DRM host1x after all. If it
avoids the need for workarounds such as this I think it justifies the
additional churn.

> > Also, how useful is it to create a context? Looking at the gr2d
> > implementation for .open_channel(), it will return the same channel to
> > whichever userspace process requests them. Can you explain why it is
> > necessary at all? From the name I would have expected some kind of
> > context switching to take place when different applications submit
> > requests to the same client, but that doesn't seem to be the case.
> 
> Hardware context switching will be a later submit, and it'll actually
> create a new structure. Hardware context might live longer than the
> process that created it, so they'll need to be separate.

Why would it live longer than the process? Isn't the whole purpose of
the context to keep per-process state? What use is that state if the
process dies?

> We've used the context as a place for storing flags and the reference to
> hardware context. It'd allow also opening channels to multiple devices,
> and context would be used in submit to find out the target device. But
> as hardware context switching is not implemented in this patch set, and
> neither is support for anything but 2D, it's difficult to justify it.
> 
> Perhaps the justification is that this way we can keep the kernel API
> stable even when we add support for hardware contexts and other clients.

We don't need a stable kernel API. But I guess it is fine to keep it if
for no other reason than to fill the context returned in the ioctl() with
meaningful data.

> >> diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
> > [...]
> >> +struct host1x_drm_fpriv {
> >> +     struct list_head contexts;
> >>  };
> > 
> > Maybe name this host1x_drm_file. fpriv isn't very specific.
> 
> host1x_drm_file sounds a bit odd, because it's not really a file, but a
> private data pointer stored in driver_priv.

The same is true for struct drm_file, which is stored in struct file's
.private_data field. I find it to be very intuitive if the inheritance
is reflected in the structure name. struct host1x_drm_file is host1x'
driver-specific part of struct drm_file.

> >> +static u32 handle_cma_to_host1x(struct drm_device *drm,
> >> +                             struct drm_file *file_priv, u32 gem_handle)
> >> +{
> >> +     struct drm_gem_object *obj;
> >> +     struct drm_gem_cma_object *cma_obj;
> >> +     u32 host1x_handle;
> >> +
> >> +     obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
> >> +     if (!obj)
> >> +             return 0;
> >> +
> >> +     cma_obj = to_drm_gem_cma_obj(obj);
> >> +     host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
> >> +     drm_gem_object_unreference(obj);
> >> +
> >> +     return host1x_handle;
> >> +}
> > 
> > I thought we had settled in previous reviews on only having a single
> > allocator and not do the conversion between various types?
> 
> I'll need to agree with Lucas on how to handle this. He intended to make
> a patch to fix this, but he hasn't had time to do that.
> 
> But, I'd still like to keep the possibility open to add dma_buf as
> memory handle type, and fit that into the same API, so there's still a
> need to have the mem_mgr_type abstraction.

I fail to see how dma_buf would require a separate mem_mgr_type. Can we
perhaps postpone this to a later point and just go with CMA as the only
alternative for now until we have an actual working implementation that
we can use this for?

> >> +static int gr2d_submit(struct host1x_drm_context *context,
> >> +             struct tegra_drm_submit_args *args,
> >> +             struct drm_device *drm,
> >> +             struct drm_file *file_priv)
> >> +{
> >> +     struct host1x_job *job;
> >> +     int num_cmdbufs = args->num_cmdbufs;
> >> +     int num_relocs = args->num_relocs;
> >> +     int num_waitchks = args->num_waitchks;
> >> +     struct tegra_drm_cmdbuf __user *cmdbufs =
> >> +             (void * __user)(uintptr_t)args->cmdbufs;
> >> +     struct tegra_drm_reloc __user *relocs =
> >> +             (void * __user)(uintptr_t)args->relocs;
> >> +     struct tegra_drm_waitchk __user *waitchks =
> >> +             (void * __user)(uintptr_t)args->waitchks;
> > 
> > No need for all the uintptr_t casts.
> 
> Will try to remove - but I do remember getting compiler warnings without
> them.

I think you shouldn't even have to cast to void * first. Just cast to
the target type directly. I don't see why the compiler should complain.

> (...)
> > Most of this looks very generic. Can't it be split out into separate
> > functions and reused in other (gr3d) modules?
> 
> That's actually how most of this is downstream. I thought to make
> everything really simple and make it all 2D specific in the first patch
> set, and split into generic when we add support for another device.

Okay, that's fine then.

> >> +static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
> >> +{
> >> +     int ret;
> >> +
> >> +     if (class == NV_HOST1X_CLASS_ID)
> >> +             ret = reg == 0x2b;
> >> +     else
> >> +             switch (reg) {
> >> +             case 0x1a:
> >> +             case 0x1b:
> >> +             case 0x26:
> >> +             case 0x2b:
> >> +             case 0x2c:
> >> +             case 0x2d:
> >> +             case 0x31:
> >> +             case 0x32:
> >> +             case 0x48:
> >> +             case 0x49:
> >> +             case 0x4a:
> >> +             case 0x4b:
> >> +             case 0x4c:
> >> +                     ret = 1;
> >> +                     break;
> >> +             default:
> >> +                     ret = 0;
> >> +                     break;
> >> +             }
> >> +
> >> +     return ret;
> >> +}
> > 
> > I should probably bite the bullet and read through the (still) huge
> > patch 3 to understand exactly why this is needed.
> 
> That's the security firewall. It walks through each submit, and ensures
> that each register write that writes an address, goes through the host1x
> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
> memory locations.

I see. Can this be made more generic? Perhaps adding a table of valid
registers to the device and use a generic function to iterate over that
instead of having to provide the same function for each client.
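
A sketch of the table-driven variant, in userspace C. The register
offsets are copied from gr2d_is_addr_reg() in the quoted patch; struct
mock_client_info and the generic host1x_is_addr_reg() helper are
hypothetical, and the special case for the host1x class
(NV_HOST1X_CLASS_ID, register 0x2b) is left out to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets copied from gr2d_is_addr_reg() in the quoted patch. */
static const uint16_t gr2d_addr_regs[] = {
	0x1a, 0x1b, 0x26, 0x2b, 0x2c, 0x2d, 0x31, 0x32,
	0x48, 0x49, 0x4a, 0x4b, 0x4c,
};

/* Hypothetical per-client description of its address registers. */
struct mock_client_info {
	const uint16_t *addr_regs;
	size_t num_addr_regs;
};

static const struct mock_client_info gr2d_info = {
	.addr_regs = gr2d_addr_regs,
	.num_addr_regs = sizeof(gr2d_addr_regs) / sizeof(gr2d_addr_regs[0]),
};

/* Generic lookup that every client could share. */
static int host1x_is_addr_reg(const struct mock_client_info *info, uint32_t reg)
{
	size_t i;

	for (i = 0; i < info->num_addr_regs; i++)
		if (info->addr_regs[i] == reg)
			return 1;

	return 0;
}
```

A new client would then only supply its own table instead of another
switch statement.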

> >> +static int __exit gr2d_remove(struct platform_device *dev)
> >> +{
> >> +     struct host1x *host1x =
> >> +             host1x_get_drm_data(to_platform_device(dev->dev.parent));
> >> +     struct gr2d *gr2d = platform_get_drvdata(dev);
> >> +     int err;
> >> +
> >> +     err = host1x_unregister_client(host1x, &gr2d->client);
> >> +     if (err < 0) {
> >> +             dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
> >> +                     err);
> >> +             return err;
> >> +     }
> >> +
> >> +     host1x_syncpt_free(gr2d->syncpt);
> >> +     return 0;
> >> +}
> > 
> > Isn't this missing a host1x_channel_put() or host1x_free_channel()?
> 
> All references should be handled in gr2d_open_channel() and
> gr2d_close_channel(). I think we'd need to ensure all contexts are
> closed at this point.

Yes, that'd work as well. Actually I would assume that all contexts
associated with a given file should be freed when the file is closed.
That way all of this should work pretty much automatically.

> >> +struct tegra_drm_syncpt_wait_args {
> >> +     __u32 id;
> >> +     __u32 thresh;
> >> +     __s32 timeout;
> >> +     __u32 value;
> >> +};
> >> +
> >> +#define DRM_TEGRA_NO_TIMEOUT (-1)
> > 
> > Is this the only reason why timeout is signed? If so maybe a better
> > choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.
> 
> I believe it is so. In fact we'd need to rename it to something like
> INFINITE_TIMEOUT, because we also have a case of timeout=0, which
> returns immediately, i.e. doesn't have a timeout either.

For timeout == 0 I don't think we need a symbolic name. It is pretty
common for 0 to mean no timeout. But yes, DRM_TEGRA_INFINITE_TIMEOUT
should be okay.
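
A sketch of the unsigned variant, using uint32_t in place of the kernel's
__u32 so that it compiles in userspace. DRM_TEGRA_INFINITE_TIMEOUT and
the 0xffffffff value come from the discussion above; the
mock_classify_timeout() helper and its enum are hypothetical and only
show how the three cases would be told apart.

```c
#include <stdint.h>

#define DRM_TEGRA_INFINITE_TIMEOUT 0xffffffffu

struct tegra_drm_syncpt_wait_args {
	uint32_t id;
	uint32_t thresh;
	uint32_t timeout;	/* __s32 in the quoted patch */
	uint32_t value;
};

/* Hypothetical classification of the timeout field. */
enum mock_wait_mode {
	MOCK_WAIT_POLL,		/* timeout == 0: return immediately */
	MOCK_WAIT_BOUNDED,	/* any other finite value */
	MOCK_WAIT_FOREVER,	/* DRM_TEGRA_INFINITE_TIMEOUT */
};

static enum mock_wait_mode
mock_classify_timeout(const struct tegra_drm_syncpt_wait_args *args)
{
	if (args->timeout == 0)
		return MOCK_WAIT_POLL;
	if (args->timeout == DRM_TEGRA_INFINITE_TIMEOUT)
		return MOCK_WAIT_FOREVER;

	return MOCK_WAIT_BOUNDED;
}
```

Userspace that previously passed -1 would produce the same bit pattern
once the value is converted to a 32-bit unsigned field.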

> >> +struct tegra_drm_syncpt_incr {
> >> +     __u32 syncpt_id;
> >> +     __u32 syncpt_incrs;
> >> +};
> > 
> > Maybe the fields would be better named id and incrs. Though I also
> > notice that incrs is never used. I guess that's supposed to be used in
> > the future to allow increments by more than a single value. If so,
> > perhaps value would be a better name.
> 
> It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.

Okay. The superfluous syncpt_ prefixes should still go away.

Thierry

* Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver
  2013-02-05  7:43       ` Thierry Reding
@ 2013-02-06 20:13         ` Terje Bergström
  0 siblings, 0 replies; 49+ messages in thread
From: Terje Bergström @ 2013-02-06 20:13 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 04.02.2013 23:43, Thierry Reding wrote:
> My point was that you could include the call to host1x_syncpt_reset()
> within host1x_syncpt_init(). That will keep unneeded code out of the
> host1x_probe() function. Also you don't want to use the syncpoints
> uninitialized, right?

Of course, sorry, I misunderstood. That makes a lot of sense.

>>>> + */
>>>> +static u32 syncpt_load_min(struct host1x_syncpt *sp)
>>>> +{
>>>> +     struct host1x *dev = sp->dev;
>>>> +     u32 old, live;
>>>> +
>>>> +     do {
>>>> +             old = host1x_syncpt_read_min(sp);
>>>> +             live = host1x_sync_readl(dev,
>>>> +                             HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
>>>> +     } while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
>>>
>>> I think this warrants a comment.
>>
>> Sure. It just loops in case there's a race writing to min_val.
> 
> Oh, I see. That'd make a good comment. Is the cast to (u32) really
> necessary?

I'll add a comment. atomic_cmpxchg returns a signed value, so I think
the cast is needed.
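
For readers unfamiliar with the pattern, here is a userspace sketch of the
same retry loop using C11 atomics rather than the kernel's atomic_t API.
The kernel's atomic_cmpxchg() returns the previous value as an int, which
is where the cast question comes from; C11's
atomic_compare_exchange_strong() returns a boolean instead, so no cast
appears here. The mock structure and its hw_value field, which stands in
for the register read, are assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

struct mock_syncpt {
	_Atomic uint32_t min_val;	/* cached value, shared between threads */
	uint32_t hw_value;		/* stands in for the hardware register */
};

/*
 * Refresh the cached minimum from "hardware". If another thread changed
 * min_val between our read and the exchange, the exchange fails and the
 * loop retries with freshly read values.
 */
static uint32_t syncpt_load_min(struct mock_syncpt *sp)
{
	uint32_t old, live;

	do {
		old = atomic_load(&sp->min_val);
		live = sp->hw_value;
	} while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

	return live;
}
```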

> Save/restore has the disadvantage of the direction not being implicit.
> Save could mean save to hardware or save to software. The same is true
> for restore. However if the direction is clearly defined, save and
> restore work for me.
> 
> Maybe the comment could be changed to be more explicit. Something like:
> 
> 	/*
> 	 * Write cached syncpoint and waitbase values to hardware.
> 	 */
> 
> And for host1x_syncpt_save():
> 
> 	/*
> 	 * For client-managed registers, update the cached syncpoint and
> 	 * waitbase values by reading from the registers.
> 	 */

I was using save in the same way as f.ex. i915 (i915_suspend.c): save
state of hardware to RAM, restore state from RAM. I'll add proper
comments, but save and restore are for all syncpts, not only client managed.

> 
>>>> +/*
>>>> + * Updates the last value read from hardware.
>>>> + */
>>>> +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
>>>> +{
>>>> +     u32 val;
>>>> +     val = sp->dev->syncpt_op.load_min(sp);
>>>> +     trace_host1x_syncpt_load_min(sp->id, val);
>>>> +
>>>> +     return val;
>>>> +}
> Maybe the function should be called host1x_syncpt_load() if there is no
> equivalent way to load the maximum value (since there is no register to
> read from).

Sounds good. Maximum is just a software concept.

> That's certainly true for interrupts. However, if you look at the DMA
> subsystem for example, you can also request an unnamed resource.
> 
> The difference is sufficiently subtle that host1x_syncpt_allocate()
> would work for me too, though. I just have a slight preference for
> host1x_syncpt_request().

I don't really have a strong preference, so I'll follow your suggestion.

Terje

* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-02-05  8:42       ` Thierry Reding
@ 2013-02-06 20:29         ` Terje Bergström
  2013-02-06 20:38           ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-06 20:29 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 05.02.2013 00:42, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
>> host1x_get_host() actually needs that, so this #include should've also
>> been in previous patch.
> 
> No need to if you pass struct device * instead. You might need
> linux/device.h instead, though.

Can do.

> Another variant would be host1x_syncpt_irq() for the top-level handler
> and something like host1x_handle_syncpt() to handle individual syncpoints. I
> like this one best, but this is pure bike-shedding and there's nothing
> technically wrong with the names you chose, so I can't really object if
> you want to stick to them.

I could use these names. They sound logical to me, too.

> 
>>>> +                     queue_work(intr->wq, &sp->work);
>>>
>>> Should the call to queue_work() perhaps be moved into
>>> host1x_intr_syncpt_thresh_isr()?
>>
>> I'm not sure, either way would be ok to me. The current structure allows
>> host1x_intr_syncpt_thresh_isr() to only take one parameter
>> (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
>> host1x_intr.
> 
> I think I'd still prefer to have all the code in one function because it
> makes subsequent modifications easier and less error-prone.

Ok, I'll do that change.

> Yeah, in that case I think we should bail out. It's not like we're
> expecting any failures. If the interrupt cannot be requested, something
> must seriously be wrong and we should tell users about it so that it can
> be fixed. Trying to continue on a best effort basis isn't useful here, I
> think.

Yep, I agree.

>> In patch 3, at submit time we first allocate waiter, then take
>> submit_lock, write submit to channel, and add the waiter while having
>> the lock. I did this so that host1x_intr_add_action() can always
>> succeed. Otherwise I'd need to write another code path to handle the
>> case where we wrote a job to channel, but we're not able to add a
>> submit_complete action to it.
> 
> Okay. In that case why not allocate it on the stack in the first place
> so you don't have to bother with allocations (and potential failure) at
> all? The variable doesn't leave the function scope, so there shouldn't
> be any issues, right?

The submit code in patch 3 allocates a waiter, and the waiter outlives
the function scope. That waiter will clean up job queue once a job is
complete.

> Or if that doesn't work it would still be preferable to allocate memory
> in host1x_syncpt_wait() directly instead of going through the wrapper.

This was done purely because I'm hiding the struct size from the
caller. If the caller needs to allocate, I need to expose the struct in
a header, not just a forward declaration.

>>>> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
>>>
>>> Maybe you should keep the type of the irq_sync here so that it properly
>>> propagates to the call to devm_request_irq().
>>
>> I'm not sure what you mean. Do you mean that I should use unsigned int,
>> as that's the type used in devm_request_irq()?
> 
> Yes.

Ok, will do.

>>>> +void host1x_intr_stop(struct host1x_intr *intr)
>>>> +{
>>>> +     unsigned int id;
>>>> +     struct host1x *host1x = intr_to_host1x(intr);
>>>> +     struct host1x_intr_syncpt *syncpt;
>>>> +     u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
>>>> +
>>>> +     mutex_lock(&intr->mutex);
>>>> +
>>>> +     host1x->intr_op.disable_all_syncpt_intrs(intr);
>>>
>>> I haven't commented on this everywhere, but I think this could benefit
>>> from a wrapper that forwards this to the intr_op. The same goes for the
>>> sync_op.
>>
>> You mean something like "host1x_disable_all_syncpt_intrs"?
> 
> Yes. I think that'd be useful for each of the op functions. Perhaps you
> could even pass in a struct host1x * to make calls more uniform.

Ok, I'll add the wrapper, and I'll check if passing struct host1x *
would make sense. In effect that'd render struct host1x_intr mostly
unused, so how about if we just merge the contents of host1x_intr to host1x?

Terje

* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-02-06 20:29         ` Terje Bergström
@ 2013-02-06 20:38           ` Thierry Reding
  2013-02-06 20:41             ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-06 20:38 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
> On 05.02.2013 00:42, Thierry Reding wrote:
[...]
> > Or if that doesn't work it would still be preferable to allocate memory
> > in host1x_syncpt_wait() directly instead of going through the wrapper.
> 
> This was done purely because I'm hiding the struct size from the
> caller. If the caller needs to allocate, I need to expose the struct in
> a header, not just a forward declaration.

I don't think we need to hide the struct from the caller. This is all
host1x internal. Even if a host1x client uses the struct it makes little
sense to hide it. They are all part of the same code base so there's not
much to be gained by hiding the structure definition.

> >>>> +void host1x_intr_stop(struct host1x_intr *intr)
> >>>> +{
> >>>> +     unsigned int id;
> >>>> +     struct host1x *host1x = intr_to_host1x(intr);
> >>>> +     struct host1x_intr_syncpt *syncpt;
> >>>> +     u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
> >>>> +
> >>>> +     mutex_lock(&intr->mutex);
> >>>> +
> >>>> +     host1x->intr_op.disable_all_syncpt_intrs(intr);
> >>>
> >>> I haven't commented on this everywhere, but I think this could benefit
> >>> from a wrapper that forwards this to the intr_op. The same goes for the
> >>> sync_op.
> >>
> >> You mean something like "host1x_disable_all_syncpt_intrs"?
> > 
> > Yes. I think that'd be useful for each of the op functions. Perhaps you
> > could even pass in a struct host1x * to make calls more uniform.
> 
> Ok, I'll add the wrapper, and I'll check if passing struct host1x *
> would make sense. In effect that'd render struct host1x_intr mostly
> unused, so how about if we just merge the contents of host1x_intr to host1x?

We can probably do that. It might make some sense to keep it in order to
scope the related fields but struct host1x isn't very large yet, so I
think omitting host1x_intr should be fine.

Thierry

* Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts
  2013-02-06 20:38           ` Thierry Reding
@ 2013-02-06 20:41             ` Terje Bergström
  0 siblings, 0 replies; 49+ messages in thread
From: Terje Bergström @ 2013-02-06 20:41 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 06.02.2013 12:38, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
>> This was done purely because I'm hiding the struct size from the
>> caller. If the caller needs to allocate, I need to expose the struct in
>> a header, not just a forward declaration.
> 
> I don't think we need to hide the struct from the caller. This is all
> host1x internal. Even if a host1x client uses the struct it makes little
> sense to hide it. They are all part of the same code base so there's not
> much to be gained by hiding the structure definition.

I agree, and will change.

>> Ok, I'll add the wrapper, and I'll check if passing struct host1x *
>> would make sense. In effect that'd render struct host1x_intr mostly
>> unused, so how about if we just merge the contents of host1x_intr to host1x?
> 
> We can probably do that. It might make some sense to keep it in order to
> scope the related fields but struct host1x isn't very large yet, so I
> think omitting host1x_intr should be fine.

Yes, it's not very large, and it'd remove a lot of casting between
host1x and host1x_intr, so I'll just do that.
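
A rough userspace sketch of what the wrapper could look like once the
interrupt state lives directly in struct host1x. The struct layout, the
disabled_calls counter and fake_hw_disable_all() are assumptions made up for
illustration; only the wrapper name comes from this thread.

```c
#include <stddef.h>

struct host1x;

/* Assumed shape of the ops table; the real one has more callbacks. */
struct host1x_intr_ops {
	void (*disable_all_syncpt_intrs)(struct host1x *host);
};

/* Interrupt fields merged into host1x, so no casting between structs. */
struct host1x {
	struct host1x_intr_ops intr_op;
	unsigned int disabled_calls;    /* illustration only */
};

/* Wrapper: callers never dereference intr_op themselves. */
static inline void host1x_disable_all_syncpt_intrs(struct host1x *host)
{
	host->intr_op.disable_all_syncpt_intrs(host);
}

/* Stand-in for a hardware-specific backend. */
static void fake_hw_disable_all(struct host1x *host)
{
	host->disabled_calls++;
}
```

With that in place, host1x_intr_stop() would call
host1x_disable_all_syncpt_intrs(host1x) instead of reaching into the op
table.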

Terje


* Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-02-05  9:15       ` Thierry Reding
@ 2013-02-06 20:58         ` Terje Bergström
  2013-02-08  6:54           ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-06 20:58 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 05.02.2013 01:15, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
>> This is used by debugfs code to direct to debugfs, and
>> nvhost_debug_dump() to send via printk.
> 
> Yes, that was precisely my point. Why bother providing the same data via
> several output methods. debugfs is good for showing large amounts of
> data such as register dumps or a tabular representation of syncpoints
> for instance.
> 
> If, however, you want to interactively show debug information using
> printk the same format isn't very useful and something more reduced is
> often better.

debugfs is there to be able to get a reliable dump of host1x state
(e.g. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.
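
As a userspace sketch of why one dump routine can serve both outputs: the
dump code writes through a small sink structure, and only the sink differs.
The field layout of struct output below is an assumption, and struct strbuf,
write_to_stream() and show_syncpts() are hypothetical stand-ins (a FILE *
plays the role of printk, a string buffer the role of the debugfs file).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Output sink: the dump code only ever calls fn(ctx, ...). */
struct output {
	void (*fn)(void *ctx, const char *str, size_t len);
	void *ctx;
	char buf[256];
};

/* Format into the scratch buffer, then hand the text to the sink. */
static void output_printf(struct output *o, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
	va_end(args);

	if (len < 0)
		return;
	if ((size_t)len >= sizeof(o->buf))
		len = (int)sizeof(o->buf) - 1;  /* output was truncated */
	o->fn(o->ctx, o->buf, (size_t)len);
}

/* Sink 1: write to a stream (stand-in for printk). */
static void write_to_stream(void *ctx, const char *str, size_t len)
{
	fwrite(str, 1, len, (FILE *)ctx);
}

/* Sink 2: append to a buffer (stand-in for a debugfs file). */
struct strbuf {
	char data[512];
	size_t used;
};

static void write_to_strbuf(void *ctx, const char *str, size_t len)
{
	struct strbuf *sb = ctx;
	size_t room = sizeof(sb->data) - 1 - sb->used;

	if (len > room)
		len = room;
	memcpy(sb->data + sb->used, str, len);
	sb->used += len;
	sb->data[sb->used] = '\0';
}

/* The dump code is written once and is unaware of the destination. */
static void show_syncpts(struct output *o, const unsigned int *vals, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		output_printf(o, "id %zu min %u\n", i, vals[i]);
}
```

Pointing .fn at write_to_stream with stderr as ctx gives the log variant;
pointing it at write_to_strbuf gives the file variant, with no second copy of
show_syncpts().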

>> I have another suggestion. In downstream I removed the decoding part and
>> I just print out a string of hex. That removes quite a bunch of code
>> from kernel. It makes the debug output also more compact.
> I don't know. I think if you use in-kernel debugging facilities such as
> debugfs or printk, then the output should be self-explanatory. However I
> do see the usefulness of having a binary dump that can be decoded in
> userspace. But I think if we want to go that way we should make that a
> separate interface. USB provides something like that, which can then be
> fed to libpcap or wireshark to capture and analyze USB traffic. If done
> properly you get replay functionality for free. I don't know what infra-
> structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values etc. would be readable
at a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.
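
For illustration, a small userspace sketch of the undecoded format: each
32-bit word of a stream is printed as eight hex digits, separated by spaces.
format_hex_words() is a hypothetical helper, and the sample words in the
usage note are made up rather than real opcodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format count words into buf as space-separated hex.
 * Returns the number of characters written, excluding the terminating NUL.
 * Stops at the last whole word that fits.
 */
static size_t format_hex_words(char *buf, size_t size,
			       const uint32_t *words, size_t count)
{
	size_t used = 0, i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		int n = snprintf(buf + used, size - used, "%s%08x",
				 i ? " " : "", (unsigned int)words[i]);

		if (n < 0 || (size_t)n >= size - used) {
			buf[used] = '\0';       /* drop the partial word */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Two words such as 0x5c00003e and 0x00000001 come out as
"5c00003e 00000001", which a user space tool can split and decode later.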

Actual tracing, or a wireshark-like capability, would come from decoding
the ftrace log. When tracing is enabled, everything that is written to the
channel is also written to ftrace.

>>>> +static void show_channel_word(struct output *o, int *state, int *count,
>>>> +		u32 addr, u32 val, struct host1x_cdma *cdma)
>>>> +{
>>>> +	static int start_count, dont_print;
>>>
>>> What if two processes read debug information at the same time?
>>
>> show_channels() acquires cdma.lock, so that shouldn't happen.
> 
> Okay. Another solution would be to pass around a debug context which
> keeps track of the variables. But if we opt for a more involved dump
> interface as discussed above this will no longer be relevant.

Actually, the debugging process needs cdma.lock because it walks the
cdma queue. Command FIFO dumping is also something that must be done by
a single thread at a time.

> Okay. In the context of a channel dump interface this may not be
> relevant anymore. Can you think of any issue that wouldn't be detectable
> or debuggable by analyzing a binary dump of the data within a channel?
> I'm asking because at that point we wouldn't be able to access any of
> the in-kernel data structures but would have to rely on the data itself
> for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values and channel state
(active/waiting on a syncpt, etc.) gives an indication of the
current state of hardware. But very often problems are ripple effects
of something that happened earlier, and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we often need the ftrace log.

Terje

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-05  9:54       ` Thierry Reding
@ 2013-02-06 21:23         ` Terje Bergström
  2013-02-08  7:07           ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-06 21:23 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 05.02.2013 01:54, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
>> Yeah, it's actually working around the host1x duplicate naming.
>> host1x_syncpt_get takes struct host1x as parameter, but that's different
>> host1x than in this code.
> 
> So maybe a better way would be to rename the DRM host1x after all. If it
> avoids the need for workarounds such as this I think it justifies the
> additional churn.

Ok, I'll include that. Do you have a preference for the name? Something
like "host1x_drm" might work?

>>> Also, how useful is it to create a context? Looking at the gr2d
>>> implementation for .open_channel(), it will return the same channel to
>>> whichever userspace process requests them. Can you explain why it is
>>> necessary at all? From the name I would have expected some kind of
>>> context switching to take place when different applications submit
>>> requests to the same client, but that doesn't seem to be the case.
>>
>> Hardware context switching will be a later submit, and it'll actually
>> create a new structure. Hardware context might live longer than the
>> process that created it, so they'll need to be separate.
> 
> Why would it live longer than the process? Isn't the whole purpose of
> the context to keep per-process state? What use is that state if the
> process dies?

Hardware context has to be kept alive for as long as there's a job
running from that process. If an app sends 10 jobs to 2D channel, and
dies immediately, there's no sane way for host1x to remove the jobs from
queue. The jobs will keep on running and kernel will need to track them.
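
The lifetime rule can be sketched with plain reference counting. This is a
userspace illustration under assumptions: struct hw_context and the helpers
are hypothetical, the counter is not thread-safe, and in the kernel a kref
would be the usual tool.

```c
#include <stdlib.h>

/* Counts live contexts so the example can observe when one is freed. */
static int contexts_alive;

struct hw_context {
	unsigned int refcount;
};

/* The creating process holds the first reference. */
static struct hw_context *hw_context_create(void)
{
	struct hw_context *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	contexts_alive++;
	return ctx;
}

/* Each queued job takes its own reference at submit time. */
static void hw_context_get(struct hw_context *ctx)
{
	ctx->refcount++;
}

/* Dropped on file close and on job completion; the last put frees. */
static void hw_context_put(struct hw_context *ctx)
{
	if (--ctx->refcount == 0) {
		contexts_alive--;
		free(ctx);
	}
}
```

If the process exits while a job is still queued, its put leaves the count
at one, and the context is only freed when the job completes and drops the
final reference.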

>> Perhaps the justification is that this way we can keep the kernel API
>> stable even when we add support for hardware contexts and other clients.
> 
> We don't need a stable kernel API. But I guess it is fine to keep it if
> for no other reason than to fill the context returned in the ioctl() with
> meaningful data.

Sorry, I meant stable IOCTL API, so we agree on this.

>> host1x_drm_file sounds a bit odd, because it's not really a file, but a
>> private data pointer stored in driver_priv.
> 
> The same is true for struct drm_file, which is stored in struct file's
> .private_data field. I find it to be very intuitive if the inheritance
> is reflected in the structure name. struct host1x_drm_file is host1x'
> driver-specific part of struct drm_file.

Ok, makes sense. I'll do that.

> I fail to see how dma_buf would require a separate mem_mgr_type. Can we
> perhaps postpone this to a later point and just go with CMA as the only
> alternative for now until we have an actual working implementation that
> we can use this for?

Each submit refers to a number of buffers. Some of them are the streams,
some are textures or other input/output buffers. Each of these buffers
might be passed as a GEM handle, or (when implemented) as a dma_buf fd.
Thus we need a field to tell host1x which API to call to resolve that handle.

I think we can leave out the code for managing the type until we
actually have separate memory managers. That'd make GEM handles
effectively of type 0, as we don't set it.

> 
>>>> +static int gr2d_submit(struct host1x_drm_context *context,
>>>> +             struct tegra_drm_submit_args *args,
>>>> +             struct drm_device *drm,
>>>> +             struct drm_file *file_priv)
>>>> +{
>>>> +     struct host1x_job *job;
>>>> +     int num_cmdbufs = args->num_cmdbufs;
>>>> +     int num_relocs = args->num_relocs;
>>>> +     int num_waitchks = args->num_waitchks;
>>>> +     struct tegra_drm_cmdbuf __user *cmdbufs =
>>>> +             (void * __user)(uintptr_t)args->cmdbufs;
>>>> +     struct tegra_drm_reloc __user *relocs =
>>>> +             (void * __user)(uintptr_t)args->relocs;
>>>> +     struct tegra_drm_waitchk __user *waitchks =
>>>> +             (void * __user)(uintptr_t)args->waitchks;
>>>
>>> No need for all the uintptr_t casts.
>>
>> Will try to remove - but I do remember getting compiler warnings without
>> them.
> 
> I think you shouldn't even have to cast to void * first. Just cast to
> the target type directly. I don't see why the compiler should complain.

This is what I get without them:

drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-c

The problem is that the fields are __u64's and can't be cast directly
into 32-bit pointers.
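
A standalone sketch of the conversion, with hypothetical names (struct
submit_args, u64_to_ptr(), ptr_to_u64()). Going through uintptr_t makes the
narrowing explicit, so the integer has pointer size at the point where it
becomes a pointer, which is what the warning asks for on a 32-bit build:

```c
#include <stdint.h>

/* User pointers travel through the ioctl struct as 64-bit integers. */
struct submit_args {
	uint64_t cmdbufs;
};

/* Narrow to a pointer-sized integer first, then convert to a pointer. */
static inline void *u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t)value;
}

/* The reverse direction, as user space would do when filling the args. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```

Keeping the field 64 bits wide means the struct layout is identical for
32-bit and 64-bit user space.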

>> That's the security firewall. It walks through each submit, and ensures
>> that each register write that writes an address, goes through the host1x
>> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
>> memory locations.
> 
> I see. Can this be made more generic? Perhaps adding a table of valid
> registers to the device and use a generic function to iterate over that
> instead of having to provide the same function for each client.

For which one does gcc generate more efficient code? I've thought a
switch-case statement might get compiled into something more efficient
than a table lookup.

But the rest of the code is generic - just the one function which
compares against known address registers is specific to 2D.

>>>> +static int __exit gr2d_remove(struct platform_device *dev)
>>>> +{
>>>> +     struct host1x *host1x =
>>>> +             host1x_get_drm_data(to_platform_device(dev->dev.parent));
>>>> +     struct gr2d *gr2d = platform_get_drvdata(dev);
>>>> +     int err;
>>>> +
>>>> +     err = host1x_unregister_client(host1x, &gr2d->client);
>>>> +     if (err < 0) {
>>>> +             dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
>>>> +                     err);
>>>> +             return err;
>>>> +     }
>>>> +
>>>> +     host1x_syncpt_free(gr2d->syncpt);
>>>> +     return 0;
>>>> +}
>>>
>>> Isn't this missing a host1x_channel_put() or host1x_free_channel()?
>>
>> All references should be handled in gr2d_open_channel() and
>> gr2d_close_channel(). I think we'd need to ensure all contexts are
>> closed at this point.
> 
> Yes, that'd work as well. Actually I would assume that all contexts
> associated with a given file should be freed when the file is closed.
> That way all of this should work pretty much automatically.

Naturally they are, so we're actually already good. All contexts get
closed at file close.

> For timeout == 0 I don't think we need a symbolic name. It is pretty
> common for 0 to mean no timeout. But yes, DRM_TEGRA_INFINITE_TIMEOUT
> should be okay.

Ok, will do that.

>>>> +struct tegra_drm_syncpt_incr {
>>>> +     __u32 syncpt_id;
>>>> +     __u32 syncpt_incrs;
>>>> +};
>>>
>>> Maybe the fields would be better named id and incrs. Though I also
>>> notice that incrs is never used. I guess that's supposed to be used in
>>> the future to allow increments by more than a single value. If so,
>>> perhaps value would be a better name.
>>
>> It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.
> 
> Okay. The superfluous syncpt_ prefixes should still go away.

Sure, forgot to comment, but I'm fine with that.

Terje

* Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support
  2013-02-06 20:58         ` Terje Bergström
@ 2013-02-08  6:54           ` Thierry Reding
  0 siblings, 0 replies; 49+ messages in thread
From: Thierry Reding @ 2013-02-08  6:54 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote:
> On 05.02.2013 01:15, Thierry Reding wrote:
> > On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
> >> This is used by debugfs code to direct to debugfs, and
> >> nvhost_debug_dump() to send via printk.
> > 
> > Yes, that was precisely my point. Why bother providing the same data via
> > several output methods. debugfs is good for showing large amounts of
> > data such as register dumps or a tabular representation of syncpoints
> > for instance.
> > 
> > If, however, you want to interactively show debug information using
> > printk the same format isn't very useful and something more reduced is
> > often better.
> 
> debugfs is there to be able to get a reliable dump of host1x state
> > (e.g. no lines intermixed with other output).
> 
> printk output is there because often we get just UART logs from failure
> cases, and having as much information as possible in the logs speeds up
> debugging.
> 
> Both of them need to output the values of sync points, and the channel
> state. Dumping all of that consists of a lot of code, and I wouldn't
> want to duplicate that for two output formats.

I'm still not convinced, but I think I could live with it. =)

> >> I have another suggestion. In downstream I removed the decoding part and
> >> I just print out a string of hex. That removes quite a bunch of code
> >> from kernel. It makes the debug output also more compact.
> > I don't know. I think if you use in-kernel debugging facilities such as
> > debugfs or printk, then the output should be self-explanatory. However I
> > do see the usefulness of having a binary dump that can be decoded in
> > userspace. But I think if we want to go that way we should make that a
> > separate interface. USB provides something like that, which can then be
> > fed to libpcap or wireshark to capture and analyze USB traffic. If done
> > properly you get replay functionality for free. I don't know what infra-
> > structure exists to help with implementing something similar.
> 
> It's not actually binary. I think I misrepresented the suggestion.
> 
> I'm suggesting that we'd display only the contents of command FIFO and
> contents of gathers (i.e. all opcodes) in hex format, not decoded. All
> > other text would remain as is, so syncpt values etc. would be readable
> > at a glance.
> 
> The user space tool can then take the streams and decode them if needed.
> 
> We've noticed that the decoded opcodes format can be very long and
> sometimes takes a minute to dump out via a slow console. The hex output
> is much more compact and faster to dump.
> 
> > Actual tracing, or a wireshark-like capability, would come from decoding
> > the ftrace log. When tracing is enabled, everything that is written to the
> > channel is also written to ftrace.

Okay, I'll have to take a closer look at ftrace since I've never used it
before. It sounds like extra infrastructure won't be necessary then.

> >>>> +static void show_channel_word(struct output *o, int *state, int *count,
> >>>> +		u32 addr, u32 val, struct host1x_cdma *cdma)
> >>>> +{
> >>>> +	static int start_count, dont_print;
> >>>
> >>> What if two processes read debug information at the same time?
> >>
> >> show_channels() acquires cdma.lock, so that shouldn't happen.
> > 
> > Okay. Another solution would be to pass around a debug context which
> > keeps track of the variables. But if we opt for a more involved dump
> > interface as discussed above this will no longer be relevant.
> 
> > Actually, the debugging process needs cdma.lock because it walks the
> > cdma queue. Command FIFO dumping is also something that must be done by
> > a single thread at a time.
> 
> > Okay. In the context of a channel dump interface this may not be
> > relevant anymore. Can you think of any issue that wouldn't be detectable
> > or debuggable by analyzing a binary dump of the data within a channel?
> > I'm asking because at that point we wouldn't be able to access any of
> > the in-kernel data structures but would have to rely on the data itself
> > for diagnostics. IOMMU virtual addresses won't be available and so on.
> 
> > In many cases, looking at syncpt values and channel state
> > (active/waiting on a syncpt, etc.) gives an indication of the
> > current state of hardware. But very often problems are ripple effects
> > of something that happened earlier, and the job that caused the problem
> > has already been freed and is not visible in the dump.
> > 
> > To get a full history, we often need the ftrace log.

So that's already covered. Great!

Thierry

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-06 21:23         ` Terje Bergström
@ 2013-02-08  7:07           ` Thierry Reding
  2013-02-11  0:42             ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-08  7:07 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
> On 05.02.2013 01:54, Thierry Reding wrote:
> > On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
> >> Yeah, it's actually working around the host1x duplicate naming.
> >> host1x_syncpt_get takes struct host1x as parameter, but that's different
> >> host1x than in this code.
> > 
> > So maybe a better way would be to rename the DRM host1x after all. If it
> > avoids the need for workarounds such as this I think it justifies the
> > additional churn.
> 
> Ok, I'll include that. Do you have a preference for the name? Something
> like "host1x_drm" might work?

Yes, that sounds good.

> >>> Also, how useful is it to create a context? Looking at the gr2d
> >>> implementation for .open_channel(), it will return the same channel to
> >>> whichever userspace process requests them. Can you explain why it is
> >>> necessary at all? From the name I would have expected some kind of
> >>> context switching to take place when different applications submit
> >>> requests to the same client, but that doesn't seem to be the case.
> >>
> >> Hardware context switching will be a later submit, and it'll actually
> >> create a new structure. Hardware context might live longer than the
> >> process that created it, so they'll need to be separate.
> > 
> > Why would it live longer than the process? Isn't the whole purpose of
> > the context to keep per-process state? What use is that state if the
> > process dies?
> 
> Hardware context has to be kept alive for as long as there's a job
> running from that process. If an app sends 10 jobs to 2D channel, and
> dies immediately, there's no sane way for host1x to remove the jobs from
> queue. The jobs will keep on running and kernel will need to track them.

Okay, I understand now. There was one additional thing that I wanted to
point out, but the context is gone now. I'll go through the patch again
and reply there.

> > I fail to see how dma_buf would require a separate mem_mgr_type. Can we
> > perhaps postpone this to a later point and just go with CMA as the only
> > alternative for now until we have an actual working implementation that
> > we can use this for?
> 
> Each submit refers to a number of buffers. Some of them are the streams,
> some are textures or other input/output buffers. Each of these buffers
> might be passed as a GEM handle, or (when implemented) as a dma_buf fd.
> > Thus we need a field to tell host1x which API to call to resolve that handle.

Understood.

> I think we can leave out the code for managing the type until we
> actually have separate memory managers. That'd make GEM handles
> effectively of type 0, as we don't set it.

I think that's a good idea. Let's start simple for now and who knows
what else will have changed by the time we get to implement dma_buf.
Maybe Lucas will have finished his work on the allocator and we will
need to synchronize with that anyway.

> >>>> +static int gr2d_submit(struct host1x_drm_context *context,
> >>>> +             struct tegra_drm_submit_args *args,
> >>>> +             struct drm_device *drm,
> >>>> +             struct drm_file *file_priv)
> >>>> +{
> >>>> +     struct host1x_job *job;
> >>>> +     int num_cmdbufs = args->num_cmdbufs;
> >>>> +     int num_relocs = args->num_relocs;
> >>>> +     int num_waitchks = args->num_waitchks;
> >>>> +     struct tegra_drm_cmdbuf __user *cmdbufs =
> >>>> +             (void * __user)(uintptr_t)args->cmdbufs;
> >>>> +     struct tegra_drm_reloc __user *relocs =
> >>>> +             (void * __user)(uintptr_t)args->relocs;
> >>>> +     struct tegra_drm_waitchk __user *waitchks =
> >>>> +             (void * __user)(uintptr_t)args->waitchks;
> >>>
> >>> No need for all the uintptr_t casts.
> >>
> >> Will try to remove - but I do remember getting compiler warnings without
> >> them.
> > 
> > I think you shouldn't even have to cast to void * first. Just cast to
> > the target type directly. I don't see why the compiler should complain.
> 
> This is what I get without them:
> 
> drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from
> integer of different size [-Wint-to-pointer-cast]
> drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from
> integer of different size [-Wint-to-pointer-cast]
> drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from
> integer of different size [-Wint-to-pointer-c
> 
> The problem is that the fields are __u64's and can't be cast directly
> into 32-bit pointers.

Alright.

> >> That's the security firewall. It walks through each submit, and ensures
> >> that each register write that writes an address, goes through the host1x
> >> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
> >> memory locations.
> > 
> > I see. Can this be made more generic? Perhaps adding a table of valid
> > registers to the device and use a generic function to iterate over that
> > instead of having to provide the same function for each client.
> 
> For which one does gcc generate more efficient code? I've thought a
> switch-case statement might get compiled into something more efficient
> than a table lookup.
> 
> But the rest of the code is generic - just the one function which
> compares against known address registers is specific to 2D.

Table lookup should be pretty fast. I wouldn't worry too much about
performance at this stage, though. Readability is more important in my
opinion. A lookup table is a lot more readable and reusable I think. If
it turns out that using a function is actually faster we can always
optimize later.
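
Something along these lines, as a sketch only: each client supplies an array
of the register offsets that carry addresses, and one generic function checks
against it. struct client_regs, is_addr_reg() and the offsets are
placeholders, not the real 2D register list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-client description of which registers hold memory addresses. */
struct client_regs {
	const uint32_t *addr_regs;
	size_t num_addr_regs;
};

/* Placeholder offsets, for illustration only. */
static const uint32_t example_2d_addr_regs[] = { 0x1a, 0x1b, 0x26, 0x2b };

static const struct client_regs example_2d_client = {
	.addr_regs = example_2d_addr_regs,
	.num_addr_regs = sizeof(example_2d_addr_regs) /
			 sizeof(example_2d_addr_regs[0]),
};

/* Generic check shared by all clients: linear search of the table. */
static bool is_addr_reg(const struct client_regs *client, uint32_t offset)
{
	size_t i;

	for (i = 0; i < client->num_addr_regs; i++)
		if (client->addr_regs[i] == offset)
			return true;
	return false;
}
```

Adding another client would then only mean adding another array, with no
new function.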

Thierry

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-08  7:07           ` Thierry Reding
@ 2013-02-11  0:42             ` Terje Bergström
  2013-02-11  6:44               ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-02-11  0:42 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 07.02.2013 23:07, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
>>>> That's the security firewall. It walks through each submit, and ensures
>>>> that each register write that writes an address, goes through the host1x
>>>> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
>>>> memory locations.
>>> I see. Can this be made more generic? Perhaps adding a table of valid
>>> registers to the device and use a generic function to iterate over that
>>> instead of having to provide the same function for each client.
>> For which one does gcc generate more efficient code? I've thought a
>> switch-case statement might get compiled into something more efficient
>> than a table lookup.
>> But the rest of the code is generic - just the one function which
>> compares against known address registers is specific to 2D.
> Table lookup should be pretty fast. I wouldn't worry too much about
> performance at this stage, though. Readability is more important in my
> opinion. A lookup table is a lot more readable and reusable I think. If
> it turns out that using a function is actually faster we can always
> optimize later.

You're right about performance. We already saw quite a bad performance
hit with the current firewall, so we'll need to worry about performance
later.

I'll take a look at converting the register list to a table. Instead of
always doing a linear search of a table, a bitfield might be more
appropriate.
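
A sketch of the bitfield idea, in userspace C and with assumed names and
sizes (EXAMPLE_NUM_REGS, struct reg_bitmap and both helpers are
hypothetical). The bitmap is filled once from a plain list of offsets, so the
list stays readable while each check becomes a single bit test:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_NUM_REGS 256u   /* assumed size of the register space */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((EXAMPLE_NUM_REGS + WORD_BITS - 1) / WORD_BITS)

struct reg_bitmap {
	unsigned long words[BITMAP_WORDS];
};

/* Build the bitmap once from a list of register offsets. */
static void reg_bitmap_init(struct reg_bitmap *bm,
			    const uint32_t *regs, size_t count)
{
	size_t i;

	memset(bm, 0, sizeof(*bm));
	for (i = 0; i < count; i++) {
		if (regs[i] >= EXAMPLE_NUM_REGS)
			continue;       /* ignore out-of-range entries */
		bm->words[regs[i] / WORD_BITS] |= 1UL << (regs[i] % WORD_BITS);
	}
}

/* Constant-time check, independent of how many registers are listed. */
static bool reg_bitmap_test(const struct reg_bitmap *bm, uint32_t offset)
{
	if (offset >= EXAMPLE_NUM_REGS)
		return false;
	return (bm->words[offset / WORD_BITS] >> (offset % WORD_BITS)) & 1UL;
}
```

Whether this beats a linear search over a handful of entries would need
measuring.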

Terje

* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-11  0:42             ` Terje Bergström
@ 2013-02-11  6:44               ` Thierry Reding
  2013-02-11 15:40                 ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-11  6:44 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
> On 07.02.2013 23:07, Thierry Reding wrote:
> > On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
> >>>> That's the security firewall. It walks through each submit, and ensures
> >>>> that each register write that writes an address, goes through the host1x
> >>>> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
> >>>> memory locations.
> >>> I see. Can this be made more generic? Perhaps adding a table of valid
> >>> registers to the device and use a generic function to iterate over that
> >>> instead of having to provide the same function for each client.
> >> For which one does gcc generate more efficient code? I've thought a
> >> switch-case statement might get compiled into something more efficient
> >> than a table lookup.
> >> But the rest of the code is generic - just the one function which
> >> compares against known address registers is specific to 2D.
> > Table lookup should be pretty fast. I wouldn't worry too much about
> > performance at this stage, though. Readability is more important in my
> > opinion. A lookup table is a lot more readable and reusable I think. If
> > it turns out that using a function is actually faster we can always
> > optimize later.
> 
> You're right about performance. We already saw quite a bad performance
> hit with the current firewall, so we'll need to worry about performance
> later.

I guess the additional overhead of looking up in a table vs. an actual
function being run will be rather small compared to the total overhead
incurred by having the firewall in the first place.

> I'll take a look at converting the register list to a table. Instead of
> always doing a linear search of a table, a bitfield might be more
> appropriate.

I don't know. Just a plain table with register offsets seems a lot more
straightforward than a bitfield. In my opinion an array of offsets is a
lot more readable than a field of bits, especially since you can't
easily set up a bitfield with initialized values.
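
To illustrate the plain-table idea, here is a rough user-space sketch.
The offsets and the names gr2d_addr_regs and host1x_is_addr_reg() are
made up for the example and are not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Made-up offsets of registers that hold a memory address. */
static const uint32_t gr2d_addr_regs[] = {
	0x1a, 0x1b, 0x26, 0x2b, 0x2c,
};

/* Generic lookup: each client only has to provide its own table. */
bool host1x_is_addr_reg(const uint32_t *table, size_t count, uint32_t offset)
{
	size_t i;

	assert(table != NULL || count == 0);

	/* Linear search; fine for a handful of entries. */
	for (i = 0; i < count; i++)
		if (table[i] == offset)
			return true;

	return false;
}
```

The firewall itself would stay generic and only the table would differ
per client.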

Thierry



* Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device
  2013-02-11  6:44               ` Thierry Reding
@ 2013-02-11 15:40                 ` Terje Bergström
  0 siblings, 0 replies; 49+ messages in thread
From: Terje Bergström @ 2013-02-11 15:40 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 10.02.2013 22:44, Thierry Reding wrote:
> On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
>> You're right about performance. We already saw quite a bad performance
>> hit with the current firewall, so we'll need to worry about performance
>> later.
> 
> I guess the additional overhead of looking up in a table vs. an actual
> function being run will be rather small compared to the total overhead
> incurred by having the firewall in the first place.

Yeah, I'll just implement a simple linear table lookup and let's see
what happens. I'll optimize with bitfield if needed.

Terje


* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-01-15 11:43 ` [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support Terje Bergstrom
@ 2013-02-25 15:24   ` Thierry Reding
  2013-02-26  9:48     ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-02-25 15:24 UTC (permalink / raw)
  To: Terje Bergstrom
  Cc: amerilainen, airlied, dri-devel, linux-tegra, linux-kernel


On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
> index e89fb2b..57680a6 100644
> --- a/drivers/gpu/host1x/Kconfig
> +++ b/drivers/gpu/host1x/Kconfig
> @@ -3,4 +3,27 @@ config TEGRA_HOST1X
>  	help
>  	  Driver for the Tegra host1x hardware.
>  
> -	  Required for enabling tegradrm.
> +	  Required for enabling tegradrm and 2D acceleration.

I don't think I commented on this in the other patches, but I think this
could use a bit more information about what host1x is. Also mentioning
that it is a requirement for tegra-drm and 2D acceleration isn't very
useful because it can equally well be expressed in Kconfig. If you add
some description about what host1x is, people will know that they want
to enable it.

> +if TEGRA_HOST1X
> +
> +config TEGRA_HOST1X_CMA
> +	bool "Support DRM CMA buffers"
> +	depends on DRM
> +	default y
> +	select DRM_GEM_CMA_HELPER
> +	select DRM_KMS_CMA_HELPER
> +	help
> +	  Say yes if you wish to use DRM CMA buffers.
> +
> +	  If unsure, choose Y.

Perhaps make this not user-selectable (for now)? If somebody disables
this explicitly they won't get a working driver, right?

> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
[...]
> +#include "cdma.h"
> +#include "channel.h"
> +#include "dev.h"
> +#include "memmgr.h"
> +#include "job.h"
> +#include <asm/cacheflush.h>
> +
> +#include <linux/slab.h>
> +#include <linux/kfifo.h>
> +#include <linux/interrupt.h>
> +#include <trace/events/host1x.h>
> +
> +#define TRACE_MAX_LENGTH 128U

"" includes generally follow <> ones.

> +/*
> + * Add an entry to the sync queue.
> + */
> +static void add_to_sync_queue(struct host1x_cdma *cdma,
> +			      struct host1x_job *job,
> +			      u32 nr_slots,
> +			      u32 first_get)
> +{
> +	if (job->syncpt_id == NVSYNCPT_INVALID) {
> +		dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
> +				__func__);
> +		return;
> +	}
> +
> +	job->first_get = first_get;
> +	job->num_slots = nr_slots;
> +	host1x_job_get(job);
> +	list_add_tail(&job->list, &cdma->sync_queue);
> +}

It's a bit odd that you pass a job in here along with some parameters
that are then assigned to the job's fields. Couldn't you just assign
them to the job's fields before passing the job into this function?

I also see that you only use this function once, so maybe you could
open-code it instead.

> +/*
> + * Return the status of the cdma's sync queue or push buffer for the given event
> + *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
> + *  - pb space: returns the number of free slots in the channel's push buffer
> + * Must be called with the cdma lock held.
> + */
> +static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
> +		enum cdma_event event)
> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	switch (event) {
> +	case CDMA_EVENT_SYNC_QUEUE_EMPTY:
> +		return list_empty(&cdma->sync_queue) ? 1 : 0;
> +	case CDMA_EVENT_PUSH_BUFFER_SPACE: {
> +		struct push_buffer *pb = &cdma->push_buffer;
> +		return host1x->cdma_pb_op.space(pb);
> +	}
> +	default:
> +		return 0;
> +	}
> +}

Similarly this function is only used in one place and it requires a
whole lot of documentation to define the meaning of the return value. If
you implement this functionality directly in host1x_cdma_wait_locked()
you have much more context and don't require all this "protocol".

> +/*
> + * Start timer for a buffer submition that has completed yet.

"submission". And I don't understand the "that has completed yet" part.

> + * Must be called with the cdma lock held.
> + */
> +static void cdma_start_timer_locked(struct host1x_cdma *cdma,
> +		struct host1x_job *job)

You use two different styles to indent the function parameters. You
might want to stick to one, preferably aligning them with the first
parameter on the first line.

> +{
> +	struct host1x *host = cdma_to_host1x(cdma);
> +
> +	if (cdma->timeout.clientid) {
> +		/* timer already started */
> +		return;
> +	}
> +
> +	cdma->timeout.clientid = job->clientid;
> +	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
> +	cdma->timeout.syncpt_val = job->syncpt_end;
> +	cdma->timeout.start_ktime = ktime_get();
> +
> +	schedule_delayed_work(&cdma->timeout.wq,
> +			msecs_to_jiffies(job->timeout));
> +}
> +
> +/*
> + * Stop timer when a buffer submition completes.

"submission"

> +/*
> + * For all sync queue entries that have already finished according to the
> + * current sync point registers:
> + *  - unpin & unref their mems
> + *  - pop their push buffer slots
> + *  - remove them from the sync queue
> + * This is normally called from the host code's worker thread, but can be
> + * called manually if necessary.
> + * Must be called with the cdma lock held.
> + */
> +static void update_cdma_locked(struct host1x_cdma *cdma)
> +{
> +	bool signal = false;
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	struct host1x_job *job, *n;
> +
> +	/* If CDMA is stopped, queue is cleared and we can return */
> +	if (!cdma->running)
> +		return;
> +
> +	/*
> +	 * Walk the sync queue, reading the sync point registers as necessary,
> +	 * to consume as many sync queue entries as possible without blocking
> +	 */
> +	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
> +		struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;

host1x_syncpt_get()?

> +
> +		/* Check whether this syncpt has completed, and bail if not */
> +		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
> +			/* Start timer on next pending syncpt */
> +			if (job->timeout)
> +				cdma_start_timer_locked(cdma, job);
> +			break;
> +		}
> +
> +		/* Cancel timeout, when a buffer completes */
> +		if (cdma->timeout.clientid)
> +			stop_cdma_timer_locked(cdma);
> +
> +		/* Unpin the memory */
> +		host1x_job_unpin(job);
> +
> +		/* Pop push buffer slots */
> +		if (job->num_slots) {
> +			struct push_buffer *pb = &cdma->push_buffer;
> +			host1x->cdma_pb_op.pop_from(pb, job->num_slots);
> +			if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
> +				signal = true;
> +		}
> +
> +		list_del(&job->list);
> +		host1x_job_put(job);
> +	}
> +
> +	if (list_empty(&cdma->sync_queue) &&
> +				cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
> +			signal = true;

This looks funny, maybe:

	if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
	    list_empty(&cdma->sync_queue))
		signal = true;

?

> +
> +	/* Wake up CdmaWait() if the requested event happened */

CdmaWait()? Where's that?

> +	if (signal) {
> +		cdma->event = CDMA_EVENT_NONE;
> +		up(&cdma->sem);
> +	}
> +}
> +
> +void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
> +		struct platform_device *dev)

There's nothing in this function that requires a platform_device, so
passing struct device should be enough. Or maybe host1x_cdma should get
a struct device * field?

> +{
> +	u32 get_restart;

Maybe just call this "restart" or "restart_addr". get_restart sounds
like a function name.

> +	u32 syncpt_incrs;
> +	struct host1x_job *job = NULL;
> +	u32 syncpt_val;
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +
> +	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
> +
> +	dev_dbg(&dev->dev,
> +		"%s: starting cleanup (thresh %d)\n",
> +		__func__, syncpt_val);

This fits on two lines.

> +
> +	/*
> +	 * Move the sync_queue read pointer to the first entry that hasn't
> +	 * completed based on the current HW syncpt value. It's likely there
> +	 * won't be any (i.e. we're still at the head), but covers the case
> +	 * where a syncpt incr happens just prior/during the teardown.
> +	 */
> +
> +	dev_dbg(&dev->dev,
> +		"%s: skip completed buffers still in sync_queue\n",
> +		__func__);

This too.

> +	list_for_each_entry(job, &cdma->sync_queue, list) {
> +		if (syncpt_val < job->syncpt_end)
> +			break;
> +
> +		host1x_job_dump(&dev->dev, job);
> +	}

That's potentially a lot of debug output. I wonder if it might make
sense to control parts of this via a module parameter. Then again, if
somebody really needs to debug this, maybe they really want *all* the
information.

> +	/*
> +	 * Walk the sync_queue, first incrementing with the CPU syncpts that
> +	 * are partially executed (the first buffer) or fully skipped while
> +	 * still in the current context (slots are also NOP-ed).
> +	 *
> +	 * At the point contexts are interleaved, syncpt increments must be
> +	 * done inline with the pushbuffer from a GATHER buffer to maintain
> +	 * the order (slots are modified to be a GATHER of syncpt incrs).
> +	 *
> +	 * Note: save in get_restart the location where the timed out buffer
> +	 * started in the PB, so we can start the refetch from there (with the
> +	 * modified NOP-ed PB slots). This lets things appear to have completed
> +	 * properly for this buffer and resources are freed.
> +	 */
> +
> +	dev_dbg(&dev->dev,
> +		"%s: perform CPU incr on pending same ctx buffers\n",
> +		__func__);

Can be collapsed to two lines.

> +
> +	get_restart = cdma->last_put;
> +	if (!list_empty(&cdma->sync_queue))
> +		get_restart = job->first_get;

Perhaps:

	if (list_empty(&cdma->sync_queue))
		restart = cdma->last_put;
	else
		restart = job->first_get;

?

> +	list_for_each_entry_from(job, &cdma->sync_queue, list)
> +		if (job->clientid == cdma->timeout.clientid)
> +			job->timeout = 500;

I think this warrants a comment.

> +/*
> + * Destroy a cdma
> + */
> +void host1x_cdma_deinit(struct host1x_cdma *cdma)
> +{
> +	struct push_buffer *pb = &cdma->push_buffer;
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +
> +	if (cdma->running) {
> +		pr_warn("%s: CDMA still running\n",
> +				__func__);
> +	} else {
> +		host1x->cdma_pb_op.destroy(pb);
> +		host1x->cdma_op.timeout_destroy(cdma);
> +	}
> +}

There's no way to recover from the situation where a cdma is still
running. Can this not return an error code (-EBUSY?) if the cdma can't
be destroyed?
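
A minimal sketch of what that could look like, written as stand-alone
user-space code. The struct here is a cut-down stand-in and the
"destroyed" flag is hypothetical; it only marks the place where the real
push buffer and timeout teardown would happen:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for struct host1x_cdma. */
struct host1x_cdma {
	bool running;
	bool destroyed;	/* hypothetical marker for the real teardown */
};

/* Returns 0 on success or -EBUSY if the CDMA is still running. */
int host1x_cdma_deinit(struct host1x_cdma *cdma)
{
	assert(cdma != NULL);

	if (cdma->running)
		return -EBUSY;

	/* cdma_pb_op.destroy() and timeout_destroy() would go here */
	cdma->destroyed = true;

	return 0;
}
```

The caller can then decide whether to stop the CDMA and retry or to
propagate the error.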

> +/*
> + * End a cdma submit
> + * Kick off DMA, add job to the sync queue, and a number of slots to be freed
> + * from the pushbuffer. The handles for a submit must all be pinned at the same
> + * time, but they can be unpinned in smaller chunks.
> + */
> +void host1x_cdma_end(struct host1x_cdma *cdma,
> +		struct host1x_job *job)
> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	bool was_idle = list_empty(&cdma->sync_queue);

Maybe just "idle"? It reflects the current state of the CDMA, not any
old state.

> +
> +	host1x->cdma_op.kick(cdma);
> +
> +	add_to_sync_queue(cdma,
> +			job,
> +			cdma->slots_used,
> +			cdma->first_get);

No need to split this over so many lines. Also, shouldn't the order be
reversed here? I.e. first add to sync queue, then start DMA?

> +	/* start timer on idle -> active transitions */
> +	if (job->timeout && was_idle)
> +		cdma_start_timer_locked(cdma, job);

This could be part of add_to_sync_queue(), but if you open-code that as
I suggested earlier it should obviously stay.

> diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
[...]
> +struct platform_device;

No need for this if you pass struct device * instead.

> +/*
> + * cdma
> + *
> + * This is in charge of a host command DMA channel.
> + * Sends ops to a push buffer, and takes responsibility for unpinning
> + * (& possibly freeing) of memory after those ops have completed.
> + * Producer:
> + *	begin
> + *		push - send ops to the push buffer
> + *	end - start command DMA and enqueue handles to be unpinned
> + * Consumer:
> + *	update - call to update sync queue and push buffer, unpin memory
> + */

I find the name to be a bit confusing. For some reason I automatically
think of GSM when I read CDMA. This really is more of a job queue, so
maybe calling it host1x_job_queue might be more appropriate. But I've
already requested a lot of things to be renamed, so I think I can live
with this being called CDMA if you don't want to change it.

Alternatively all of these could be moved to the struct host1x_channel
given that there's only one of each of the push_buffer, buffer_timeout
and host1x_cdma objects per channel.

> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
[...]
> +#include "channel.h"
> +#include "dev.h"
> +#include "job.h"
> +
> +#include <linux/slab.h>
> +#include <linux/module.h>

Again the include ordering is strange.

> +/*
> + * Iterator function for host1x device list
> + * It takes a fptr as an argument and calls that function for each
> + * device in the list
> + */
> +void host1x_channel_for_all(struct host1x *host1x, void *data,
> +	int (*fptr)(struct host1x_channel *ch, void *fdata))
> +{
> +	struct host1x_channel *ch;
> +	int ret;
> +
> +	list_for_each_entry(ch, &host1x->chlist.list, list) {
> +		if (ch && fptr) {
> +			ret = fptr(ch, data);
> +			if (ret) {
> +				pr_info("%s: iterator error\n", __func__);
> +				break;
> +			}
> +		}
> +	}
> +}

Couldn't you rewrite this as a macro, similar to list_for_each_entry()
so that users could do something like:

	host1x_for_each_channel(channel, host1x) {
		...
	}

That's a bit friendlier than having each user write a separate function
to be called from this iterator.
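
Roughly like the sketch below. It is stand-alone user-space code, so it
carries its own minimal list helpers; in the driver the macro could
simply wrap list_for_each_entry(). The "channels" field and
host1x_sum_channel_ids() are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct host1x_channel {
	struct list_head list;
	unsigned int id;
};

struct host1x {
	struct list_head channels;	/* hypothetical field name */
};

/* Iterate over every channel registered with a host1x instance. */
#define host1x_for_each_channel(ch, host)				\
	for (ch = container_of((host)->channels.next,			\
			       struct host1x_channel, list);		\
	     &ch->list != &(host)->channels;				\
	     ch = container_of(ch->list.next, struct host1x_channel, list))

/* Example user: no callback function or void *data needed. */
unsigned int host1x_sum_channel_ids(struct host1x *host)
{
	struct host1x_channel *ch;
	unsigned int sum = 0;

	assert(host != NULL);

	host1x_for_each_channel(ch, host)
		sum += ch->id;

	return sum;
}
```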

> +int host1x_channel_submit(struct host1x_job *job)
> +{
> +	return host1x_get_host(job->ch->dev)->channel_op.submit(job);
> +}

I'd expect a function named host1x_channel_submit() to take a struct
host1x_channel *. Should this perhaps be called host1x_job_submit()?

> +struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
> +{
> +	int err = 0;
> +
> +	mutex_lock(&ch->reflock);
> +	if (ch->refcount == 0)
> +		err = host1x_cdma_init(&ch->cdma);
> +	if (!err)
> +		ch->refcount++;
> +
> +	mutex_unlock(&ch->reflock);
> +
> +	return err ? NULL : ch;
> +}

Why don't you use any of the kernel's reference counting mechanisms?

> +void host1x_channel_put(struct host1x_channel *ch)
> +{
> +	mutex_lock(&ch->reflock);
> +	if (ch->refcount == 1) {
> +		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
> +		host1x_cdma_deinit(&ch->cdma);
> +	}
> +	ch->refcount--;
> +	mutex_unlock(&ch->reflock);
> +}

I think you can do all of this using a kref.
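
Something like the following sketch. It is user-space code, so struct
kref and its helpers are simplified, non-atomic stand-ins that only
mirror the shape of the kernel API. It also assumes the channel starts
out with a reference count of 1 from the allocation path instead of
initializing the CDMA lazily on the first get. The cdma_running flag is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, non-atomic stand-in for the kernel's struct kref. */
struct kref {
	unsigned int refcount;
};

static void kref_init(struct kref *kref)
{
	kref->refcount = 1;
}

static void kref_get(struct kref *kref)
{
	kref->refcount++;
}

/* Returns 1 if the last reference was dropped, 0 otherwise. */
static int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
	assert(kref->refcount > 0);

	if (--kref->refcount == 0) {
		release(kref);
		return 1;
	}

	return 0;
}

struct host1x_channel {
	struct kref ref;
	int cdma_running;	/* hypothetical stand-in for the CDMA state */
};

/* Called once the last reference is gone. */
static void host1x_channel_release(struct kref *kref)
{
	struct host1x_channel *ch =
		container_of(kref, struct host1x_channel, ref);

	/* cdma_op.stop() and host1x_cdma_deinit() would go here */
	ch->cdma_running = 0;
}

struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
{
	kref_get(&ch->ref);
	return ch;
}

void host1x_channel_put(struct host1x_channel *ch)
{
	kref_put(&ch->ref, host1x_channel_release);
}
```

That gets rid of the reflock mutex and the open-coded counter.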

> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
> +{
> +	struct host1x_channel *ch = NULL;
> +	struct host1x *host1x = host1x_get_host(pdev);
> +	int chindex;
> +	int max_channels = host1x->info.nb_channels;
> +	int err;
> +
> +	mutex_lock(&host1x->chlist_mutex);
> +
> +	chindex = host1x->allocated_channels;
> +	if (chindex > max_channels)
> +		goto fail;
> +
> +	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
> +	if (ch == NULL)
> +		goto fail;
> +
> +	/* Link platform_device to host1x_channel */
> +	err = host1x->channel_op.init(ch, host1x, chindex);
> +	if (err < 0)
> +		goto fail;
> +
> +	ch->dev = pdev;
> +
> +	/* Add to channel list */
> +	list_add_tail(&ch->list, &host1x->chlist.list);
> +
> +	host1x->allocated_channels++;
> +
> +	mutex_unlock(&host1x->chlist_mutex);
> +	return ch;
> +
> +fail:
> +	dev_err(&pdev->dev, "failed to init channel\n");
> +	kfree(ch);
> +	mutex_unlock(&host1x->chlist_mutex);
> +	return NULL;
> +}

I think the critical section could be shorter here. It's probably not
worth the extra trouble, though, given that channels are not often
allocated.

> +void host1x_channel_free(struct host1x_channel *ch)
> +{
> +	struct host1x *host1x = host1x_get_host(ch->dev);
> +	struct host1x_channel *chiter, *tmp;
> +	list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
> +		if (chiter == ch) {
> +			list_del(&chiter->list);
> +			kfree(ch);
> +			host1x->allocated_channels--;
> +
> +			return;
> +		}
> +	}
> +}

This doesn't free the channel if it happens to not be part of the host1x
channel list. Perhaps an easier way to write it would be:

	host1x = host1x_get_host(ch->dev);

	list_del(&ch->list);
	kfree(ch);

	host1x->allocated_channels--;

Looking at the rest of the code, it seems like a channel will never not
be part of the host1x channel list, so I don't think there's a need to
scan the list.

On a side-note: generally if you break out of the loop right after
freeing the memory of a removed node, there's no need to use the _safe
variant since you won't be accessing the .next field of the freed node
anyway.

Maybe these should also adopt naming similar to what we discussed for
the syncpoints. That is:

	struct host1x_channel *host1x_channel_request(struct device *dev);

?

> diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
[...]
> +
> +/*
> + * host1x device list in debug-fs dump of host1x and client device
> + * as well as channel state
> + */

I don't understand this comment.

> +struct host1x_channel {
> +	struct list_head list;
> +
> +	int refcount;
> +	int chid;

This can probably just be id. It is a field of host1x_channel, so the ch
prefix is redundant.

> +	struct mutex reflock;
> +	struct mutex submitlock;
> +	void __iomem *regs;
> +	struct device *node;

This is never used.

> +	struct platform_device *dev;

Can this be just struct device *?

> +	struct cdev cdev;

This is never used.

> +/* channel list operations */
> +void host1x_channel_list_init(struct host1x *);
> +void host1x_channel_for_all(struct host1x *, void *data,
> +	int (*fptr)(struct host1x_channel *ch, void *fdata));
> +
> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
> +void host1x_channel_free(struct host1x_channel *ch);

Is it a good idea to make host1x_channel_free() publicly available?
Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a
host1x_channel with a reference count of 1 and everybody release their
reference using host1x_channel_put() to make sure the channel is freed
only after the last reference disappears?

Otherwise whoever calls host1x_channel_free() will confuse everybody
else that's still keeping a reference.

> diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
[...]

Various spurious blank lines in this file, and the alignment of function
parameters is off.

> +struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)

I don't think this needs platform_device either.

> +{
> +	struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
> +	struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
> +
> +	mutex_lock(struct_mutex);
> +	drm_gem_object_reference(&obj->base);
> +	mutex_unlock(struct_mutex);

I think it's more customary to obtain a pointer to struct drm_device and
then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use
drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't
exist yet, apparently. But it could be added.

> +int host1x_cma_pin_array_ids(struct platform_device *dev,
> +		long unsigned *ids,
> +		long unsigned id_type_mask,
> +		long unsigned id_type,
> +		u32 count,
> +		struct host1x_job_unpin_data *unpin_data,
> +		dma_addr_t *phys_addr)

struct device * and unsigned long please. count also doesn't need to
be a sized type; unsigned int will do just fine. The return value can
also be unsigned int if you don't expect to return any error conditions.

> +{
> +	int i;
> +	int pin_count = 0;

Both should be unsigned as well, and can go on one line:

	unsigned int pin_count = 0, i;

> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
>  struct host1x;
> +struct host1x_intr;
>  struct host1x_syncpt;
> +struct host1x_channel;
> +struct host1x_cdma;
> +struct host1x_job;
> +struct push_buffer;
> +struct dentry;

I think this already belongs in a previous patch. The debugfs dentry
isn't added in this patch.

> +struct host1x_channel_ops {
> +	int (*init)(struct host1x_channel *,
> +		    struct host1x *,
> +		    int chid);

Please add the parameter names as well (the same goes for all ops
declared in this file). And "id" will be enough. Also the channel ID can
surely be unsigned, right?

> +struct host1x_cdma_ops {
> +	void (*start)(struct host1x_cdma *);
> +	void (*stop)(struct host1x_cdma *);
> +	void (*kick)(struct  host1x_cdma *);
> +	int (*timeout_init)(struct host1x_cdma *,
> +			    u32 syncpt_id);
> +	void (*timeout_destroy)(struct host1x_cdma *);
> +	void (*timeout_teardown_begin)(struct host1x_cdma *);
> +	void (*timeout_teardown_end)(struct host1x_cdma *,
> +				     u32 getptr);
> +	void (*timeout_cpu_incr)(struct host1x_cdma *,
> +				 u32 getptr,
> +				 u32 syncpt_incrs,
> +				 u32 syncval,
> +				 u32 nr_slots);
> +};

Can the timeout_ prefix not be dropped? The functions are generally
useful and not directly related to timeouts, even though they seem to
only be used during timeout handling.

Also, is it really necessary to abstract these into an ops structure? I
get that newer hardware revisions might require different ops for
syncpoint handling because the register layout or number of syncpoints
may be different, but the CDMA and push buffer (below) concepts are
pretty much a software abstraction, and as such their implementation is
unlikely to change with some future hardware revision.

> +struct host1x_pushbuffer_ops {
> +	void (*reset)(struct push_buffer *);
> +	int (*init)(struct push_buffer *);
> +	void (*destroy)(struct push_buffer *);
> +	void (*push_to)(struct push_buffer *,
> +			struct mem_handle *,
> +			u32 op1, u32 op2);
> +	void (*pop_from)(struct push_buffer *,
> +			 unsigned int slots);

Maybe just push() and pop()?

> +	u32 (*space)(struct push_buffer *);
> +	u32 (*putptr)(struct push_buffer *);
> +};
>  
>  struct host1x_syncpt_ops {
>  	void (*reset)(struct host1x_syncpt *);
> @@ -64,9 +111,19 @@ struct host1x {
>  	struct host1x_device_info info;
>  	struct clk *clk;
>  
> +	/* Sync point dedicated to replacing waits for expired fences */
> +	struct host1x_syncpt *nop_sp;
> +
> +	struct host1x_channel_ops channel_op;
> +	struct host1x_cdma_ops cdma_op;
> +	struct host1x_pushbuffer_ops cdma_pb_op;
>  	struct host1x_syncpt_ops syncpt_op;
>  	struct host1x_intr_ops intr_op;
>  
> +	struct mutex chlist_mutex;
> +	struct host1x_channel chlist;

Shouldn't this just be struct list_head?

> +	int allocated_channels;

unsigned int? And maybe just "num_channels"?

> diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
[...]
> +enum host1x_class {
> +	NV_HOST1X_CLASS_ID		= 0x1,
> +	NV_GRAPHICS_2D_CLASS_ID		= 0x51,

This entry belongs in a later patch, right? And I find it convenient if
enumeration constants start with the enum name as prefix. Furthermore
it'd be nice to reuse the hardware module names, like so:

	enum host1x_class {
		HOST1X_CLASS_HOST1X,
		HOST1X_CLASS_GR2D,
		HOST1X_CLASS_GR3D,
	};

> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
[...]
> +#include <linux/slab.h>
> +#include <linux/scatterlist.h>
> +#include <linux/dma-mapping.h>
> +#include "cdma.h"
> +#include "channel.h"
> +#include "dev.h"
> +#include "memmgr.h"
> +
> +#include "cdma_hw.h"
> +
> +static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
> +{
> +	return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
> +		| HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
> +		| HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);

I think it is more customary to put the | at the end of the preceding
line:

	return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
	       HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
	       HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);

Also since these are all single bits, I'd prefer if you could drop the
_F suffix and not make them take a parameter. I think it'd even be
better not to have this function at all, but make the intent explicit
where the register is written. That is, have each call site set the bits
explicitly instead of calling this helper. Having a parameter list such
as (true, false, false) or (true, true, true) is confusing since you
have to keep looking up the meaning of the parameters.
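
As a sketch of what explicit call sites might look like. The bit
positions are assumptions for illustration (the real definitions come
from the hardware headers), and the register write is replaced by a
plain variable so that the example runs in user space:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define HOST1X_CHANNEL_DMACTRL_DMASTOP		(1u << 0)
#define HOST1X_CHANNEL_DMACTRL_DMAGETRST	(1u << 1)
#define HOST1X_CHANNEL_DMACTRL_DMAINITGET	(1u << 2)

/* Stand-in for the DMACTRL register of one channel. */
static uint32_t fake_dmactrl;

/* Hypothetical replacement for host1x_ch_writel(ch, value, DMACTRL). */
static void write_dmactrl(uint32_t value)
{
	fake_dmactrl = value;
}

/* was: host1x_channel_dmactrl(true, false, false) */
void cdma_stop_dma(void)
{
	write_dmactrl(HOST1X_CHANNEL_DMACTRL_DMASTOP);
}

/* was: host1x_channel_dmactrl(true, true, true) */
void cdma_reset_get(void)
{
	write_dmactrl(HOST1X_CHANNEL_DMACTRL_DMASTOP |
		      HOST1X_CHANNEL_DMACTRL_DMAGETRST |
		      HOST1X_CHANNEL_DMACTRL_DMAINITGET);
}
```

With the bit names spelled out, nobody has to look up which boolean
means what.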

> +}
> +
> +static void cdma_timeout_handler(struct work_struct *work);

Can this prototype be avoided?

> +/**
> + * Reset to empty push buffer
> + */
> +static void push_buffer_reset(struct push_buffer *pb)
> +{
> +	pb->fence = PUSH_BUFFER_SIZE - 8;
> +	pb->cur = 0;

Maybe position is a better name than cur.

> +/**
> + * Init push buffer resources
> + */
> +static void push_buffer_destroy(struct push_buffer *pb);

You should be careful with these comment blocks. If you start them with
/**, then you should make them proper kerneldoc comments. But you don't
really need that for static functions, so you could just make them /*-
style.

Also this particular comment is confusingly placed on top of the
prototype of push_buffer_destroy().

> +/*
> + * Push two words to the push buffer
> + * Caller must ensure push buffer is not full
> + */
> +static void push_buffer_push_to(struct push_buffer *pb,
> +		struct mem_handle *handle,
> +		u32 op1, u32 op2)
> +{
> +	u32 cur = pb->cur;
> +	u32 *p = (u32 *)((u32)pb->mapped + cur);

You do all this extra casting to make sure to increment by bytes and not
32-bit words. How about you change pb->cur to contain the word index, so
that you don't have to jump through hoops each time around?

Alternatively you could make it a pointer to u32 and not have to index
or cast at all. So you'd end up with something like:

	struct push_buffer {
		u32 *start;
		u32 *end;
		u32 *ptr;
	};
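
A push operation on top of that layout could then look roughly like the
sketch below. It is user-space code with uint32_t in place of u32, it
assumes the buffer holds an even number of words, and it leaves out the
mem_handle bookkeeping of the original function:

```c
#include <assert.h>
#include <stdint.h>

struct push_buffer {
	uint32_t *start;	/* first word of the buffer */
	uint32_t *end;		/* one past the last word */
	uint32_t *ptr;		/* next slot to be written */
};

/* Push two words; the caller must ensure the buffer is not full. */
void push_buffer_push(struct push_buffer *pb, uint32_t op1, uint32_t op2)
{
	/* a slot is two words and must not straddle the end */
	assert(pb->ptr >= pb->start && pb->ptr + 2 <= pb->end);

	*pb->ptr++ = op1;
	*pb->ptr++ = op2;

	/* wrap around at the end of the buffer */
	if (pb->ptr == pb->end)
		pb->ptr = pb->start;
}
```

No casts and no byte offsets, and the wrap-around is a simple pointer
comparison.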

> +/*
> + * Return the number of two word slots free in the push buffer
> + */
> +static u32 push_buffer_space(struct push_buffer *pb)
> +{
> +	return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
> +}

Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
PUSH_BUFFER_SIZE, can it?

> +/*
> + * Init timeout resources
> + */
> +static int cdma_timeout_init(struct host1x_cdma *cdma,
> +				 u32 syncpt_id)
> +{
> +	if (syncpt_id == NVSYNCPT_INVALID)
> +		return -EINVAL;

Do we really need the syncpt_id check here? It is the only reason why we
need to pass the parameter in the first place, and if we get to this
point we should already have made sure that the syncpoint is actually
valid.

> +/*
> + * Increment timedout buffer's syncpt via CPU.

Nit: "timed out buffer's"

> + */
> +static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
> +				u32 syncpt_incrs, u32 syncval, u32 nr_slots)

The syncval parameter isn't used.

> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	struct push_buffer *pb = &cdma->push_buffer;
> +	u32 i, getidx;
> +
> +	for (i = 0; i < syncpt_incrs; i++)
> +		host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
> +
> +	/* after CPU incr, ensure shadow is up to date */
> +	host1x_syncpt_load_min(cdma->timeout.syncpt);
> +
> +	/* NOP all the PB slots */
> +	getidx = getptr - pb->phys;
> +	while (nr_slots--) {
> +		u32 *p = (u32 *)((u32)pb->mapped + getidx);
> +		*(p++) = HOST1X_OPCODE_NOOP;
> +		*(p++) = HOST1X_OPCODE_NOOP;
> +		dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
> +			__func__, pb->phys + getidx);
> +		getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
> +	}
> +	wmb();

Why the memory barrier?

> +/*
> + * Similar to cdma_start(), but rather than starting from an idle
> + * state (where DMA GET is set to DMA PUT), on a timeout we restore
> + * DMA GET from an explicit value (so DMA may again be pending).
> + */
> +static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +
> +	if (cdma->running)
> +		return;
> +
> +	cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
> +
> +	host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
> +		HOST1X_CHANNEL_DMACTRL);
> +
> +	/* set base, end pointer (all of memory) */
> +	host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
> +	host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);

According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a
DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of
that, why set the valid range to all of physical memory? We know the
valid range of the push buffer, why not set the limits accordingly?

> +/*
> + * Kick channel DMA into action by writing its PUT offset (if it has changed)
> + */
> +static void cdma_kick(struct host1x_cdma *cdma)
> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +	u32 put;
> +
> +	put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
> +
> +	if (put != cdma->last_put) {
> +		host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
> +		cdma->last_put = put;
> +	}
> +}

kick() sounds unusual. Maybe flush or commit or something similar would
be more accurate.

> +static void cdma_stop(struct host1x_cdma *cdma)
> +{
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +
> +	mutex_lock(&cdma->lock);
> +	if (cdma->running) {
> +		host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
> +		host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
> +			HOST1X_CHANNEL_DMACTRL);
> +		cdma->running = false;
> +	}
> +	mutex_unlock(&cdma->lock);
> +}

Perhaps this should be renamed cdma_stop_sync() or similar to make it
clear that it waits for the queue to run empty.

> +static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)

Maybe the last parameter should be called restart to match its purpose?

> +{
> +	struct host1x *host1x = cdma_to_host1x(cdma);
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +	u32 cmdproc_stop;
> +
> +	dev_dbg(&host1x->dev->dev,
> +		"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
> +		ch->chid, getptr);
> +
> +	cmdproc_stop = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
> +	cmdproc_stop &= ~(BIT(ch->chid));

No need for the extra parentheses.

> +	host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
> +
> +	cdma->torndown = false;
> +	cdma_timeout_restart(cdma, getptr);
> +}

I find this a bit non-intuitive. We tear down a channel, and when we're
done tearing down, the torndown variable is set to false and the channel
is actually restarted. Maybe you could explain some more how this works
and what its purpose is.

> +/*
> + * If this timeout fires, it indicates the current sync_queue entry has
> + * exceeded its TTL and the userctx should be timed out and remaining
> + * submits already issued cleaned up (future submits return an error).
> + */

I can't seem to find what causes subsequent submits to return an error.
Also, how is the channel reset so that new jobs can be submitted?

> +static void cdma_timeout_handler(struct work_struct *work)
> +{
> +	struct host1x_cdma *cdma;
> +	struct host1x *host1x;
> +	struct host1x_channel *ch;
> +
> +	u32 syncpt_val;
> +
> +	u32 prev_cmdproc, cmdproc_stop;
> +
> +	cdma = container_of(to_delayed_work(work), struct host1x_cdma,
> +			    timeout.wq);
> +	host1x = cdma_to_host1x(cdma);
> +	ch = cdma_to_channel(cdma);
> +
> +	mutex_lock(&cdma->lock);
> +
> +	if (!cdma->timeout.clientid) {
> +		dev_dbg(&host1x->dev->dev,
> +			 "cdma_timeout: expired, but has no clientid\n");
> +		mutex_unlock(&cdma->lock);
> +		return;
> +	}

How can the CDMA not have a client?

> +
> +	/* stop processing to get a clean snapshot */
> +	prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
> +	cmdproc_stop = prev_cmdproc | BIT(ch->chid);
> +	host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
> +
> +	dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
> +		prev_cmdproc, cmdproc_stop);
> +
> +	syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
> +
> +	/* has buffer actually completed? */
> +	if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
> +		dev_dbg(&host1x->dev->dev,
> +			 "cdma_timeout: expired, but buffer had completed\n");

Maybe this should really be a warning?

> +		/* restore */
> +		cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));

No need for the extra parentheses. Also, why not just use prev_cmdproc,
which shouldn't have the bit set anyway?
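
To make both points concrete, here is a small userspace sketch. BIT() is
redefined locally and the two helpers are hypothetical names, not
functions from the patch. It assumes the channel's bit was clear before
the handler set it:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT(); the expansion is parenthesized. */
#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical helper: set the stop bit for one channel. */
static uint32_t cmdproc_stop_channel(uint32_t cmdproc, unsigned int chid)
{
	assert(chid < 32);
	return cmdproc | BIT(chid);
}

/* Hypothetical helper: clear it again, no extra parentheses required. */
static uint32_t cmdproc_resume_channel(uint32_t cmdproc, unsigned int chid)
{
	assert(chid < 32);
	return cmdproc & ~BIT(chid);
}
```

If the bit was clear in the saved value, writing the saved value back
gives the same result as masking the bit out again.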

> diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
[...]
> +/*
> + * Size of the sync queue. If it is too small, we won't be able to queue up
> + * many command buffers. If it is too large, we waste memory.
> + */
> +#define HOST1X_SYNC_QUEUE_SIZE 512

I don't see this used anywhere.

> +/*
> + * Number of gathers we allow to be queued up per channel. Must be a
> + * power of two. Currently sized such that pushbuffer is 4KB (512*8B).
> + */
> +#define HOST1X_GATHER_QUEUE_SIZE 512

More pieces falling into place.

> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
[...]
> +#include "host1x.h"
> +#include "channel.h"
> +#include "dev.h"
> +#include <linux/slab.h>
> +#include "intr.h"
> +#include "job.h"
> +#include <trace/events/host1x.h>

More include ordering issues.

> +static void submit_gathers(struct host1x_job *job)
> +{
> +	/* push user gathers */
> +	int i;

unsigned int?

> +	for (i = 0 ; i < job->num_gathers; i++) {
> +		struct host1x_job_gather *g = &job->gathers[i];
> +		u32 op1 = host1x_opcode_gather(g->words);
> +		u32 op2 = g->mem_base + g->offset;
> +		host1x_cdma_push_gather(&job->ch->cdma,
> +				job->gathers[i].ref,
> +				job->gathers[i].offset,
> +				op1, op2);
> +	}
> +}

Perhaps inline this into channel_submit()? I'm not sure how useful it
really is to split off smallish functions such as this which aren't
reused anywhere else. I don't have any major objection though, so you
can keep it separate if you want.

> +static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx)
> +{
> +	p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
> +	return p;
> +}
> +
> +static int host1x_channel_init(struct host1x_channel *ch,
> +	struct host1x *dev, int index)
> +{
> +	ch->chid = index;
> +	mutex_init(&ch->reflock);
> +	mutex_init(&ch->submitlock);
> +
> +	ch->regs = host1x_channel_regs(dev->regs, index);
> +	return 0;
> +}

You only use host1x_channel_regs() once, so I really don't think it buys
you anything to split it off. Both host1x_channel_regs() and
host1x_channel_init() are short enough that they can be collapsed.

> diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
>  #include "hw/host1x01.h"
>  #include "dev.h"
> +#include "channel.h"
>  #include "hw/host1x01_hardware.h"
>  
> +#include "hw/channel_hw.c"
> +#include "hw/cdma_hw.c"
>  #include "hw/syncpt_hw.c"
>  #include "hw/intr_hw.c"
>  
>  int host1x01_init(struct host1x *host)
>  {
> +	host->channel_op = host1x_channel_ops;
> +	host->cdma_op = host1x_cdma_ops;
> +	host->cdma_pb_op = host1x_pushbuffer_ops;
>  	host->syncpt_op = host1x_syncpt_ops;
>  	host->intr_op = host1x_intr_ops;

I think I mentioned this before, but I'd prefer not to have the .c files
included here, but rather reference the ops structures externally. But I
still think that especially CDMA and push buffer ops don't need to be in
separate structures since they aren't likely to change with new hardware
revisions.

> diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
> index c1d5324..03873c0 100644
> --- a/drivers/gpu/host1x/hw/host1x01_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
> @@ -21,6 +21,130 @@
>  
>  #include <linux/types.h>
>  #include <linux/bitops.h>
> +#include "hw_host1x01_channel.h"
>  #include "hw_host1x01_sync.h"
> +#include "hw_host1x01_uclass.h"
> +
> +/* channel registers */
> +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384

The only user of this seems to be host1x_channel_regs(), so it could be
moved to that file. Also the name is overly long, why not something like
HOST1X_CHANNEL_SIZE?

> +#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)

HOST1X_OPCODE_NOP would be more canonical in my opinion.


> +static inline u32 host1x_mask2(unsigned x, unsigned y)
> +{
> +	return 1 | (1 << (y - x));
> +}

What's this? I don't see it used anywhere.

> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
[...]
> +#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
> +	host1x_channel_dmactrl_dmastop_f(v)

I mentioned this elsewhere already, but I think the _F suffix (and _f
for that matter) along with the v parameter should go away.

> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
[...]

What does the "uclass" stand for? It seems a bit useless to me.

> diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
> index 16e3ada..ba48cee 100644
> --- a/drivers/gpu/host1x/hw/syncpt_hw.c
> +++ b/drivers/gpu/host1x/hw/syncpt_hw.c
> @@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
>  	wmb();
>  }
>  
> +/* remove a wait pointed to by patch_addr */
> +static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
> +{
> +	u32 override = host1x_class_host_wait_syncpt(
> +			NVSYNCPT_GRAPHICS_HOST, 0);
> +	__raw_writel(override, patch_addr);

__raw_writel() isn't meant to be used for regular memory addresses, but
only for MMIO addresses. patch_addr will be a kernel virtual address to
a location in RAM, so you can just treat it as a normal pointer:

	*(u32 *)patch_addr = override;

A small optimization might be to make override a static const, so that
it doesn't have to be composed every time.
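
Roughly like the following userspace sketch, where the command buffer is
ordinary memory. The override value and the patch_wait() name are
placeholders of mine; the real word would be whatever
host1x_class_host_wait_syncpt() composes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the precomposed WAIT_SYNCPT override word. */
static const uint32_t wait_override = 0x12345678;

/*
 * cmdbuf is CPU-mapped RAM, so a plain store is enough. offset is in
 * bytes and must be word aligned.
 */
static void patch_wait(void *cmdbuf, size_t offset)
{
	assert(offset % sizeof(uint32_t) == 0);
	*(uint32_t *)((char *)cmdbuf + offset) = wait_override;
}
```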

> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
> +static void action_submit_complete(struct host1x_waitlist *waiter)
> +{
> +	struct host1x_channel *channel = waiter->data;
> +	int nr_completed = waiter->count;

No need for this variable.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
[...]
> +#ifdef CONFIG_TEGRA_HOST1X_FIREWALL
> +static int host1x_firewall = 1;
> +#else
> +static int host1x_firewall;
> +#endif

You could use IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) in the code,
which will have the nice side-effect of compiling code out if the symbol
isn't selected.

> +struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
> +		u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)

Maybe make the parameters unsigned int instead of u32?

> +{
> +	struct host1x_job *job = NULL;
> +	int num_unpins = num_cmdbufs + num_relocs;

unsigned int?

> +	s64 total;

This doesn't need to be signed, u64 will be good enough. None of the
terms in the expression that assigns to total can be negative.

> +	void *mem;
> +
> +	/* Check that we're not going to overflow */
> +	total = sizeof(struct host1x_job)
> +			+ num_relocs * sizeof(struct host1x_reloc)
> +			+ num_unpins * sizeof(struct host1x_job_unpin_data)
> +			+ num_waitchks * sizeof(struct host1x_waitchk)
> +			+ num_cmdbufs * sizeof(struct host1x_job_gather)
> +			+ num_unpins * sizeof(dma_addr_t)
> +			+ num_unpins * sizeof(u32 *);

"+"s at the end of the preceding lines.

> +	if (total > ULONG_MAX)
> +		return NULL;
> +
> +	mem = job = kzalloc(total, GFP_KERNEL);
> +	if (!job)
> +		return NULL;
> +
> +	kref_init(&job->ref);
> +	job->ch = ch;
> +
> +	/* First init state to zero */
> +
> +	/*
> +	 * Redistribute memory to the structs.
> +	 * Overflows and negative conditions have
> +	 * already been checked in job_alloc().
> +	 */

The last two lines don't really apply here. The checks are in this same
function and they check only for overflow, not negative conditions,
which can't happen anyway since the counts are all unsigned.
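
As a sketch of how the size computation could look with an unsigned
64-bit total and the operators at the end of the lines. The element
sizes are made-up constants standing in for the sizeof() terms, and
job_alloc_size() is a hypothetical helper, so treat this as an
illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up element sizes standing in for the sizeof() terms. */
#define JOB_SIZE	128u
#define RELOC_SIZE	24u
#define UNPIN_SIZE	16u
#define WAITCHK_SIZE	16u
#define GATHER_SIZE	24u

/*
 * Returns 0 and stores the allocation size, or -1 if it does not fit
 * into size_t. The counts are 32 bit and the element sizes are small,
 * so the 64-bit sum itself cannot wrap.
 */
static int job_alloc_size(uint32_t num_cmdbufs, uint32_t num_relocs,
			  uint32_t num_waitchks, size_t *size)
{
	uint64_t num_unpins = (uint64_t)num_cmdbufs + num_relocs;
	uint64_t total;

	assert(size != NULL);

	total = JOB_SIZE +
		(uint64_t)num_relocs * RELOC_SIZE +
		num_unpins * UNPIN_SIZE +
		(uint64_t)num_waitchks * WAITCHK_SIZE +
		(uint64_t)num_cmdbufs * GATHER_SIZE +
		num_unpins * sizeof(uint64_t) +	/* dma_addr_t stand-in */
		num_unpins * sizeof(uint32_t *);
	if (total > SIZE_MAX)
		return -1;

	*size = (size_t)total;
	return 0;
}
```

Note that num_unpins is also computed in 64 bits here, so the sum of the
two counts cannot wrap before it is multiplied.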

> +void host1x_job_get(struct host1x_job *job)
> +{
> +	kref_get(&job->ref);
> +}

I think it is common for *_get() functions to return a pointer to the
referenced object.
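
A minimal userspace sketch of that convention, with a plain counter
instead of a kref (so not thread safe) and hypothetical struct and
function names:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a reference-counted job. */
struct job {
	unsigned int refcount;
};

/* Take a reference and hand the object back so calls can be chained. */
static struct job *job_get(struct job *job)
{
	assert(job != NULL);
	job->refcount++;
	return job;
}

struct queue_entry {
	struct job *job;
};

/* Acquire and store the reference in a single expression. */
static void queue_entry_set(struct queue_entry *entry, struct job *job)
{
	entry->job = job_get(job);
}
```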

> +void host1x_job_add_gather(struct host1x_job *job,
> +		u32 mem_id, u32 words, u32 offset)
> +{
> +	struct host1x_job_gather *cur_gather =
> +			&job->gathers[job->num_gathers];

Should this check for overflow?
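
For example, along the lines of this userspace sketch. The max_gathers
field is an assumption of mine (the job would have to remember how many
slots were allocated), and returning an error code means the function
can no longer be void:

```c
#include <errno.h>
#include <stdint.h>

struct gather {
	uint32_t mem_id;
	uint32_t words;
	uint32_t offset;
};

struct job {
	struct gather *gathers;
	unsigned int num_gathers;
	unsigned int max_gathers;	/* hypothetical: allocated capacity */
};

/* Refuse to write past the end of the gathers array. */
static int job_add_gather(struct job *job, uint32_t mem_id,
			  uint32_t words, uint32_t offset)
{
	struct gather *cur;

	if (job->num_gathers >= job->max_gathers)
		return -EINVAL;

	cur = &job->gathers[job->num_gathers++];
	cur->mem_id = mem_id;
	cur->words = words;
	cur->offset = offset;

	return 0;
}
```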

> +/*
> + * Check driver supplied waitchk structs for syncpt thresholds
> + * that have already been satisfied and NULL the comparison (to
> + * avoid a wrap condition in the HW).
> + */
> +static int do_waitchks(struct host1x_job *job, struct host1x *host,
> +		u32 patch_mem, struct mem_handle *h)
> +{
> +	int i;
> +
> +	/* compare syncpt vs wait threshold */
> +	for (i = 0; i < job->num_waitchk; i++) {
> +		struct host1x_waitchk *wait = &job->waitchk[i];
> +		struct host1x_syncpt *sp =
> +			host1x_syncpt_get(host, wait->syncpt_id);
> +
> +		/* validate syncpt id */
> +		if (wait->syncpt_id > host1x_syncpt_nb_pts(host))
> +			continue;
> +
> +		/* skip all other gathers */
> +		if (patch_mem != wait->mem)
> +			continue;
> +
> +		trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
> +				wait->syncpt_id, wait->thresh,
> +				host1x_syncpt_read_min(sp));
> +		if (host1x_syncpt_is_expired(
> +			host1x_syncpt_get(host, wait->syncpt_id),
> +			wait->thresh)) {

You already have the sp variable that you could use here to make it more
readable.

> +			struct host1x_syncpt *sp =
> +				host1x_syncpt_get(host, wait->syncpt_id);

And you don't need this then, since you already have sp pointing to the
same syncpoint.

> +			void *patch_addr = NULL;
> +
> +			/*
> +			 * NULL an already satisfied WAIT_SYNCPT host method,
> +			 * by patching its args in the command stream. The
> +			 * method data is changed to reference a reserved
> +			 * (never given out or incr) NVSYNCPT_GRAPHICS_HOST
> +			 * syncpt with a matching threshold value of 0, so
> +			 * is guaranteed to be popped by the host HW.
> +			 */
> +			dev_dbg(&host->dev->dev,
> +			    "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
> +			    wait->syncpt_id, sp->name, wait->thresh,
> +			    host1x_syncpt_read_min(sp));
> +
> +			/* patch the wait */
> +			patch_addr = host1x_memmgr_kmap(h,
> +					wait->offset >> PAGE_SHIFT);
> +			if (patch_addr) {
> +				host1x_syncpt_patch_wait(sp,
> +					(patch_addr +
> +						(wait->offset & ~PAGE_MASK)));
> +				host1x_memmgr_kunmap(h,
> +						wait->offset >> PAGE_SHIFT,
> +						patch_addr);
> +			} else {
> +				pr_err("Couldn't map cmdbuf for wait check\n");
> +			}

This is a case where splitting out a small function would actually be
useful to make the code more readable since you can remove two levels of
indentation. You can just pass in the handle and the offset, let it do
the actual patching. Maybe

	host1x_syncpt_patch_offset(sp, h, wait->offset);

?
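
Something along these lines, as a userspace sketch. The handle is just
contiguous memory here, handle_kmap() stands in for
host1x_memmgr_kmap(), and the syncpoint argument is left out, so all of
the names are assumptions rather than the driver's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* Stand-in for a buffer handle: contiguous memory plus a page count. */
struct mem_handle {
	unsigned char *base;
	size_t num_pages;
};

/* Stand-in for the kmap helper: returns NULL for a page out of range. */
static void *handle_kmap(struct mem_handle *h, unsigned long pagenum)
{
	if (pagenum >= h->num_pages)
		return NULL;

	return h->base + (pagenum << PAGE_SHIFT);
}

/* Map the page that holds offset and overwrite one word in it. */
static int patch_offset(struct mem_handle *h, uint32_t offset, uint32_t value)
{
	unsigned char *page;

	assert(offset % sizeof(uint32_t) == 0);

	page = handle_kmap(h, offset >> PAGE_SHIFT);
	if (!page)
		return -1;

	*(uint32_t *)(page + (offset & ~PAGE_MASK)) = value;
	/* a real implementation would unmap the page again here */

	return 0;
}
```

The caller in do_waitchks() then shrinks to a single call plus error
handling, which removes the two extra levels of indentation.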

> +		}
> +
> +		wait->mem = 0;
> +	}
> +	return 0;
> +}
> +
> +

There's a gratuitous blank line.

> +static int pin_job_mem(struct host1x_job *job)
> +{
> +	int i;
> +	int count = 0;
> +	int result;

These (and the return value) can all be unsigned int.

> +static int do_relocs(struct host1x_job *job,
> +		u32 cmdbuf_mem, struct mem_handle *h)
> +{
> +	int i = 0;

This can also be unsigned int.

> +	int last_page = -1;

And this should match the type of cmdbuf_offset (u32). You can initially
set it to something like ~0 to make sure it doesn't match any valid
offset.

> +	void *cmdbuf_page_addr = NULL;
> +
> +	/* pin & patch the relocs for one gather */
> +	while (i < job->num_relocs) {
> +		struct host1x_reloc *reloc = &job->relocarray[i];
> +
> +		/* skip all other gathers */
> +		if (cmdbuf_mem != reloc->cmdbuf_mem) {
> +			i++;
> +			continue;
> +		}
> +
> +		if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
> +			if (cmdbuf_page_addr)
> +				host1x_memmgr_kunmap(h,
> +						last_page, cmdbuf_page_addr);
> +
> +			cmdbuf_page_addr = host1x_memmgr_kmap(h,
> +					reloc->cmdbuf_offset >> PAGE_SHIFT);
> +			last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
> +
> +			if (unlikely(!cmdbuf_page_addr)) {
> +				pr_err("Couldn't map cmdbuf for relocation\n");
> +				return -ENOMEM;
> +			}
> +		}
> +
> +		__raw_writel(
> +			(job->reloc_addr_phys[i] +
> +				reloc->target_offset) >> reloc->shift,
> +			(cmdbuf_page_addr +
> +				(reloc->cmdbuf_offset & ~PAGE_MASK)));

Again, wrong __raw_writel() usage.

> +
> +		/* remove completed reloc from the job */
> +		if (i != job->num_relocs - 1) {
> +			struct host1x_reloc *reloc_last =
> +				&job->relocarray[job->num_relocs - 1];
> +			reloc->cmdbuf_mem	= reloc_last->cmdbuf_mem;
> +			reloc->cmdbuf_offset	= reloc_last->cmdbuf_offset;
> +			reloc->target		= reloc_last->target;
> +			reloc->target_offset	= reloc_last->target_offset;
> +			reloc->shift		= reloc_last->shift;
> +			job->reloc_addr_phys[i] =
> +				job->reloc_addr_phys[job->num_relocs - 1];
> +			job->num_relocs--;
> +		} else {
> +			break;
> +		}
> +	}
> +
> +	if (cmdbuf_page_addr)
> +		host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
> +
> +	return 0;
> +}

Also the algorithm seems a bit strange and hard to follow. Instead of
removing relocs from the job, replacing them with the last entry and
decrementing job->num_relocs, how much is the penalty for always
iterating over all relocs? This is one of the other cases where I'd
argue that simplicity is key. Furthermore you need to copy quite a bit
of data to replace the completed relocs, so I'm not sure it buys you
much.

It could always be optimized later on by just setting a bit in the reloc
to mark it as completed, or keep a bitmask of completed relocations or
whatever.
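
A userspace sketch of the "always iterate, mark as completed" variant.
The done flag is hypothetical and not part of the patch, and the actual
patching is reduced to a comment:

```c
#include <stdbool.h>
#include <stdint.h>

struct reloc {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;
	bool done;	/* hypothetical flag, not in the patch */
};

/*
 * Visit every reloc and handle only those that belong to cmdbuf_mem and
 * have not been patched yet. Returns the number handled in this pass.
 */
static unsigned int do_relocs(struct reloc *relocs, unsigned int num_relocs,
			      uint32_t cmdbuf_mem)
{
	unsigned int i, handled = 0;

	for (i = 0; i < num_relocs; i++) {
		struct reloc *reloc = &relocs[i];

		if (reloc->done || reloc->cmdbuf_mem != cmdbuf_mem)
			continue;

		/* patching of the command buffer would go here */
		reloc->done = true;
		handled++;
	}

	return handled;
}
```

Nothing is copied or reordered, and job->num_relocs stays constant, at
the cost of one extra comparison per reloc and gather.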

> +static int check_reloc(struct host1x_reloc *reloc,
> +		u32 cmdbuf_id, int offset)

offset can be unsigned int.

> +{
> +	int err = 0;
> +	if (reloc->cmdbuf_mem != cmdbuf_id
> +			|| reloc->cmdbuf_offset != offset * sizeof(u32))
> +		err = -EINVAL;
> +
> +	return err;
> +}

More canonically:

	offset *= sizeof(u32);

	if (reloc->cmdbuf_mem != cmdbuf_id || reloc->cmdbuf_offset != offset)
		return -EINVAL;

	return 0;

> +
> +static int check_mask(struct host1x_job *job,
> +		struct platform_device *pdev,
> +		struct host1x_reloc **reloc, int *num_relocs,
> +		u32 cmdbuf_id, int *offset,
> +		u32 *words, u32 class, u32 reg, u32 mask)

num_relocs and offset can be unsigned int *.

Same comment for the other check_*() functions. That said I think the
code would become a lot more readable if you were to wrap all of these
parameters into a structure, say host1x_firewall, and just pass that
into the functions.
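
A rough userspace sketch of what I have in mind. The field names are
guesses based on the check_mask() parameters, and check_addr_write() is
a hypothetical caller:

```c
#include <errno.h>
#include <stdint.h>

struct reloc {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;
};

/* Hypothetical bundle of the state that the check_*() functions share. */
struct firewall {
	struct reloc *reloc;		/* next reloc expected */
	unsigned int num_relocs;	/* relocs left to match */
	uint32_t cmdbuf_id;		/* gather being validated */
	unsigned int offset;		/* current word offset in the gather */
};

static int check_reloc(const struct firewall *fw)
{
	if (fw->reloc->cmdbuf_mem != fw->cmdbuf_id ||
	    fw->reloc->cmdbuf_offset != fw->offset * sizeof(uint32_t))
		return -EINVAL;

	return 0;
}

/* Consume one reloc for an address register write at the current offset. */
static int check_addr_write(struct firewall *fw)
{
	int err;

	if (!fw->num_relocs)
		return -EINVAL;

	err = check_reloc(fw);
	if (err)
		return err;

	fw->reloc++;
	fw->num_relocs--;

	return 0;
}
```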

> +static inline int copy_gathers(struct host1x_job *job,
> +		struct platform_device *pdev)

struct device *

> +{
> +	size_t size = 0;
> +	size_t offset = 0;
> +	int i;
> +
> +	for (i = 0; i < job->num_gathers; i++) {
> +		struct host1x_job_gather *g = &job->gathers[i];
> +		size += g->words * sizeof(u32);
> +	}
> +
> +	job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
> +			size, &job->gather_copy, GFP_KERNEL);
> +	if (IS_ERR(job->gather_copy_mapped)) {

dma_alloc_writecombine() returns NULL on failure, so this check is
wrong.
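
To see why, here is a userspace re-creation of the error-pointer test.
is_err() mirrors the idea behind the kernel's IS_ERR() but is written
from scratch here, and copy_setup() is a hypothetical caller:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* True only for pointers in the topmost MAX_ERRNO bytes of the address space. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* For an allocator that returns NULL on failure, test for NULL. */
static int copy_setup(const void *mapped)
{
	if (!mapped)
		return -ENOMEM;

	return 0;
}
```

NULL is not in the error-pointer range, so an is_err() style check lets
a failed allocation through and the following memcpy() would write to a
NULL pointer.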

> +		int err = PTR_ERR(job->gather_copy_mapped);
> +		job->gather_copy_mapped = NULL;
> +		return err;
> +	}
> +
> +	job->gather_copy_size = size;
> +
> +	for (i = 0; i < job->num_gathers; i++) {
> +		struct host1x_job_gather *g = &job->gathers[i];
> +		void *gather = host1x_memmgr_mmap(g->ref);
> +		memcpy(job->gather_copy_mapped + offset,
> +				gather + g->offset,
> +				g->words * sizeof(u32));
> +
> +		g->mem_base = job->gather_copy;
> +		g->offset = offset;
> +		g->mem_id = 0;
> +		g->ref = 0;
> +
> +		host1x_memmgr_munmap(g->ref, gather);
> +		offset += g->words * sizeof(u32);
> +	}
> +
> +	return 0;
> +}

I wonder, where's this DMA buffer actually used? I can't find any use
between this copy and the corresponding dma_free_writecombine() call.

> +int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
> +{
> +	int err = 0, i = 0, j = 0;

No need to initialize these here. i and j can also be unsigned.

> +	struct host1x *host = host1x_get_host(pdev);
> +	DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
> +
> +	bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
> +	for (i = 0; i < job->num_waitchk; i++) {
> +		u32 syncpt_id = job->waitchk[i].syncpt_id;
> +		if (syncpt_id < host1x_syncpt_nb_pts(host))
> +			set_bit(syncpt_id, waitchk_mask);
> +	}
> +
> +	/* get current syncpt values for waitchk */
> +	for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
> +		host1x_syncpt_load_min(host->syncpt + i);
> +
> +	/* pin memory */
> +	err = pin_job_mem(job);
> +	if (err <= 0)
> +		goto out;

pin_job_mem() never returns negative.

> +	/* patch gathers */
> +	for (i = 0; i < job->num_gathers; i++) {
> +		struct host1x_job_gather *g = &job->gathers[i];
> +
> +		/* process each gather mem only once */
> +		if (!g->ref) {
> +			g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
> +			if (IS_ERR(g->ref)) {

host1x_memmgr_get() also seems to return NULL on error.

> +				err = PTR_ERR(g->ref);
> +				g->ref = NULL;
> +				break;
> +			}
> +
> +			g->mem_base = job->gather_addr_phys[i];
> +
> +			for (j = 0; j < job->num_gathers; j++) {
> +				struct host1x_job_gather *tmp =
> +					&job->gathers[j];
> +				if (!tmp->ref && tmp->mem_id == g->mem_id) {
> +					tmp->ref = g->ref;
> +					tmp->mem_base = g->mem_base;
> +				}
> +			}
> +			err = 0;
> +			if (host1x_firewall)

if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))

> +				err = validate(job, pdev, g);
> +			if (err)
> +				dev_err(&pdev->dev,
> +					"Job validate returned %d\n", err);
> +			if (!err)
> +				err = do_relocs(job, g->mem_id,  g->ref);
> +			if (!err)
> +				err = do_waitchks(job, host,
> +						g->mem_id, g->ref);
> +			host1x_memmgr_put(g->ref);
> +			if (err)
> +				break;
> +		}
> +	}
> +
> +	if (host1x_firewall && !err) {

And here.

> +/*
> + * Debug routine used to dump job entries
> + */
> +void host1x_job_dump(struct device *dev, struct host1x_job *job)
> +{
> +	dev_dbg(dev, "    SYNCPT_ID   %d\n",
> +		job->syncpt_id);
> +	dev_dbg(dev, "    SYNCPT_VAL  %d\n",
> +		job->syncpt_end);
> +	dev_dbg(dev, "    FIRST_GET   0x%x\n",
> +		job->first_get);
> +	dev_dbg(dev, "    TIMEOUT     %d\n",
> +		job->timeout);
> +	dev_dbg(dev, "    NUM_SLOTS   %d\n",
> +		job->num_slots);
> +	dev_dbg(dev, "    NUM_HANDLES %d\n",
> +		job->num_unpins);
> +}

These don't need to be wrapped.

> diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
[...]
> +struct host1x_job_gather {
> +	u32 words;
> +	dma_addr_t mem_base;
> +	u32 mem_id;
> +	int offset;
> +	struct mem_handle *ref;
> +};
> +
> +struct host1x_cmdbuf {
> +	__u32 mem;
> +	__u32 offset;
> +	__u32 words;
> +	__u32 pad;
> +};
> +
> +struct host1x_reloc {
> +	__u32 cmdbuf_mem;
> +	__u32 cmdbuf_offset;
> +	__u32 target;
> +	__u32 target_offset;
> +	__u32 shift;
> +	__u32 pad;
> +};
> +
> +struct host1x_waitchk {
> +	__u32 mem;
> +	__u32 offset;
> +	__u32 syncpt_id;
> +	__u32 thresh;
> +};

None of these are shared with userspace, so they shouldn't take the
__u32 types, but the regular u32 ones.

> +/*
> + * Each submit is tracked as a host1x_job.
> + */
> +struct host1x_job {
> +	/* When refcount goes to zero, job can be freed */
> +	struct kref ref;
> +
> +	/* List entry */
> +	struct list_head list;
> +
> +	/* Channel where job is submitted to */
> +	struct host1x_channel *ch;

Maybe write it out as "channel"?

> +
> +	int clientid;

Subsequent patches assign u32 to this field, so maybe the type should be
changed here. And maybe leave out the id suffix. It doesn't really add
any information.

> +	/* Gathers and their memory */
> +	struct host1x_job_gather *gathers;
> +	int num_gathers;

unsigned int

> +	/* Wait checks to be processed at submit time */
> +	struct host1x_waitchk *waitchk;
> +	int num_waitchk;

unsigned int

> +	u32 waitchk_mask;

This might need to be changed to a bitmap once future Tegra versions
start supporting more than 32 syncpoints.

> +	/* Array of handles to be pinned & unpinned */
> +	struct host1x_reloc *relocarray;
> +	int num_relocs;

unsigned int

> +	struct host1x_job_unpin_data *unpins;
> +	int num_unpins;

unsigned int

> +	dma_addr_t *addr_phys;
> +	dma_addr_t *gather_addr_phys;
> +	dma_addr_t *reloc_addr_phys;
> +
> +	/* Sync point id, number of increments and end related to the submit */
> +	u32 syncpt_id;
> +	u32 syncpt_incrs;
> +	u32 syncpt_end;
> +
> +	/* Maximum time to wait for this job */
> +	int timeout;

unsigned int. I think we discussed this already in a slightly different
context in patch 2.

> +	/* Null kickoff prevents submit from being sent to hardware */
> +	bool null_kickoff;

I don't think this is used anywhere.

> +	/* Index and number of slots used in the push buffer */
> +	int first_get;
> +	int num_slots;

unsigned int

> +
> +	/* Copy of gathers */
> +	size_t gather_copy_size;
> +	dma_addr_t gather_copy;
> +	u8 *gather_copy_mapped;

Are these really needed? They don't seem to be used anywhere except to
store a copy and free that copy sometime later.

> +
> +	/* Temporary space for unpin ids */
> +	long unsigned int *pin_ids;

unsigned long

> +	/* Check if register is marked as an address reg */
> +	int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);

is_addr_reg() sounds a bit unusual. Maybe match this to the name of the
main firewall routine, validate()?

> +	/* Request a SETCLASS to this class */
> +	u32 class;
> +
> +	/* Add a channel wait for previous ops to complete */
> +	u32 serialize;

This is used in code as a boolean. Why does it need to be 32 bits?

> diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
[...]
> +struct mem_handle;
> +struct platform_device;
> +
> +struct host1x_job_unpin_data {
> +	struct mem_handle *h;
> +	struct sg_table *mem;
> +};
> +
> +enum mem_mgr_flag {
> +	mem_mgr_flag_uncacheable = 0,
> +	mem_mgr_flag_write_combine = 1,
> +};

I'd like to see this use a more object-oriented approach and more common
terminology. All of these handles are essentially buffer objects, so
maybe something like host1x_bo would be a nice and short name.

To make this more object-oriented, I propose something like:

	struct host1x_bo_ops {
		int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
			     unsigned long flags);
		int (*free)(struct host1x_bo *bo);
		...
	};

	struct host1x_bo {
		const struct host1x_bo_ops *ops;
	};

	struct host1x_cma_bo {
		struct host1x_bo base;
		struct drm_gem_cma_object *obj;
	};

	static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
	{
		return container_of(bo, struct host1x_cma_bo, base);
	}

	static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
					  unsigned long align, unsigned long flags)
	{
		return bo->ops->alloc(bo, size, align, flags);
	}

	...

That should be easy to extend with a new type of BO once the IOMMU-based
allocator is ready. And as I said it is much closer in terminology to
what other drivers do.
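
For reference, the same pattern as a self-contained userspace sketch
that actually compiles. The malloc-backed heap_bo is a made-up stand-in
for the CMA-backed object, container_of() is redefined locally, and the
ops are trimmed to two callbacks:

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct bo;

struct bo_ops {
	int (*alloc)(struct bo *bo, size_t size);
	void (*free)(struct bo *bo);
};

struct bo {
	const struct bo_ops *ops;
};

/* Made-up backend: plain heap memory instead of a CMA object. */
struct heap_bo {
	struct bo base;
	void *mem;
	size_t size;
};

static int heap_bo_alloc(struct bo *bo, size_t size)
{
	struct heap_bo *hbo = container_of(bo, struct heap_bo, base);

	hbo->mem = calloc(1, size);
	if (!hbo->mem)
		return -1;

	hbo->size = size;
	return 0;
}

static void heap_bo_free(struct bo *bo)
{
	struct heap_bo *hbo = container_of(bo, struct heap_bo, base);

	free(hbo->mem);
	hbo->mem = NULL;
	hbo->size = 0;
}

static const struct bo_ops heap_bo_ops = {
	.alloc = heap_bo_alloc,
	.free = heap_bo_free,
};

/* Generic entry points: callers never see the backing type. */
static int bo_alloc(struct bo *bo, size_t size)
{
	return bo->ops->alloc(bo, size);
}

static void bo_free(struct bo *bo)
{
	bo->ops->free(bo);
}
```

A second backend only needs its own struct embedding struct bo and its
own ops table; the callers of bo_alloc() and bo_free() stay unchanged.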

> diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
> index b46d044..255a3a3 100644
> --- a/drivers/gpu/host1x/syncpt.h
> +++ b/drivers/gpu/host1x/syncpt.h
> @@ -26,6 +26,7 @@
>  struct host1x;
>  
>  #define NVSYNCPT_INVALID			(-1)
> +#define NVSYNCPT_GRAPHICS_HOST			0

I think these should match other naming, so:

	#define HOST1X_SYNCPT_INVALID	-1
	#define HOST1X_SYNCPT_HOST1X	 0

There are a few more occurrences where platform_device is used but I
haven't commented on them. I think all of them will work just as well
with a struct device instead. Also I may not have caught all of the
places where you should be using unsigned int instead of int, so you
might want to look out for some of those.

Generally I very much like where this is going. Are there any plans to
move the userspace binary driver to this interface at some point so we
can more actively test it? Also, is anything else blocking adding a
gr3d device similar to gr2d from this patch series?

Thierry


* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-02-25 15:24   ` Thierry Reding
@ 2013-02-26  9:48     ` Terje Bergström
  2013-02-27  8:56       ` Thierry Reding
  2013-03-08 16:16       ` Terje Bergström
  0 siblings, 2 replies; 49+ messages in thread
From: Terje Bergström @ 2013-02-26  9:48 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 25.02.2013 17:24, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
>> index e89fb2b..57680a6 100644
>> --- a/drivers/gpu/host1x/Kconfig
>> +++ b/drivers/gpu/host1x/Kconfig
>> @@ -3,4 +3,27 @@ config TEGRA_HOST1X
>>       help
>>         Driver for the Tegra host1x hardware.
>>
>> -       Required for enabling tegradrm.
>> +       Required for enabling tegradrm and 2D acceleration.
> 
> I don't think I commented on this in the other patches, but I think this
> could use a bit more information about what host1x is. Also mentioning
> that it is a requirement for tegra-drm and 2D acceleration isn't very
> useful because it can equally well be expressed in Kconfig. If you add
> some description about what host1x is, people will know that they want
> to enable it.

Ok, we'll rewrite that. I think we can reuse the text from commit msg
that I stole from Stephen's appnote.

> 
>> +if TEGRA_HOST1X
>> +
>> +config TEGRA_HOST1X_CMA
>> +     bool "Support DRM CMA buffers"
>> +     depends on DRM
>> +     default y
>> +     select DRM_GEM_CMA_HELPER
>> +     select DRM_KMS_CMA_HELPER
>> +     help
>> +       Say yes if you wish to use DRM CMA buffers.
>> +
>> +       If unsure, choose Y.
> 
> Perhaps make this not user-selectable (for now)? If somebody disables
> this explicitly they won't get a working driver, right?

True, there's no alternative, so it should not be user selectable.

> 
>> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> [...]
>> +#include "cdma.h"
>> +#include "channel.h"
>> +#include "dev.h"
>> +#include "memmgr.h"
>> +#include "job.h"
>> +#include <asm/cacheflush.h>
>> +
>> +#include <linux/slab.h>
>> +#include <linux/kfifo.h>
>> +#include <linux/interrupt.h>
>> +#include <trace/events/host1x.h>
>> +
>> +#define TRACE_MAX_LENGTH 128U
> 
> "" includes generally follow <> ones.

Will do.

> 
>> +/*
>> + * Add an entry to the sync queue.
>> + */
>> +static void add_to_sync_queue(struct host1x_cdma *cdma,
>> +                           struct host1x_job *job,
>> +                           u32 nr_slots,
>> +                           u32 first_get)
>> +{
>> +     if (job->syncpt_id == NVSYNCPT_INVALID) {
>> +             dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
>> +                             __func__);
>> +             return;
>> +     }
>> +
>> +     job->first_get = first_get;
>> +     job->num_slots = nr_slots;
>> +     host1x_job_get(job);
>> +     list_add_tail(&job->list, &cdma->sync_queue);
>> +}
> 
> It's a bit odd that you pass a job in here along with some parameters
> that are then assigned to the job's fields. Couldn't you just assign
> them to the job's fields before passing the job into this function?
> 
> I also see that you only use this function once, so maybe you could
> open-code it instead.

I think open coding would be the best choice. There's no real reason to
have this as separate function. That'd solve the odd parameters
phenomenon, too.

> 
>> +/*
>> + * Return the status of the cdma's sync queue or push buffer for the given event
>> + *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
>> + *  - pb space: returns the number of free slots in the channel's push buffer
>> + * Must be called with the cdma lock held.
>> + */
>> +static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
>> +             enum cdma_event event)
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     switch (event) {
>> +     case CDMA_EVENT_SYNC_QUEUE_EMPTY:
>> +             return list_empty(&cdma->sync_queue) ? 1 : 0;
>> +     case CDMA_EVENT_PUSH_BUFFER_SPACE: {
>> +             struct push_buffer *pb = &cdma->push_buffer;
>> +             return host1x->cdma_pb_op.space(pb);
>> +     }
>> +     default:
>> +             return 0;
>> +     }
>> +}
> 
> Similarly this function is only used in one place and it requires a
> whole lot of documentation to define the meaning of the return value. If
> you implement this functionality directly in host1x_cdma_wait_locked()
> you have much more context and don't require all this "protocol".

I agree, this function is confusing. For some future functionality, it's
going to be called from a second place with CDMA_EVENT_SYNC_QUEUE_EMPTY,
but it's better if both of those calls are just opened up to get rid of
the extra switch().

> 
>> +/*
>> + * Start timer for a buffer submition that has completed yet.
> 
> "submission". And I don't understand the "that has completed yet" part.

It should become "Start timer that tracks the time spent by the job".

> 
>> + * Must be called with the cdma lock held.
>> + */
>> +static void cdma_start_timer_locked(struct host1x_cdma *cdma,
>> +             struct host1x_job *job)
> 
> You use two different styles to indent the function parameters. You
> might want to stick to one, preferably aligning them with the first
> parameter on the first line.

I've generally favored "two tabs" indenting, but we'll anyway
standardize on one.

> 
>> +{
>> +     struct host1x *host = cdma_to_host1x(cdma);
>> +
>> +     if (cdma->timeout.clientid) {
>> +             /* timer already started */
>> +             return;
>> +     }
>> +
>> +     cdma->timeout.clientid = job->clientid;
>> +     cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
>> +     cdma->timeout.syncpt_val = job->syncpt_end;
>> +     cdma->timeout.start_ktime = ktime_get();
>> +
>> +     schedule_delayed_work(&cdma->timeout.wq,
>> +                     msecs_to_jiffies(job->timeout));
>> +}
>> +
>> +/*
>> + * Stop timer when a buffer submition completes.
> 
> "submission"

Will fix.

> 
>> +/*
>> + * For all sync queue entries that have already finished according to the
>> + * current sync point registers:
>> + *  - unpin & unref their mems
>> + *  - pop their push buffer slots
>> + *  - remove them from the sync queue
>> + * This is normally called from the host code's worker thread, but can be
>> + * called manually if necessary.
>> + * Must be called with the cdma lock held.
>> + */
>> +static void update_cdma_locked(struct host1x_cdma *cdma)
>> +{
>> +     bool signal = false;
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     struct host1x_job *job, *n;
>> +
>> +     /* If CDMA is stopped, queue is cleared and we can return */
>> +     if (!cdma->running)
>> +             return;
>> +
>> +     /*
>> +      * Walk the sync queue, reading the sync point registers as necessary,
>> +      * to consume as many sync queue entries as possible without blocking
>> +      */
>> +     list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
>> +             struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
> 
> host1x_syncpt_get()?

Yes, that should be used.

> 
>> +
>> +             /* Check whether this syncpt has completed, and bail if not */
>> +             if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
>> +                     /* Start timer on next pending syncpt */
>> +                     if (job->timeout)
>> +                             cdma_start_timer_locked(cdma, job);
>> +                     break;
>> +             }
>> +
>> +             /* Cancel timeout, when a buffer completes */
>> +             if (cdma->timeout.clientid)
>> +                     stop_cdma_timer_locked(cdma);
>> +
>> +             /* Unpin the memory */
>> +             host1x_job_unpin(job);
>> +
>> +             /* Pop push buffer slots */
>> +             if (job->num_slots) {
>> +                     struct push_buffer *pb = &cdma->push_buffer;
>> +                     host1x->cdma_pb_op.pop_from(pb, job->num_slots);
>> +                     if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
>> +                             signal = true;
>> +             }
>> +
>> +             list_del(&job->list);
>> +             host1x_job_put(job);
>> +     }
>> +
>> +     if (list_empty(&cdma->sync_queue) &&
>> +                             cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
>> +                     signal = true;
> 
> This looks funny, maybe:
> 
>         if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
>             list_empty(&cdma->sync_queue))
>                 signal = true;
> 
> ?

Indenting at least is strange. I don't have a preference for the
ordering of conditions, so if you like the latter order, we can just use
that.

> 
>> +
>> +     /* Wake up CdmaWait() if the requested event happened */
> 
> CdmaWait()? Where's that?

host1x_cdma_wait_locked(). Will fix.

> 
>> +     if (signal) {
>> +             cdma->event = CDMA_EVENT_NONE;
>> +             up(&cdma->sem);
>> +     }
>> +}
>> +
>> +void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
>> +             struct platform_device *dev)
> 
> There's nothing in this function that requires a platform_device, so
> passing struct device should be enough. Or maybe host1x_cdma should get
> a struct device * field?

I think we'll just start using struct device * in general in code.
Arto's been already fixing a lot of these, so he might've already fixed
this.

> 
>> +{
>> +     u32 get_restart;
> 
> Maybe just call this "restart" or "restart_addr". get_restart sounds
> like a function name.

Ok, how about "restart_dmaget_addr"? That indicates what we're doing
with the restart address.

> 
>> +     u32 syncpt_incrs;
>> +     struct host1x_job *job = NULL;
>> +     u32 syncpt_val;
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +
>> +     syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
>> +
>> +     dev_dbg(&dev->dev,
>> +             "%s: starting cleanup (thresh %d)\n",
>> +             __func__, syncpt_val);
> 
> This fits on two lines.

Will merge.

> 
>> +
>> +     /*
>> +      * Move the sync_queue read pointer to the first entry that hasn't
>> +      * completed based on the current HW syncpt value. It's likely there
>> +      * won't be any (i.e. we're still at the head), but covers the case
>> +      * where a syncpt incr happens just prior/during the teardown.
>> +      */
>> +
>> +     dev_dbg(&dev->dev,
>> +             "%s: skip completed buffers still in sync_queue\n",
>> +             __func__);
> 
> This too.

Ok.

> 
>> +     list_for_each_entry(job, &cdma->sync_queue, list) {
>> +             if (syncpt_val < job->syncpt_end)
>> +                     break;
>> +
>> +             host1x_job_dump(&dev->dev, job);
>> +     }
> 
> That's potentially a lot of debug output. I wonder if it might make
> sense to control parts of this via a module parameter. Then again, if
> somebody really needs to debug this, maybe they really want *all* the
> information.

host1x_job_dump() uses dev_dbg(), so it only dumps a lot if DEBUG has
been defined in that file.

> 
>> +     /*
>> +      * Walk the sync_queue, first incrementing with the CPU syncpts that
>> +      * are partially executed (the first buffer) or fully skipped while
>> +      * still in the current context (slots are also NOP-ed).
>> +      *
>> +      * At the point contexts are interleaved, syncpt increments must be
>> +      * done inline with the pushbuffer from a GATHER buffer to maintain
>> +      * the order (slots are modified to be a GATHER of syncpt incrs).
>> +      *
>> +      * Note: save in get_restart the location where the timed out buffer
>> +      * started in the PB, so we can start the refetch from there (with the
>> +      * modified NOP-ed PB slots). This lets things appear to have completed
>> +      * properly for this buffer and resources are freed.
>> +      */
>> +
>> +     dev_dbg(&dev->dev,
>> +             "%s: perform CPU incr on pending same ctx buffers\n",
>> +             __func__);
> 
> Can be collapsed to two lines.

Sure.

> 
>> +
>> +     get_restart = cdma->last_put;
>> +     if (!list_empty(&cdma->sync_queue))
>> +             get_restart = job->first_get;
> 
> Perhaps:
> 
>         if (list_empty(&cdma->sync_queue))
>                 restart = cdma->last_put;
>         else
>                 restart = job->first_get;
> 
> ?

That's equivalent in functionality, and there's one less assignment for
one path, so sounds good.

> 
>> +     list_for_each_entry_from(job, &cdma->sync_queue, list)
>> +             if (job->clientid == cdma->timeout.clientid)
>> +                     job->timeout = 500;
> 
> I think this warrants a comment.

Sure. We're accelerating timing out jobs for the client that submitted
the job that timed out. But we'll add a comment. And, in downstream, we
already changed this to "job->timeout = max(job->timeout, 500)", so we
should use that.

> 
>> +/*
>> + * Destroy a cdma
>> + */
>> +void host1x_cdma_deinit(struct host1x_cdma *cdma)
>> +{
>> +     struct push_buffer *pb = &cdma->push_buffer;
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +
>> +     if (cdma->running) {
>> +             pr_warn("%s: CDMA still running\n",
>> +                             __func__);
>> +     } else {
>> +             host1x->cdma_pb_op.destroy(pb);
>> +             host1x->cdma_op.timeout_destroy(cdma);
>> +     }
>> +}
> 
> There's no way to recover from the situation where a cdma is still
> running. Can this not return an error code (-EBUSY?) if the cdma can't
> be destroyed?

It's called from close(), which cannot return an error code. It's
actually more of a power optimization. The effect is that if there are
no users for channel, we'll just not free up the push buffer.

I think the proper fix would actually be to check in host1x_cdma_init()
if push buffer is already allocated and cdma->running. In that case we
could skip most of initialization.

> 
>> +/*
>> + * End a cdma submit
>> + * Kick off DMA, add job to the sync queue, and a number of slots to be freed
>> + * from the pushbuffer. The handles for a submit must all be pinned at the same
>> + * time, but they can be unpinned in smaller chunks.
>> + */
>> +void host1x_cdma_end(struct host1x_cdma *cdma,
>> +             struct host1x_job *job)
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     bool was_idle = list_empty(&cdma->sync_queue);
> 
> Maybe just "idle"? It reflects the current state of the CDMA, not any
> old state.

Ok.

> 
>> +
>> +     host1x->cdma_op.kick(cdma);
>> +
>> +     add_to_sync_queue(cdma,
>> +                     job,
>> +                     cdma->slots_used,
>> +                     cdma->first_get);
> 
> No need to split this over so many lines. Also, shouldn't the order be
> reversed here? I.e. first add to sync queue, then start DMA?

Yeah, I think the order should be reversed. And, we're anyway moving the
code inline, so there's no function call.

> 
>> +     /* start timer on idle -> active transitions */
>> +     if (job->timeout && was_idle)
>> +             cdma_start_timer_locked(cdma, job);
> 
> This could be part of add_to_sync_queue(), but if you open-code that as
> I suggest earlier it should obviously stay.

Yep, let's open-code that.

> 
>> diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
> [...]
>> +struct platform_device;
> 
> No need for this if you pass struct device * instead.

Will change.

> 
>> +/*
>> + * cdma
>> + *
>> + * This is in charge of a host command DMA channel.
>> + * Sends ops to a push buffer, and takes responsibility for unpinning
>> + * (& possibly freeing) of memory after those ops have completed.
>> + * Producer:
>> + *   begin
>> + *           push - send ops to the push buffer
>> + *   end - start command DMA and enqueue handles to be unpinned
>> + * Consumer:
>> + *   update - call to update sync queue and push buffer, unpin memory
>> + */
> 
> I find the name to be a bit confusing. For some reason I automatically
> think of GSM when I read CDMA. This really is more of a job queue, so
> maybe calling it host1x_job_queue might be more appropriate. But I've
> already requested a lot of things to be renamed, so I think I can live
> with this being called CDMA if you don't want to change it.
> 
> Alternatively all of these could be moved to the struct host1x_channel
> given that there's only one of each of the push_buffer, buffer_timeout
> and host1x_cma objects per channel.

I did consider merging those two at one point. That should work, as they
both deal with channels essentially. I also saw that the resulting file
and data structures became quite large, so I have so far preferred to
keep them separate.

This way I can keep the "higher level" stuff (inserting setclass,
serializing, allocating sync point ranges, etc) in one file and lower
level stuff (write to hardware, deal with push buffer pointers, etc) in
another.

> 
>> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
> [...]
>> +#include "channel.h"
>> +#include "dev.h"
>> +#include "job.h"
>> +
>> +#include <linux/slab.h>
>> +#include <linux/module.h>
> 
> Again the include ordering is strange.

Will fix.

> 
>> +/*
>> + * Iterator function for host1x device list
>> + * It takes a fptr as an argument and calls that function for each
>> + * device in the list
>> + */
>> +void host1x_channel_for_all(struct host1x *host1x, void *data,
>> +     int (*fptr)(struct host1x_channel *ch, void *fdata))
>> +{
>> +     struct host1x_channel *ch;
>> +     int ret;
>> +
>> +     list_for_each_entry(ch, &host1x->chlist.list, list) {
>> +             if (ch && fptr) {
>> +                     ret = fptr(ch, data);
>> +                     if (ret) {
>> +                             pr_info("%s: iterator error\n", __func__);
>> +                             break;
>> +                     }
>> +             }
>> +     }
>> +}
> 
> Couldn't you rewrite this as a macro, similar to list_for_each_entry()
> so that users could do something like:
> 
>         host1x_for_each_channel(channel, host1x) {
>                 ...
>         }
> 
> That's a bit friendlier than having each user write a separate function
> to be called from this iterator.

Sounds good, we'll try that. My macro magic is rusty, but I trust
list_for_each_entry() will give a template.
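
A rough sketch of what such a macro could look like, written as a userspace program so it can be compiled on its own. The list helpers are minimal stand-ins for the kernel's list.h, the structures are cut down to what the example needs, and chlist is assumed to be a plain struct list_head. The name host1x_for_each_channel() comes from the suggestion above; everything else here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry && head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of the structures in the patch. */
struct host1x_channel {
	struct list_head list;
	int id;
};

struct host1x {
	struct list_head chlist;
};

/* Hypothetical iterator in the style of list_for_each_entry(). */
#define host1x_for_each_channel(ch, host)				\
	for ((ch) = list_entry((host)->chlist.next,			\
			       struct host1x_channel, list);		\
	     &(ch)->list != &(host)->chlist;				\
	     (ch) = list_entry((ch)->list.next,				\
			       struct host1x_channel, list))

/* Example user: sum the ids of all channels without a callback. */
static int sum_channel_ids(struct host1x *host)
{
	struct host1x_channel *ch;
	int sum = 0;

	host1x_for_each_channel(ch, host)
		sum += ch->id;

	return sum;
}
```

The loop body sits directly at the call site, so a caller no longer needs a separate function and a void *data argument just to visit every channel.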

> 
>> +int host1x_channel_submit(struct host1x_job *job)
>> +{
>> +     return host1x_get_host(job->ch->dev)->channel_op.submit(job);
>> +}
> 
> I'd expect a function named host1x_channel_submit() to take a struct
> host1x_channel *. Should this perhaps be called host1x_job_submit()?

It calls into channel code directly, and the underlying op also just
takes a job. We could add channel as a parameter, and not pass it in
host1x_job_alloc(), but we actually need the channel data already in
host1x_job_pin(), which comes before submit. We need it so that we pin
the buffer to correct engine.

> 
>> +struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
>> +{
>> +     int err = 0;
>> +
>> +     mutex_lock(&ch->reflock);
>> +     if (ch->refcount == 0)
>> +             err = host1x_cdma_init(&ch->cdma);
>> +     if (!err)
>> +             ch->refcount++;
>> +
>> +     mutex_unlock(&ch->reflock);
>> +
>> +     return err ? NULL : ch;
>> +}
> 
> Why don't you use any of the kernel's reference counting mechanisms?
> 
>> +void host1x_channel_put(struct host1x_channel *ch)
>> +{
>> +     mutex_lock(&ch->reflock);
>> +     if (ch->refcount == 1) {
>> +             host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
>> +             host1x_cdma_deinit(&ch->cdma);
>> +     }
>> +     ch->refcount--;
>> +     mutex_unlock(&ch->reflock);
>> +}
> 
> I think you can do all of this using a kref.

I think the original reason was that there's no reason to use atomic
kref, as we anyway have to do mutual exclusion via mutex. But, using
kref won't be any problem, so we could use that.
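
To illustrate the pattern under discussion, here is a userspace sketch of a kref-like counter with a release callback. It is not the kernel's kref API: struct ref, ref_init(), ref_get() and ref_put() are hypothetical names, built on C11 atomics. The channel is cut down to a single flag that stands in for stopping and deinitializing the CDMA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a kref; not the kernel API. */
struct ref {
	atomic_int count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
}

static void ref_get(struct ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int ref_put(struct ref *r, void (*release)(struct ref *r))
{
	assert(release);
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Cut-down channel that only tracks whether its CDMA was torn down. */
struct channel {
	struct ref ref;		/* first member, so a cast recovers the channel */
	int cdma_stopped;
};

static void channel_release(struct ref *r)
{
	struct channel *ch = (struct channel *)r;

	ch->cdma_stopped = 1;	/* stands in for cdma stop + deinit */
}
```

One caveat: the code in the patch also initializes the CDMA again when the count goes from zero back to one, which a counter of this kind does not model on its own. That transition would likely still need the existing mutex or some equivalent.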

> 
>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
>> +{
>> +     struct host1x_channel *ch = NULL;
>> +     struct host1x *host1x = host1x_get_host(pdev);
>> +     int chindex;
>> +     int max_channels = host1x->info.nb_channels;
>> +     int err;
>> +
>> +     mutex_lock(&host1x->chlist_mutex);
>> +
>> +     chindex = host1x->allocated_channels;
>> +     if (chindex > max_channels)
>> +             goto fail;
>> +
>> +     ch = kzalloc(sizeof(*ch), GFP_KERNEL);
>> +     if (ch == NULL)
>> +             goto fail;
>> +
>> +     /* Link platform_device to host1x_channel */
>> +     err = host1x->channel_op.init(ch, host1x, chindex);
>> +     if (err < 0)
>> +             goto fail;
>> +
>> +     ch->dev = pdev;
>> +
>> +     /* Add to channel list */
>> +     list_add_tail(&ch->list, &host1x->chlist.list);
>> +
>> +     host1x->allocated_channels++;
>> +
>> +     mutex_unlock(&host1x->chlist_mutex);
>> +     return ch;
>> +
>> +fail:
>> +     dev_err(&pdev->dev, "failed to init channel\n");
>> +     kfree(ch);
>> +     mutex_unlock(&host1x->chlist_mutex);
>> +     return NULL;
>> +}
> 
> I think the critical section could be shorter here. It's probably not
> worth the extra trouble, though, given that channels are not often
> allocated.

Yeah, boot time isn't measured in microseconds. :-) But, if we just make
allocated_channels an atomic, we should be able to drop chlist_mutex
altogether and it could simplify the code.
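
A minimal sketch of the atomic counter idea, using C11 atomics in userspace rather than the kernel's atomic_t. struct host and channel_index_reserve() are hypothetical, and the sketch assumes nb_channels is a count, so an index equal to it is rejected. It covers only the counter; adding the channel to the list would still need its own protection.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the relevant part of struct host1x. */
struct host {
	atomic_uint allocated_channels;
	unsigned int nb_channels;	/* number of hardware channels */
};

/*
 * Reserve the next free channel index without taking a lock.
 * Returns the index, or -1 if all channels are already allocated.
 */
static int channel_index_reserve(struct host *h)
{
	unsigned int cur = atomic_load(&h->allocated_channels);

	assert(h->nb_channels > 0);
	do {
		if (cur >= h->nb_channels)
			return -1;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&h->allocated_channels,
					       &cur, cur + 1));

	return (int)cur;
}
```

The compare-and-exchange loop keeps the bounds check and the increment as one step, so two callers cannot both claim the last free index.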

> 
>> +void host1x_channel_free(struct host1x_channel *ch)
>> +{
>> +     struct host1x *host1x = host1x_get_host(ch->dev);
>> +     struct host1x_channel *chiter, *tmp;
>> +     list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
>> +             if (chiter == ch) {
>> +                     list_del(&chiter->list);
>> +                     kfree(ch);
>> +                     host1x->allocated_channels--;
>> +
>> +                     return;
>> +             }
>> +     }
>> +}
> 
> This doesn't free the channel if it happens to not be part of the host1x
> channel list. Perhaps an easier way to write it would be:
> 
>         host1x = host1x_get_host(ch->dev);
> 
>         list_del(&ch->list);
>         kfree(ch);
> 
>         host1x->allocated_channels--;
> 
> Looking at the rest of the code, it seems like a channel will never not
> be part of the host1x channel list, so I don't think there's a need to
> scan the list.

I think you're right. This is just overprotective. Your variant does the
same thing with much less code.

> 
> On a side-note: generally if you break out of the loop right after
> freeing the memory of a removed node, there's no need to use the _safe
> variant since you won't be accessing the .next field of the freed node
> anyway.

That's true.

> 
> Maybe these should also adopt a similar naming as what we discussed for
> the syncpoints. That is:
> 
>         struct host1x_channel *host1x_channel_request(struct device *dev);
> 
> ?

Sounds good.

> 
>> diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
> [...]
>> +
>> +/*
>> + * host1x device list in debug-fs dump of host1x and client device
>> + * as well as channel state
>> + */
> 
> I don't understand this comment.

Probably because it's not a sentence and doesn't make sense. I think
it's just misplaced. We'll find its proper home.

> 
>> +struct host1x_channel {
>> +     struct list_head list;
>> +
>> +     int refcount;
>> +     int chid;
> 
> This can probably just be id. It is a field of host1x_channel, so the ch
> prefix is redundant.

Ok.

> 
>> +     struct mutex reflock;
>> +     struct mutex submitlock;
>> +     void __iomem *regs;
>> +     struct device *node;
> 
> This is never used.

Yep, let's remove "node".

> 
>> +     struct platform_device *dev;
> 
> Can this be just struct device *?

I think so. I'll let Arto look at all places where we could change
platform_device->device. He was already on it.

> 
>> +     struct cdev cdev;
> 
> This is never used.

Will remove.

> 
>> +/* channel list operations */
>> +void host1x_channel_list_init(struct host1x *);
>> +void host1x_channel_for_all(struct host1x *, void *data,
>> +     int (*fptr)(struct host1x_channel *ch, void *fdata));
>> +
>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
>> +void host1x_channel_free(struct host1x_channel *ch);
> 
> Is it a good idea to make host1x_channel_free() publicly available?
> Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a
> host1x_channel with a reference count of 1 and everybody release their
> reference using host1x_channel_put() to make sure the channel is freed
> only after the last reference disappears?
> 
> Otherwise whoever calls host1x_channel_free() will confuse everybody
> else that's still keeping a reference.

The difference is that _put and _get are called to indicate how many
user space processes there are for the channel. Even if there are no
processes, we won't free the channel structure - we just freeze the channel.

_alloc and _free are different in that they actually create the channel
structs and delete them and they follow the lifecycle of the driver.
Perhaps we should figure out new naming, but refcounting and alloc/free
cannot be merged here.

> 
>> diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
> [...]
> 
> Various spurious blank lines in this file, and the alignment of function
> parameters is off.

Will fix.

> 
>> +struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
> 
> I don't think this needs platform_device either.

Will fix.

> 
>> +{
>> +     struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
>> +     struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
>> +
>> +     mutex_lock(struct_mutex);
>> +     drm_gem_object_reference(&obj->base);
>> +     mutex_unlock(struct_mutex);
> 
> I think it's more customary to obtain a pointer to struct drm_device and
> then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use
> drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't
> exist yet, apparently. But it could be added.

I think we could take the former path - just refer to mutex in a
different way.

>> +int host1x_cma_pin_array_ids(struct platform_device *dev,
>> +             long unsigned *ids,
>> +             long unsigned id_type_mask,
>> +             long unsigned id_type,
>> +             u32 count,
>> +             struct host1x_job_unpin_data *unpin_data,
>> +             dma_addr_t *phys_addr)
> 
> struct device * and unsigned long please. count also doesn't need to
> be a sized type. unsigned int will do just fine. The return value can
> also be unsigned int if you don't expect to return any error conditions.

I think we'll need to check these. ids probably needs to be a u32 *, and
id_type_mask and id_type should be u32. They come like that from user space.

> 
>> +{
>> +     int i;
>> +     int pin_count = 0;
> 
> Both should be unsigned as well, and can go on one line:
> 
>         unsigned int pin_count = 0, i;

Ok.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>>  struct host1x;
>> +struct host1x_intr;
>>  struct host1x_syncpt;
>> +struct host1x_channel;
>> +struct host1x_cdma;
>> +struct host1x_job;
>> +struct push_buffer;
>> +struct dentry;
> 
> I think this already belongs in a previous patch. The debugfs dentry
> isn't added in this patch.

Ok, that was a mistake I made when I re-split after one of the previous
rounds. I compiled (at least thought I did) after each patch, so it
might be that these aren't actually needed.

> 
>> +struct host1x_channel_ops {
>> +     int (*init)(struct host1x_channel *,
>> +                 struct host1x *,
>> +                 int chid);
> 
> Please add the parameter names as well (the same goes for all ops
> declared in this file). And "id" will be enough. Also the channel ID can
> surely be unsigned, right?

Sure to all of these.

> 
>> +struct host1x_cdma_ops {
>> +     void (*start)(struct host1x_cdma *);
>> +     void (*stop)(struct host1x_cdma *);
>> +     void (*kick)(struct  host1x_cdma *);
>> +     int (*timeout_init)(struct host1x_cdma *,
>> +                         u32 syncpt_id);
>> +     void (*timeout_destroy)(struct host1x_cdma *);
>> +     void (*timeout_teardown_begin)(struct host1x_cdma *);
>> +     void (*timeout_teardown_end)(struct host1x_cdma *,
>> +                                  u32 getptr);
>> +     void (*timeout_cpu_incr)(struct host1x_cdma *,
>> +                              u32 getptr,
>> +                              u32 syncpt_incrs,
>> +                              u32 syncval,
>> +                              u32 nr_slots);
>> +};
> 
> Can the timeout_ prefix not be dropped? The functions are generally
> useful and not directly related to timeouts, even though they seem to
> only be used during timeout handling.

All the timeout functions actually access the timeout struct, so they're
not generic. Teardown functions are the only ones which don't access
timeout.

> 
> Also, is it really necessary to abstract these into an ops structure? I
> get that newer hardware revisions might require different ops for sync-
> point handling because the register layout or number of syncpoints may
> be different, but the CDMA and push buffer (below) concepts are pretty
> much a software abstraction, and as such its implementation is unlikely
> to change with some future hardware revision.

Pushbuffer ops can become generic. There's only one catch - init uses
the restart opcode. But the opcode is not going to change, so we can
generalize that.

> 
>> +struct host1x_pushbuffer_ops {
>> +     void (*reset)(struct push_buffer *);
>> +     int (*init)(struct push_buffer *);
>> +     void (*destroy)(struct push_buffer *);
>> +     void (*push_to)(struct push_buffer *,
>> +                     struct mem_handle *,
>> +                     u32 op1, u32 op2);
>> +     void (*pop_from)(struct push_buffer *,
>> +                      unsigned int slots);
> 
> Maybe just push() and pop()?

Can do.

> 
>> +     u32 (*space)(struct push_buffer *);
>> +     u32 (*putptr)(struct push_buffer *);
>> +};
>>
>>  struct host1x_syncpt_ops {
>>       void (*reset)(struct host1x_syncpt *);
>> @@ -64,9 +111,19 @@ struct host1x {
>>       struct host1x_device_info info;
>>       struct clk *clk;
>>
>> +     /* Sync point dedicated to replacing waits for expired fences */
>> +     struct host1x_syncpt *nop_sp;
>> +
>> +     struct host1x_channel_ops channel_op;
>> +     struct host1x_cdma_ops cdma_op;
>> +     struct host1x_pushbuffer_ops cdma_pb_op;
>>       struct host1x_syncpt_ops syncpt_op;
>>       struct host1x_intr_ops intr_op;
>>
>> +     struct mutex chlist_mutex;
>> +     struct host1x_channel chlist;
> 
> Shouldn't this just be struct list_head?

I think you're right, to follow the normal kernel conventions.

> 
>> +     int allocated_channels;
> 
> unsigned int? And maybe just "num_channels"?

num_channels could be thought of as "number of available channels", so I'd
like to use num_allocated_channels here.

> 
>> diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
> [...]
>> +enum host1x_class {
>> +     NV_HOST1X_CLASS_ID              = 0x1,
>> +     NV_GRAPHICS_2D_CLASS_ID         = 0x51,
> 
> This entry belongs in a later patch, right? And I find it convenient if
> enumeration constants start with the enum name as prefix. Furthermore
> it'd be nice to reuse the hardware module names, like so:
> 
>         enum host1x_class {
>                 HOST1X_CLASS_HOST1X,
>                 HOST1X_CLASS_GR2D,
>                 HOST1X_CLASS_GR3D,
>         };

The naming sounds good. We already use HOST1X_CLASS_HOST1X in code to
insert a wait. If you'd prefer, we can move the definition of
HOST1X_CLASS_GR2D to the later patch.

> 
>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> [...]
>> +#include <linux/slab.h>
>> +#include <linux/scatterlist.h>
>> +#include <linux/dma-mapping.h>
>> +#include "cdma.h"
>> +#include "channel.h"
>> +#include "dev.h"
>> +#include "memmgr.h"
>> +
>> +#include "cdma_hw.h"
>> +
>> +static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
>> +{
>> +     return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
>> +             | HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
>> +             | HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
> 
> I think it is more customary to put the | at the end of the preceding
> line:
> 
>         return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
>                HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
>                HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
> 
> Also since these are all single bits, I'd prefer if you could drop the
> _F suffix and not make them take a parameter. I think it'd even be
> better not to have this function at all, but make the intent explicit
> where the register is written. That is, have each call site set the bits
> explicitly instead of calling this helper. Having a parameter list such
> as (true, false, false) or (true, true, true) is confusing since you
> have to keep looking up the meaning of the parameters.

The operation that the _F macros do is masking and bit shifting the
fields correctly. Without that, we'd need to expose several macros to
mask and shift, and I'd rather just have one macro to take care of that.

But, we can open code the function to wherever it's used if that's more
readable.
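
For illustration, a sketch of what a mask-and-shift field macro does and how the register value could be composed at the call site. The bit positions and the shortened DMACTRL_* names below are made up for the example; the real layout comes from the hardware headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical field layout, for illustration only; the real bit
 * positions come from the generated hardware headers.
 */
#define DMACTRL_DMASTOP_SHIFT		0
#define DMACTRL_DMASTOP_MASK		0x1u
#define DMACTRL_DMAGETRST_SHIFT		1
#define DMACTRL_DMAGETRST_MASK		0x1u
#define DMACTRL_DMAINITGET_SHIFT	2
#define DMACTRL_DMAINITGET_MASK		0x1u

/* One macro per field: mask the value, then shift it into place. */
#define DMACTRL_DMASTOP_F(v) \
	(((uint32_t)(v) & DMACTRL_DMASTOP_MASK) << DMACTRL_DMASTOP_SHIFT)
#define DMACTRL_DMAGETRST_F(v) \
	(((uint32_t)(v) & DMACTRL_DMAGETRST_MASK) << DMACTRL_DMAGETRST_SHIFT)
#define DMACTRL_DMAINITGET_F(v) \
	(((uint32_t)(v) & DMACTRL_DMAINITGET_MASK) << DMACTRL_DMAINITGET_SHIFT)

/* Open-coded at the call site: the intent is visible without a helper. */
static uint32_t dmactrl_stop_and_reset_get(void)
{
	uint32_t val = DMACTRL_DMASTOP_F(1) |
		       DMACTRL_DMAGETRST_F(1);

	assert((val & ~0x7u) == 0);	/* only the three known bits may be set */
	return val;
}
```

Naming the fields where the value is built avoids the (true, false, false) style of argument list, while the caller still does not have to know the shift and mask of each field.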

> 
>> +}
>> +
>> +static void cdma_timeout_handler(struct work_struct *work);
> 
> Can this prototype be avoided?

We could try shuffling the code. There might be some dependency problems
that forced this ordering, but we'll try.

> 
>> +/**
>> + * Reset to empty push buffer
>> + */
>> +static void push_buffer_reset(struct push_buffer *pb)
>> +{
>> +     pb->fence = PUSH_BUFFER_SIZE - 8;
>> +     pb->cur = 0;
> 
> Maybe position is a better name than cur.

Sure.

> 
>> +/**
>> + * Init push buffer resources
>> + */
>> +static void push_buffer_destroy(struct push_buffer *pb);
> 
> You should be careful with these comment blocks. If you start them with
> /**, then you should make them proper kerneldoc comments. But you don't
> really need that for static functions, so you could just make them /*-
> style.
> 
> Also this particular comment is confusingly placed on top of the
> prototype of push_buffer_destroy().

You're right. We'll just remove the /** */ notation and use normal
comments. And the comment is just misplaced, so we'll move it.

> 
>> +/*
>> + * Push two words to the push buffer
>> + * Caller must ensure push buffer is not full
>> + */
>> +static void push_buffer_push_to(struct push_buffer *pb,
>> +             struct mem_handle *handle,
>> +             u32 op1, u32 op2)
>> +{
>> +     u32 cur = pb->cur;
>> +     u32 *p = (u32 *)((u32)pb->mapped + cur);
> 
> You do all this extra casting to make sure to increment by bytes and not
> 32-bit words. How about you change pb->cur to contain the word index, so
> that you don't have to go through hoops each time around.
> 
> Alternatively you could make it a pointer to u32 and not have to index
> or cast at all. So you'd end up with something like:
> 
>         struct push_buffer {
>                 u32 *start;
>                 u32 *end;
>                 u32 *ptr;
>         };

The complexity comes from the fact that we deal both with device virtual
addresses and CPU addresses to the same buffer. We'll need the indexes
so that we can convert between the two address spaces, but we might be
able to use word indexes. We'll check this.
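
A small sketch of the word-index variant, to show that one index can serve both address spaces. The structure is a cut-down, hypothetical version of struct push_buffer, PB_WORDS is an assumed power-of-two size, and pb_device_addr() and pb_push() are illustrative names rather than functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PB_WORDS 1024u	/* hypothetical size: a power of two, in 32-bit words */

/* Hypothetical cut-down push buffer tracked by word index. */
struct push_buffer {
	uint32_t *mapped;	/* CPU address of the buffer */
	uint32_t phys;		/* device address of the same buffer */
	uint32_t pos;		/* next free slot, as a word index */
};

/* Device address that corresponds to the current position. */
static uint32_t pb_device_addr(const struct push_buffer *pb)
{
	return pb->phys + pb->pos * (uint32_t)sizeof(uint32_t);
}

/* Write one two-word slot and advance, wrapping at the end. */
static void pb_push(struct push_buffer *pb, uint32_t op1, uint32_t op2)
{
	assert(pb->pos + 1 < PB_WORDS);

	pb->mapped[pb->pos] = op1;
	pb->mapped[pb->pos + 1] = op2;
	pb->pos = (pb->pos + 2) & (PB_WORDS - 1);
}
```

The CPU side indexes the mapped array directly, and the device address is derived from the same index only where a register needs it, so the pointer casts in the push path go away.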

> 
>> +/*
>> + * Return the number of two word slots free in the push buffer
>> + */
>> +static u32 push_buffer_space(struct push_buffer *pb)
>> +{
>> +     return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
>> +}
> 
> Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
> PUSH_BUFFER_SIZE, can it?

You're right, this function doesn't need to worry about wrapping.
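
For reference while checking this, a standalone sketch of the calculation with the mask kept in. PB_SIZE is an assumed value and pb_space() is a hypothetical free function taking the two offsets directly. Whether fence can ever be numerically below cur depends on how the driver advances both, so the second case in the comment is an assumption to verify, not a claim about the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PB_SIZE 4096u	/* hypothetical size in bytes, a power of two */

/* Number of free two-word (8-byte) slots between cur and fence. */
static uint32_t pb_space(uint32_t fence, uint32_t cur)
{
	assert(fence < PB_SIZE && cur < PB_SIZE);

	/*
	 * If fence is numerically below cur, the unsigned subtraction
	 * wraps; the mask folds the result back into the buffer size.
	 */
	return ((fence - cur) & (PB_SIZE - 1)) / 8;
}
```

With these assumed values, pb_space(PB_SIZE - 8, 0) gives 511 slots for a freshly reset buffer, and pb_space(8, 16) also gives 511 because the mask undoes the wrap. Without the mask the second call would return a very large number, so dropping it is only safe if that state cannot occur.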

> 
>> +/*
>> + * Init timeout resources
>> + */
>> +static int cdma_timeout_init(struct host1x_cdma *cdma,
>> +                              u32 syncpt_id)
>> +{
>> +     if (syncpt_id == NVSYNCPT_INVALID)
>> +             return -EINVAL;
> 
> Do we really need the syncpt_id check here? It is the only reason why we
> need to pass the parameter in the first place, and if we get to this
> point we should already have made sure that the syncpoint is actually
> valid.

True, we can drop this.

> 
>> +/*
>> + * Increment timedout buffer's syncpt via CPU.
> 
> Nit: "timed out buffer's"

Will fix.

> 
>> + */
>> +static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
>> +                             u32 syncpt_incrs, u32 syncval, u32 nr_slots)
> 
> The syncval parameter isn't used.

True, that'd be used only with wait base support, as we need to
synchronize wait base with the sync point. Will remove.

> 
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     struct push_buffer *pb = &cdma->push_buffer;
>> +     u32 i, getidx;
>> +
>> +     for (i = 0; i < syncpt_incrs; i++)
>> +             host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
>> +
>> +     /* after CPU incr, ensure shadow is up to date */
>> +     host1x_syncpt_load_min(cdma->timeout.syncpt);
>> +
>> +     /* NOP all the PB slots */
>> +     getidx = getptr - pb->phys;
>> +     while (nr_slots--) {
>> +             u32 *p = (u32 *)((u32)pb->mapped + getidx);
>> +             *(p++) = HOST1X_OPCODE_NOOP;
>> +             *(p++) = HOST1X_OPCODE_NOOP;
>> +             dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
>> +                     __func__, pb->phys + getidx);
>> +             getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
>> +     }
>> +     wmb();
> 
> Why the memory barrier?

Can't think of any good reason. Will try removing.

> 
>> +/*
>> + * Similar to cdma_start(), but rather than starting from an idle
>> + * state (where DMA GET is set to DMA PUT), on a timeout we restore
>> + * DMA GET from an explicit value (so DMA may again be pending).
>> + */
>> +static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     struct host1x_channel *ch = cdma_to_channel(cdma);
>> +
>> +     if (cdma->running)
>> +             return;
>> +
>> +     cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
>> +
>> +     host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
>> +             HOST1X_CHANNEL_DMACTRL);
>> +
>> +     /* set base, end pointer (all of memory) */
>> +     host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
>> +     host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
> 
> According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a
> DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of
> that, why set the valid range to all of physical memory? We know the
> valid range of the push buffer, why not set the limits accordingly?

That'd make sense. Currently we use the RESTART as the barrier, but
having hardware check against DMAEND is a good idea, too.

> 
>> +/*
>> + * Kick channel DMA into action by writing its PUT offset (if it has changed)
>> + */
>> +static void cdma_kick(struct host1x_cdma *cdma)
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     struct host1x_channel *ch = cdma_to_channel(cdma);
>> +     u32 put;
>> +
>> +     put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
>> +
>> +     if (put != cdma->last_put) {
>> +             host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
>> +             cdma->last_put = put;
>> +     }
>> +}
> 
> kick() sounds unusual. Maybe flush or commit or something similar would
> be more accurate.

We could use flush.

> 
>> +static void cdma_stop(struct host1x_cdma *cdma)
>> +{
>> +     struct host1x_channel *ch = cdma_to_channel(cdma);
>> +
>> +     mutex_lock(&cdma->lock);
>> +     if (cdma->running) {
>> +             host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
>> +             host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
>> +                     HOST1X_CHANNEL_DMACTRL);
>> +             cdma->running = false;
>> +     }
>> +     mutex_unlock(&cdma->lock);
>> +}
> 
> Perhaps this should be renamed cdma_stop_sync() or similar to make it
> clear that it waits for the queue to run empty.

Ok, sounds good.

> 
>> +static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
> 
> Maybe the last parameter should be called restart to match its purpose?

Makes sense, will do.

> 
>> +{
>> +     struct host1x *host1x = cdma_to_host1x(cdma);
>> +     struct host1x_channel *ch = cdma_to_channel(cdma);
>> +     u32 cmdproc_stop;
>> +
>> +     dev_dbg(&host1x->dev->dev,
>> +             "end channel teardown (id %d, DMAGET restart = 0x%x)\n",
>> +             ch->chid, getptr);
>> +
>> +     cmdproc_stop = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
>> +     cmdproc_stop &= ~(BIT(ch->chid));
> 
> No need for the extra parentheses.

Ok, will remove.

> 
>> +     host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
>> +
>> +     cdma->torndown = false;
>> +     cdma_timeout_restart(cdma, getptr);
>> +}
> 
> I find this a bit non-intuitive. We tear down a channel, and when we're
> done tearing down, the torndown variable is set to false and the channel
> is actually restarted. Maybe you could explain some more how this works
> and what its purpose is.

Actually, teardown_begin freezes the channel, then we manipulate the
queue, and in the end teardown_end restarts the channel. So these should
be named freeze and resume. We could even drop the timeout from the
names of these functions.

> 
>> +/*
>> + * If this timeout fires, it indicates the current sync_queue entry has
>> + * exceeded its TTL and the userctx should be timed out and remaining
>> + * submits already issued cleaned up (future submits return an error).
>> + */
> 
> I can't seem to find what causes subsequent submits to return an error.
> Also, how is the channel reset so that new jobs can be submitted?

That comment actually applies only downstream. We blacklist contexts for
channels that carry state across submits (=have hardware contexts
implemented). 2D has atomic jobs, so it doesn't need blacklisting.

host1x_cdma_update_sync_queue() purges the failed job, finds the DMAGET
for the next job, and sets sync points correctly. It'll call
teardown_end (which we'll rename) to resume the channel with the new
DMAGET pointer.

> 
>> +static void cdma_timeout_handler(struct work_struct *work)
>> +{
>> +     struct host1x_cdma *cdma;
>> +     struct host1x *host1x;
>> +     struct host1x_channel *ch;
>> +
>> +     u32 syncpt_val;
>> +
>> +     u32 prev_cmdproc, cmdproc_stop;
>> +
>> +     cdma = container_of(to_delayed_work(work), struct host1x_cdma,
>> +                         timeout.wq);
>> +     host1x = cdma_to_host1x(cdma);
>> +     ch = cdma_to_channel(cdma);
>> +
>> +     mutex_lock(&cdma->lock);
>> +
>> +     if (!cdma->timeout.clientid) {
>> +             dev_dbg(&host1x->dev->dev,
>> +                      "cdma_timeout: expired, but has no clientid\n");
>> +             mutex_unlock(&cdma->lock);
>> +             return;
>> +     }
> 
> How can the CDMA not have a client?

I don't think that's possible. :-) We should just remove the check. It
might be that we were just protecting some kind of race between timeout
code triggering and something else, but I can't really think of a scenario.

> 
>> +
>> +     /* stop processing to get a clean snapshot */
>> +     prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
>> +     cmdproc_stop = prev_cmdproc | BIT(ch->chid);
>> +     host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
>> +
>> +     dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
>> +             prev_cmdproc, cmdproc_stop);
>> +
>> +     syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
>> +
>> +     /* has buffer actually completed? */
>> +     if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
>> +             dev_dbg(&host1x->dev->dev,
>> +                      "cdma_timeout: expired, but buffer had completed\n");
> 
> Maybe this should really be a warning?

Not really - it's actually just a normal state. We got a timeout event,
but before we process it, it might be that the job manages to complete.
This can happen, and is not an error case.
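
The "has buffer actually completed?" test relies on a wrap-safe
comparison of 32-bit counters. A minimal user-space sketch of that
comparison, with syncpt_reached() as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the wrapping 32-bit counter 'value' has reached 'thresh'.
 * Only valid while the two are less than 2^31 apart. The conversion to
 * int32_t is implementation-defined in ISO C; GCC and Clang reduce it
 * modulo 2^32, which is what this idiom depends on.
 */
static bool syncpt_reached(uint32_t value, uint32_t thresh)
{
	return (int32_t)(value - thresh) >= 0;
}
```

A plain "value >= thresh" would give the wrong answer once the counter
wraps past zero, which is why the difference is compared instead.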

> 
>> +             /* restore */
>> +             cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
> 
> No need for the extra parentheses. Also, why not just use prev_cmdproc,
> which shouldn't have the bit set anyway?

Yeah, prev_cmdproc is the one we should use directly.
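
A user-space sketch of the stop-bit handling, with the register modeled
as a plain value. cmdproc_freeze() and cmdproc_resume() are hypothetical
names, not functions from the patch:

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Value to write to stop command processing on channel chid. */
static uint32_t cmdproc_freeze(uint32_t prev, unsigned int chid)
{
	return prev | BIT(chid);
}

/* Value to write to let channel chid run again; other bits are kept. */
static uint32_t cmdproc_resume(uint32_t cur, unsigned int chid)
{
	return cur & ~BIT(chid);
}
```

Writing back the saved previous value is equivalent to
cmdproc_resume() only as long as the channel's bit was clear before
the freeze.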

> 
>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
> [...]
>> +/*
>> + * Size of the sync queue. If it is too small, we won't be able to queue up
>> + * many command buffers. If it is too large, we waste memory.
>> + */
>> +#define HOST1X_SYNC_QUEUE_SIZE 512
> 
> I don't see this used anywhere.

Sync queue used to be an array. It hasn't been for a long time, but this
remained. Will remove.

> 
>> +/*
>> + * Number of gathers we allow to be queued up per channel. Must be a
>> + * power of two. Currently sized such that pushbuffer is 4KB (512*8B).
>> + */
>> +#define HOST1X_GATHER_QUEUE_SIZE 512
> 
> More pieces falling into place.

Great. :-)

> 
>> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
> [...]
>> +#include "host1x.h"
>> +#include "channel.h"
>> +#include "dev.h"
>> +#include <linux/slab.h>
>> +#include "intr.h"
>> +#include "job.h"
>> +#include <trace/events/host1x.h>
> 
> More include ordering issues.

Will fix.

> 
>> +static void submit_gathers(struct host1x_job *job)
>> +{
>> +     /* push user gathers */
>> +     int i;
> 
> unsigned int?
> 
>> +     for (i = 0 ; i < job->num_gathers; i++) {
>> +             struct host1x_job_gather *g = &job->gathers[i];
>> +             u32 op1 = host1x_opcode_gather(g->words);
>> +             u32 op2 = g->mem_base + g->offset;
>> +             host1x_cdma_push_gather(&job->ch->cdma,
>> +                             job->gathers[i].ref,
>> +                             job->gathers[i].offset,
>> +                             op1, op2);
>> +     }
>> +}
> 
> Perhaps inline this into channel_submit()? I'm not sure how useful it
> really is to split off smallish functions such as this which aren't
> reused anywhere else. I don't have any major objection though, so you
> can keep it separate if you want.

I split these out because channel_submit() became so long that I
couldn't understand it anymore. I'd prefer keeping them separate just to keep
myself (semi-)sane.

> 
>> +static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx)
>> +{
>> +     p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
>> +     return p;
>> +}
>> +
>> +static int host1x_channel_init(struct host1x_channel *ch,
>> +     struct host1x *dev, int index)
>> +{
>> +     ch->chid = index;
>> +     mutex_init(&ch->reflock);
>> +     mutex_init(&ch->submitlock);
>> +
>> +     ch->regs = host1x_channel_regs(dev->regs, index);
>> +     return 0;
>> +}
> 
> You only use host1x_channel_regs() once, so I really don't think it buys
> you anything to split it off. Both host1x_channel_regs() and
> host1x_channel_init() are short enough that they can be collapsed.

True, will merge.

> 
>> diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
> [...]
>>  #include "hw/host1x01.h"
>>  #include "dev.h"
>> +#include "channel.h"
>>  #include "hw/host1x01_hardware.h"
>>
>> +#include "hw/channel_hw.c"
>> +#include "hw/cdma_hw.c"
>>  #include "hw/syncpt_hw.c"
>>  #include "hw/intr_hw.c"
>>
>>  int host1x01_init(struct host1x *host)
>>  {
>> +     host->channel_op = host1x_channel_ops;
>> +     host->cdma_op = host1x_cdma_ops;
>> +     host->cdma_pb_op = host1x_pushbuffer_ops;
>>       host->syncpt_op = host1x_syncpt_ops;
>>       host->intr_op = host1x_intr_ops;
> 
> I think I mentioned this before, but I'd prefer not to have the .c files
> included here, but rather reference the ops structures externally. But I
> still think that especially CDMA and push buffer ops don't need to be in
> separate structures since they aren't likely to change with new hardware
> revisions.

The C files need to be included here so that they pick up the hardware
defs for the correct SoC. Pushbuffer is probably something we can
generalize, but channel registers can change, so they need to be per SoC.

> 
>> diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
> [...]
>> index c1d5324..03873c0 100644
>> --- a/drivers/gpu/host1x/hw/host1x01_hardware.h
>> +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
>> @@ -21,6 +21,130 @@
>>
>>  #include <linux/types.h>
>>  #include <linux/bitops.h>
>> +#include "hw_host1x01_channel.h"
>>  #include "hw_host1x01_sync.h"
>> +#include "hw_host1x01_uclass.h"
>> +
>> +/* channel registers */
>> +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
> 
> The only user of this seems to be host1x_channel_regs(), so it could be
> moved to that file. Also the name is overly long, why not something like
> HOST1X_CHANNEL_SIZE?

Sounds good.

> 
>> +#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
> 
> HOST1X_OPCODE_NOP would be more canonical in my opinion.

Ok, can change.

> 
> 
>> +static inline u32 host1x_mask2(unsigned x, unsigned y)
>> +{
>> +     return 1 | (1 << (y - x));
>> +}
> 
> What's this? I don't see it used anywhere.

It's a shortcut to add two register writes to one MASK opcode, but we'll
remove the def as it's not used.

> 
>> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
> [...]
>> +#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
>> +     host1x_channel_dmactrl_dmastop_f(v)
> 
> I mentioned this elsewhere already, but I think the _F suffix (and _f
> for that matter) along with the v parameter should go away.

I'd prefer keeping it so that I don't have to use two #defines to replace
one. That IMO makes the usage harder and more error prone.

> 
>> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
> [...]
> 
> What does the "uclass" stand for? It seems a bit useless to me.

It means host1x class, i.e. the host1x registers that can be written to
from push buffers.

> 
>> diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
>> index 16e3ada..ba48cee 100644
>> --- a/drivers/gpu/host1x/hw/syncpt_hw.c
>> +++ b/drivers/gpu/host1x/hw/syncpt_hw.c
>> @@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
>>       wmb();
>>  }
>>
>> +/* remove a wait pointed to by patch_addr */
>> +static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
>> +{
>> +     u32 override = host1x_class_host_wait_syncpt(
>> +                     NVSYNCPT_GRAPHICS_HOST, 0);
>> +     __raw_writel(override, patch_addr);
> 
> __raw_writel() isn't meant to be used for regular memory addresses, but
> only for MMIO addresses. patch_addr will be a kernel virtual address to
> a location in RAM, so you can just treat it as a normal pointer:
> 
>         *(u32 *)patch_addr = override;

Sure, you mentioned it earlier, but I've just forgotten that. Sorry
about that.

> 
> A small optimization might be to make override a static const, so that
> it doesn't have to be composed every time.

Can do.

> 
>> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
> [...]
>> +static void action_submit_complete(struct host1x_waitlist *waiter)
>> +{
>> +     struct host1x_channel *channel = waiter->data;
>> +     int nr_completed = waiter->count;
> 
> No need for this variable.

I'm using it for tracing in a follow-up patch. It can be used in traces
for checking the queue length at each point of time.

> 
>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> [...]
>> +#ifdef CONFIG_TEGRA_HOST1X_FIREWALL
>> +static int host1x_firewall = 1;
>> +#else
>> +static int host1x_firewall;
>> +#endif
> 
> You could use IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) in the code,
> which will have the nice side-effect of compiling code out if the symbol
> isn't selected.

Sure, I just wasn't aware of IS_ENABLED.

> 
>> +struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
>> +             u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
> 
> Maybe make the parameters unsigned int instead of u32?

I'll check this, but we're getting them from user space, and that API
has a fixed length field. That's why I'm carrying that type over.

> 
>> +{
>> +     struct host1x_job *job = NULL;
>> +     int num_unpins = num_cmdbufs + num_relocs;
> 
> unsigned int?

Sounds good.

> 
>> +     s64 total;
> 
> This doesn't need to be signed, u64 will be good enough. None of the
> terms in the expression that assigns to total can be negative.

True, will change.

> 
>> +     void *mem;
>> +
>> +     /* Check that we're not going to overflow */
>> +     total = sizeof(struct host1x_job)
>> +                     + num_relocs * sizeof(struct host1x_reloc)
>> +                     + num_unpins * sizeof(struct host1x_job_unpin_data)
>> +                     + num_waitchks * sizeof(struct host1x_waitchk)
>> +                     + num_cmdbufs * sizeof(struct host1x_job_gather)
>> +                     + num_unpins * sizeof(dma_addr_t)
>> +                     + num_unpins * sizeof(u32 *);
> 
> "+"s at the end of the preceding lines.

Ok.
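
A user-space sketch of the size computation with every term widened to
64 bits. The element sizes are hypothetical placeholders for the
sizeof() expressions above, and job_total() and job_total_fits() are
illustrative names only:

```c
#include <stdint.h>

/* Hypothetical element sizes in bytes; the real ones come from sizeof(). */
enum {
	JOB_HDR_SIZE = 64,
	RELOC_SIZE   = 24,
	UNPIN_SIZE   = 16,
	WAITCHK_SIZE = 16,
	GATHER_SIZE  = 24,
	ADDR_SIZE    = 4,
	PTR_SIZE     = 8,
};

static uint64_t job_total(uint32_t num_cmdbufs, uint32_t num_relocs,
			  uint32_t num_waitchks)
{
	/* Widen before adding so the sum of two u32 counts cannot wrap. */
	uint64_t num_unpins = (uint64_t)num_cmdbufs + num_relocs;

	return JOB_HDR_SIZE +
	       (uint64_t)num_relocs * RELOC_SIZE +
	       num_unpins * UNPIN_SIZE +
	       (uint64_t)num_waitchks * WAITCHK_SIZE +
	       (uint64_t)num_cmdbufs * GATHER_SIZE +
	       num_unpins * ADDR_SIZE +
	       num_unpins * PTR_SIZE;
}

/* Non-zero if the total can be passed to an allocator limited to 'limit'. */
static int job_total_fits(uint64_t total, uint64_t limit)
{
	return total <= limit;
}
```

With 32-bit counts and small element sizes the 64-bit sum cannot wrap,
so an unsigned type is enough for the comparison against the limit.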

> 
>> +     if (total > ULONG_MAX)
>> +             return NULL;
>> +
>> +     mem = job = kzalloc(total, GFP_KERNEL);
>> +     if (!job)
>> +             return NULL;
>> +
>> +     kref_init(&job->ref);
>> +     job->ch = ch;
>> +
>> +     /* First init state to zero */
>> +
>> +     /*
>> +      * Redistribute memory to the structs.
>> +      * Overflows and negative conditions have
>> +      * already been checked in job_alloc().
>> +      */
> 
> The last two lines don't really apply here. The checks are in this same
> function and they check only for overflow, not negative conditions,
> which can't happen anyway since the counts are all unsigned.

Actually overflow and negative in this case meant the same thing. Will
fix comment.

> 
>> +void host1x_job_get(struct host1x_job *job)
>> +{
>> +     kref_get(&job->ref);
>> +}
> 
> I think it is common for *_get() functions to return a pointer to the
> referenced object.

Ok, can do.
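
A sketch of the convention, using a plain counter in place of struct
kref. struct job_ref and job_ref_get() are hypothetical names:

```c
struct job_ref {
	unsigned int refcount;	/* plain counter standing in for struct kref */
};

/* Take a reference and hand the same pointer back to the caller. */
static struct job_ref *job_ref_get(struct job_ref *job)
{
	job->refcount++;
	return job;
}
```

Returning the pointer lets a caller take the reference and store it in
one statement, for example "waiter->job = job_ref_get(job);".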

> 
>> +void host1x_job_add_gather(struct host1x_job *job,
>> +             u32 mem_id, u32 words, u32 offset)
>> +{
>> +     struct host1x_job_gather *cur_gather =
>> +                     &job->gathers[job->num_gathers];
> 
> Should this check for overflow?

As a defensive measure we could, but this is not exploitable.
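
A sketch of what such a defensive check could look like in user space.
The max_gathers field and the -ENOSPC return value are assumptions made
for illustration; the patch has neither:

```c
#include <errno.h>
#include <stdint.h>

struct gather_slot {
	uint32_t mem_id;
	uint32_t words;
	uint32_t offset;
};

struct gather_list {
	struct gather_slot *gathers;
	unsigned int num_gathers;
	unsigned int max_gathers;	/* hypothetical: slots allocated */
};

/* Append one gather, refusing to write past the allocated slots. */
static int gather_list_add(struct gather_list *list, uint32_t mem_id,
			   uint32_t words, uint32_t offset)
{
	struct gather_slot *cur;

	if (list->num_gathers >= list->max_gathers)
		return -ENOSPC;

	cur = &list->gathers[list->num_gathers++];
	cur->mem_id = mem_id;
	cur->words = words;
	cur->offset = offset;
	return 0;
}
```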

> 
>> +/*
>> + * Check driver supplied waitchk structs for syncpt thresholds
>> + * that have already been satisfied and NULL the comparison (to
>> + * avoid a wrap condition in the HW).
>> + */
>> +static int do_waitchks(struct host1x_job *job, struct host1x *host,
>> +             u32 patch_mem, struct mem_handle *h)
>> +{
>> +     int i;
>> +
>> +     /* compare syncpt vs wait threshold */
>> +     for (i = 0; i < job->num_waitchk; i++) {
>> +             struct host1x_waitchk *wait = &job->waitchk[i];
>> +             struct host1x_syncpt *sp =
>> +                     host1x_syncpt_get(host, wait->syncpt_id);
>> +
>> +             /* validate syncpt id */
>> +             if (wait->syncpt_id > host1x_syncpt_nb_pts(host))
>> +                     continue;
>> +
>> +             /* skip all other gathers */
>> +             if (patch_mem != wait->mem)
>> +                     continue;
>> +
>> +             trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
>> +                             wait->syncpt_id, wait->thresh,
>> +                             host1x_syncpt_read_min(sp));
>> +             if (host1x_syncpt_is_expired(
>> +                     host1x_syncpt_get(host, wait->syncpt_id),
>> +                     wait->thresh)) {
> 
> You already have the sp variable that you could use here to make it more
> readable.

True, will use that.

> 
>> +                     struct host1x_syncpt *sp =
>> +                             host1x_syncpt_get(host, wait->syncpt_id);
> 
> And you don't need this then, since you already have sp pointing to the
> same syncpoint.

Ok.

> 
>> +                     void *patch_addr = NULL;
>> +
>> +                     /*
>> +                      * NULL an already satisfied WAIT_SYNCPT host method,
>> +                      * by patching its args in the command stream. The
>> +                      * method data is changed to reference a reserved
>> +                      * (never given out or incr) NVSYNCPT_GRAPHICS_HOST
>> +                      * syncpt with a matching threshold value of 0, so
>> +                      * is guaranteed to be popped by the host HW.
>> +                      */
>> +                     dev_dbg(&host->dev->dev,
>> +                         "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
>> +                         wait->syncpt_id, sp->name, wait->thresh,
>> +                         host1x_syncpt_read_min(sp));
>> +
>> +                     /* patch the wait */
>> +                     patch_addr = host1x_memmgr_kmap(h,
>> +                                     wait->offset >> PAGE_SHIFT);
>> +                     if (patch_addr) {
>> +                             host1x_syncpt_patch_wait(sp,
>> +                                     (patch_addr +
>> +                                             (wait->offset & ~PAGE_MASK)));
>> +                             host1x_memmgr_kunmap(h,
>> +                                             wait->offset >> PAGE_SHIFT,
>> +                                             patch_addr);
>> +                     } else {
>> +                             pr_err("Couldn't map cmdbuf for wait check\n");
>> +                     }
> 
> This is a case where splitting out a small function would actually be
> useful to make the code more readable since you can remove two levels of
> indentation. You can just pass in the handle and the offset, let it do
> the actual patching. Maybe
> 
>         host1x_syncpt_patch_offset(sp, h, wait->offset);
> 
> ?

Sounds good from a readability point of view.
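
The address arithmetic such a helper would hide can be sketched in user
space as below. 4 KiB pages are assumed, and patch_page(),
patch_in_page() and patch_word() are hypothetical names; the kmap and
kunmap steps are left out:

```c
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumed 4 KiB pages */
#define PAGE_SIZE  (UINT32_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Index of the page that contains byte offset 'offset'. */
static uint32_t patch_page(uint32_t offset)
{
	return offset >> PAGE_SHIFT;
}

/* Byte position of 'offset' within its page. */
static uint32_t patch_in_page(uint32_t offset)
{
	return offset & ~PAGE_MASK;
}

/*
 * Store one command word into an already mapped page. The page is
 * ordinary memory, so a plain store is enough; in_page must be 4-byte
 * aligned.
 */
static void patch_word(void *page_addr, uint32_t in_page, uint32_t value)
{
	*(uint32_t *)((char *)page_addr + in_page) = value;
}
```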

> 
>> +             }
>> +
>> +             wait->mem = 0;
>> +     }
>> +     return 0;
>> +}
>> +
>> +
> 
> There's a gratuitous blank line.

Will remove.

> 
>> +static int pin_job_mem(struct host1x_job *job)
>> +{
>> +     int i;
>> +     int count = 0;
>> +     int result;
> 
> These (and the return value) can all be unsigned int.

True, will fix.

> 
>> +static int do_relocs(struct host1x_job *job,
>> +             u32 cmdbuf_mem, struct mem_handle *h)
>> +{
>> +     int i = 0;
> 
> This can also be unsigned int.

True, will fix.

> 
>> +     int last_page = -1;
> 
> And this should match the type of cmdbuf_offset (u32). You can initially
> set it to something like ~0 to make sure it doesn't match any valid
> offset.

You're right, will change.

> 
>> +     void *cmdbuf_page_addr = NULL;
>> +
>> +     /* pin & patch the relocs for one gather */
>> +     while (i < job->num_relocs) {
>> +             struct host1x_reloc *reloc = &job->relocarray[i];
>> +
>> +             /* skip all other gathers */
>> +             if (cmdbuf_mem != reloc->cmdbuf_mem) {
>> +                     i++;
>> +                     continue;
>> +             }
>> +
>> +             if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
>> +                     if (cmdbuf_page_addr)
>> +                             host1x_memmgr_kunmap(h,
>> +                                             last_page, cmdbuf_page_addr);
>> +
>> +                     cmdbuf_page_addr = host1x_memmgr_kmap(h,
>> +                                     reloc->cmdbuf_offset >> PAGE_SHIFT);
>> +                     last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
>> +
>> +                     if (unlikely(!cmdbuf_page_addr)) {
>> +                             pr_err("Couldn't map cmdbuf for relocation\n");
>> +                             return -ENOMEM;
>> +                     }
>> +             }
>> +
>> +             __raw_writel(
>> +                     (job->reloc_addr_phys[i] +
>> +                             reloc->target_offset) >> reloc->shift,
>> +                     (cmdbuf_page_addr +
>> +                             (reloc->cmdbuf_offset & ~PAGE_MASK)));
> 
> Again, wrong __raw_writel() usage.

Yes, sorry, I forgot about this.

> 
>> +
>> +             /* remove completed reloc from the job */
>> +             if (i != job->num_relocs - 1) {
>> +                     struct host1x_reloc *reloc_last =
>> +                             &job->relocarray[job->num_relocs - 1];
>> +                     reloc->cmdbuf_mem       = reloc_last->cmdbuf_mem;
>> +                     reloc->cmdbuf_offset    = reloc_last->cmdbuf_offset;
>> +                     reloc->target           = reloc_last->target;
>> +                     reloc->target_offset    = reloc_last->target_offset;
>> +                     reloc->shift            = reloc_last->shift;
>> +                     job->reloc_addr_phys[i] =
>> +                             job->reloc_addr_phys[job->num_relocs - 1];
>> +                     job->num_relocs--;
>> +             } else {
>> +                     break;
>> +             }
>> +     }
>> +
>> +     if (cmdbuf_page_addr)
>> +             host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
>> +
>> +     return 0;
>> +}
> 
> Also the algorithm seems a bit strange and hard to follow. Instead of
> removing relocs from the job, replacing them with the last entry and
> decrementing job->num_relocs, how much is the penalty for always
> iterating over all relocs? This is one of the other cases where I'd
> argue that simplicity is key. Furthermore you need to copy quite a bit
> of data to replace the completed relocs, so I'm not sure it buys you
> much.
> 
> It could always be optimized later on by just setting a bit in the reloc
> to mark it as completed, or keep a bitmask of completed relocations or
> whatever.

This was done in a big optimization patch, but we'll check if we could
remove this. Previously we just set cmdbuf_mem for the completed reloc
to 0, and that should work in this case.
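
A sketch of that simpler variant: iterate over all relocs every time
and mark the applied ones by clearing cmdbuf_mem. It assumes 0 is never
a valid handle ID, uses a reduced hypothetical struct, and leaves the
actual patching out:

```c
#include <stdint.h>

/* Reduced, hypothetical version of struct host1x_reloc. */
struct reloc_entry {
	uint32_t cmdbuf_mem;	/* owning gather; 0 means already applied */
	uint32_t cmdbuf_offset;
};

/*
 * Apply every pending reloc that belongs to cmdbuf_mem and mark it as
 * done. Returns the number of relocs applied.
 */
static unsigned int apply_relocs(struct reloc_entry *relocs,
				 unsigned int num, uint32_t cmdbuf_mem)
{
	unsigned int i, applied = 0;

	for (i = 0; i < num; i++) {
		/* skip other gathers and relocs that are already done */
		if (relocs[i].cmdbuf_mem != cmdbuf_mem)
			continue;

		/* ... patch the command buffer here ... */

		relocs[i].cmdbuf_mem = 0;
		applied++;
	}

	return applied;
}
```

The array length never changes, so nothing has to be copied, at the
cost of visiting completed entries again on later passes.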

> 
>> +static int check_reloc(struct host1x_reloc *reloc,
>> +             u32 cmdbuf_id, int offset)
> 
> offset can be unsigned int.

Yep, will change.

> 
>> +{
>> +     int err = 0;
>> +     if (reloc->cmdbuf_mem != cmdbuf_id
>> +                     || reloc->cmdbuf_offset != offset * sizeof(u32))
>> +             err = -EINVAL;
>> +
>> +     return err;
>> +}
> 
> More canonically:
> 
>         offset *= sizeof(u32);
> 
>         if (reloc->cmdbuf_mem != cmdbuf_id || reloc->cmdbuf_offset != offset)
>                 return -EINVAL;
> 
>         return 0;

Ok, both do the same thing, so can change.

> 
>> +
>> +static int check_mask(struct host1x_job *job,
>> +             struct platform_device *pdev,
>> +             struct host1x_reloc **reloc, int *num_relocs,
>> +             u32 cmdbuf_id, int *offset,
>> +             u32 *words, u32 class, u32 reg, u32 mask)
> 
> num_relocs and offset can be unsigned int *.
> 
> Same comment for the other check_*() functions. That said I think the
> code would become a lot more readable if you were to wrap all of these
> parameters into a structure, say host1x_firewall, and just pass that
> into the functions.

True, might improve performance, too. We'll do that.

> 
>> +static inline int copy_gathers(struct host1x_job *job,
>> +             struct platform_device *pdev)
> 
> struct device *

Will do.

> 
>> +{
>> +     size_t size = 0;
>> +     size_t offset = 0;
>> +     int i;
>> +
>> +     for (i = 0; i < job->num_gathers; i++) {
>> +             struct host1x_job_gather *g = &job->gathers[i];
>> +             size += g->words * sizeof(u32);
>> +     }
>> +
>> +     job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
>> +                     size, &job->gather_copy, GFP_KERNEL);
>> +     if (IS_ERR(job->gather_copy_mapped)) {
> 
> dma_alloc_writecombine() returns NULL on failure, so this check is
> wrong.

Oops, will fix.
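
To show why the check never fires for a NULL return, here is a
user-space imitation of the kernel's error-pointer helpers. err_ptr(),
ptr_err() and is_err() are stand-ins written for this sketch, modeled
on ERR_PTR(), PTR_ERR() and IS_ERR():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Only the top MAX_ERRNO addresses count as errors; NULL does not. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

An allocator that reports failure with NULL therefore needs a plain
"if (!ptr)" test and an explicit error code such as -ENOMEM.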

> 
>> +             int err = PTR_ERR(job->gather_copy_mapped);
>> +             job->gather_copy_mapped = NULL;
>> +             return err;
>> +     }
>> +
>> +     job->gather_copy_size = size;
>> +
>> +     for (i = 0; i < job->num_gathers; i++) {
>> +             struct host1x_job_gather *g = &job->gathers[i];
>> +             void *gather = host1x_memmgr_mmap(g->ref);
>> +             memcpy(job->gather_copy_mapped + offset,
>> +                             gather + g->offset,
>> +                             g->words * sizeof(u32));
>> +
>> +             g->mem_base = job->gather_copy;
>> +             g->offset = offset;
>> +             g->mem_id = 0;
>> +             g->ref = 0;
>> +
>> +             host1x_memmgr_munmap(g->ref, gather);
>> +             offset += g->words * sizeof(u32);
>> +     }
>> +
>> +     return 0;
>> +}
> 
> I wonder, where's this DMA buffer actually used? I can't find any use
> between this copy and the corresponding dma_free_writecombine() call.

We replace the gathers in host1x_job with the ones we allocate here, so
they are used when pushing the gathers to hardware.

This is done so that user space cannot tamper with the gathers once
they've been checked by the firewall.
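
A user-space sketch of the copy step, with malloc() standing in for
the DMA allocation and hypothetical names throughout. It packs all
gathers back to back into one private buffer and records where each
one landed:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct gather_src {
	const uint32_t *data;	/* buffer still reachable by the submitter */
	uint32_t words;
	size_t copy_offset;	/* filled in: byte offset inside the copy */
};

/* Returns the private copy, or NULL on allocation failure. */
static uint32_t *copy_gather_words(struct gather_src *g, unsigned int num,
				   size_t *size_out)
{
	size_t size = 0, offset = 0;
	unsigned int i;
	uint32_t *copy;

	for (i = 0; i < num; i++)
		size += (size_t)g[i].words * sizeof(uint32_t);

	copy = malloc(size ? size : 1);
	if (!copy)
		return NULL;

	for (i = 0; i < num; i++) {
		size_t bytes = (size_t)g[i].words * sizeof(uint32_t);

		memcpy((char *)copy + offset, g[i].data, bytes);
		g[i].copy_offset = offset;
		offset += bytes;
	}

	*size_out = size;
	return copy;
}
```

Later writes to the original buffers no longer affect what gets
pushed, which is the property the firewall depends on.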

> 
>> +int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
>> +{
>> +     int err = 0, i = 0, j = 0;
> 
> No need to initialize these here. i and j can also be unsigned.

Ok.

> 
>> +     struct host1x *host = host1x_get_host(pdev);
>> +     DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
>> +
>> +     bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
>> +     for (i = 0; i < job->num_waitchk; i++) {
>> +             u32 syncpt_id = job->waitchk[i].syncpt_id;
>> +             if (syncpt_id < host1x_syncpt_nb_pts(host))
>> +                     set_bit(syncpt_id, waitchk_mask);
>> +     }
>> +
>> +     /* get current syncpt values for waitchk */
>> +     for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
>> +             host1x_syncpt_load_min(host->syncpt + i);
>> +
>> +     /* pin memory */
>> +     err = pin_job_mem(job);
>> +     if (err <= 0)
>> +             goto out;
> 
> pin_job_mem() never returns negative.

Ok, will fix.

> 
>> +     /* patch gathers */
>> +     for (i = 0; i < job->num_gathers; i++) {
>> +             struct host1x_job_gather *g = &job->gathers[i];
>> +
>> +             /* process each gather mem only once */
>> +             if (!g->ref) {
>> +                     g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
>> +                     if (IS_ERR(g->ref)) {
> 
> host1x_memmgr_get() also seems to return NULL on error.

I think I'll change memmgr_get() to return an ERR_PTR().

> 
>> +                             err = PTR_ERR(g->ref);
>> +                             g->ref = NULL;
>> +                             break;
>> +                     }
>> +
>> +                     g->mem_base = job->gather_addr_phys[i];
>> +
>> +                     for (j = 0; j < job->num_gathers; j++) {
>> +                             struct host1x_job_gather *tmp =
>> +                                     &job->gathers[j];
>> +                             if (!tmp->ref && tmp->mem_id == g->mem_id) {
>> +                                     tmp->ref = g->ref;
>> +                                     tmp->mem_base = g->mem_base;
>> +                             }
>> +                     }
>> +                     err = 0;
>> +                     if (host1x_firewall)
> 
> if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))

Will fix.

> 
>> +                             err = validate(job, pdev, g);
>> +                     if (err)
>> +                             dev_err(&pdev->dev,
>> +                                     "Job validate returned %d\n", err);
>> +                     if (!err)
>> +                             err = do_relocs(job, g->mem_id,  g->ref);
>> +                     if (!err)
>> +                             err = do_waitchks(job, host,
>> +                                             g->mem_id, g->ref);
>> +                     host1x_memmgr_put(g->ref);
>> +                     if (err)
>> +                             break;
>> +             }
>> +     }
>> +
>> +     if (host1x_firewall && !err) {
> 
> And here.

Here, too.

> 
>> +/*
>> + * Debug routine used to dump job entries
>> + */
>> +void host1x_job_dump(struct device *dev, struct host1x_job *job)
>> +{
>> +     dev_dbg(dev, "    SYNCPT_ID   %d\n",
>> +             job->syncpt_id);
>> +     dev_dbg(dev, "    SYNCPT_VAL  %d\n",
>> +             job->syncpt_end);
>> +     dev_dbg(dev, "    FIRST_GET   0x%x\n",
>> +             job->first_get);
>> +     dev_dbg(dev, "    TIMEOUT     %d\n",
>> +             job->timeout);
>> +     dev_dbg(dev, "    NUM_SLOTS   %d\n",
>> +             job->num_slots);
>> +     dev_dbg(dev, "    NUM_HANDLES %d\n",
>> +             job->num_unpins);
>> +}
> 
> These don't need to be wrapped.

True, will merge lines.

> 
>> diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
> [...]
>> +struct host1x_job_gather {
>> +     u32 words;
>> +     dma_addr_t mem_base;
>> +     u32 mem_id;
>> +     int offset;
>> +     struct mem_handle *ref;
>> +};
>> +
>> +struct host1x_cmdbuf {
>> +     __u32 mem;
>> +     __u32 offset;
>> +     __u32 words;
>> +     __u32 pad;
>> +};
>> +
>> +struct host1x_reloc {
>> +     __u32 cmdbuf_mem;
>> +     __u32 cmdbuf_offset;
>> +     __u32 target;
>> +     __u32 target_offset;
>> +     __u32 shift;
>> +     __u32 pad;
>> +};
>> +
>> +struct host1x_waitchk {
>> +     __u32 mem;
>> +     __u32 offset;
>> +     __u32 syncpt_id;
>> +     __u32 thresh;
>> +};
> 
> None of these are shared with userspace, so they shouldn't take the
> __u32 types, but the regular u32 ones.

True. We copy stuff from user space types to these, but we don't use
these directly in user space API.

> 
>> +/*
>> + * Each submit is tracked as a host1x_job.
>> + */
>> +struct host1x_job {
>> +     /* When refcount goes to zero, job can be freed */
>> +     struct kref ref;
>> +
>> +     /* List entry */
>> +     struct list_head list;
>> +
>> +     /* Channel where job is submitted to */
>> +     struct host1x_channel *ch;
> 
> Maybe write it out as "channel"?

Ok.

> 
>> +
>> +     int clientid;
> 
> Subsequent patches assign u32 to this field, so maybe the type should be
> changed here. And maybe leave out the id suffix. It doesn't really add
> any information.

Good catch, will change.

> 
>> +     /* Gathers and their memory */
>> +     struct host1x_job_gather *gathers;
>> +     int num_gathers;
> 
> unsigned int

Will change.

> 
>> +     /* Wait checks to be processed at submit time */
>> +     struct host1x_waitchk *waitchk;
>> +     int num_waitchk;
> 
> unsigned int

Ok.

> 
>> +     u32 waitchk_mask;
> 
> This might need to be changed to a bitfield once future Tegra versions
> start supporting more than 32 syncpoints.

True, I think we need to change this right away. We actually drop the
use of waitchk_mask downstream because of this. It's basically just an
optimization that brings no real-world speed advantage.
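
If the mask were kept, one way to outgrow 32 syncpoints is a bitmap
spread over several words. The sketch below is userspace C with
hypothetical names and a made-up limit of 64; in the kernel,
DECLARE_BITMAP(), set_bit() and test_bit() cover the same ground.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit; the real syncpoint count depends on the SoC. */
#define SKETCH_MAX_SYNCPTS	64
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_LONGS \
	((SKETCH_MAX_SYNCPTS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_waitchk_mask {
	unsigned long bits[SKETCH_BITMAP_LONGS];
};

static void sketch_mask_clear(struct sketch_waitchk_mask *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

/* Mark syncpoint 'id' as having a wait check; out-of-range ids are ignored. */
static void sketch_mask_set(struct sketch_waitchk_mask *m, unsigned int id)
{
	if (id < SKETCH_MAX_SYNCPTS)
		m->bits[id / SKETCH_BITS_PER_LONG] |=
			1UL << (id % SKETCH_BITS_PER_LONG);
}

static bool sketch_mask_test(const struct sketch_waitchk_mask *m,
			     unsigned int id)
{
	if (id >= SKETCH_MAX_SYNCPTS)
		return false;
	return (m->bits[id / SKETCH_BITS_PER_LONG] >>
		(id % SKETCH_BITS_PER_LONG)) & 1UL;
}
```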

> 
>> +     /* Array of handles to be pinned & unpinned */
>> +     struct host1x_reloc *relocarray;
>> +     int num_relocs;
> 
> unsigned int

Will change.

> 
>> +     struct host1x_job_unpin_data *unpins;
>> +     int num_unpins;
> 
> unsigned int

Will change.

> 
>> +     dma_addr_t *addr_phys;
>> +     dma_addr_t *gather_addr_phys;
>> +     dma_addr_t *reloc_addr_phys;
>> +
>> +     /* Sync point id, number of increments and end related to the submit */
>> +     u32 syncpt_id;
>> +     u32 syncpt_incrs;
>> +     u32 syncpt_end;
>> +
>> +     /* Maximum time to wait for this job */
>> +     int timeout;
> 
> unsigned int. I think we discussed this already in a slightly different
> context in patch 2.

Sure, will change. I think timeouts were discussed wrt syncpt wait timeout.

> 
>> +     /* Null kickoff prevents submit from being sent to hardware */
>> +     bool null_kickoff;
> 
> I don't think this is used anywhere.

True, we can remove this as we haven't posted the code for null kickoff.

> 
>> +     /* Index and number of slots used in the push buffer */
>> +     int first_get;
>> +     int num_slots;
> 
> unsigned int

Ok.

> 
>> +
>> +     /* Copy of gathers */
>> +     size_t gather_copy_size;
>> +     dma_addr_t gather_copy;
>> +     u8 *gather_copy_mapped;
> 
> Are these really needed? They don't seem to be used anywhere except to
> store a copy and free that copy sometime later.

They're needed so that the kernel can take a copy of the gathers, which
prevents user space from tampering with them post-submit.
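
The idea can be sketched in userspace C as below. All names are
hypothetical and malloc()/memcpy() stand in for whatever DMA-capable
allocation the driver uses; the point is only that validation and
execution see the private copy, not the buffer the submitter can still
write to.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_job {
	uint32_t *gather_copy;	/* private copy owned by the job */
	size_t words;
};

/* Copy the submitted words so later writes by the submitter have no effect. */
static int sketch_job_copy_gather(struct sketch_job *job,
				  const uint32_t *user_words, size_t words)
{
	if (words == 0 || words > SIZE_MAX / sizeof(*user_words))
		return -1;

	job->gather_copy = malloc(words * sizeof(*user_words));
	if (!job->gather_copy)
		return -1;

	memcpy(job->gather_copy, user_words, words * sizeof(*user_words));
	job->words = words;
	return 0;
}

static void sketch_job_free(struct sketch_job *job)
{
	free(job->gather_copy);
	job->gather_copy = NULL;
	job->words = 0;
}
```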

> 
>> +
>> +     /* Temporary space for unpin ids */
>> +     long unsigned int *pin_ids;
> 
> unsigned long

Will change.

> 
>> +     /* Check if register is marked as an address reg */
>> +     int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
> 
> is_addr_reg() sounds a bit unusual. Maybe match this to the name of the
> main firewall routine, validate()?

The point of this op is to just tell if a register for a class is
pointing to a buffer. validate then uses this information. But both
answers (yes/no) and both types of registers are still valid, so
validate() wouldn't be the proper name.

Validation is then done by checking that there's a reloc corresponding
to each register write to a register that can hold an address.
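
A rough userspace sketch of that split, assuming hypothetical structures
and made-up register numbers: the classifier only answers "can this
register hold an address?", and the validator rejects a job only when
such a write has no matching reloc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_write {
	uint32_t reg;		/* register offset being written */
	uint32_t cmdbuf_offset;	/* where the written word sits in the buffer */
};

struct sketch_reloc {
	uint32_t cmdbuf_offset;	/* word the relocation will patch */
};

/* Hypothetical classifier: pretend 0x2b and 0x31 hold buffer addresses. */
static bool sketch_is_addr_reg(uint32_t reg)
{
	return reg == 0x2b || reg == 0x31;
}

static bool sketch_has_reloc(const struct sketch_reloc *relocs, size_t num,
			     uint32_t cmdbuf_offset)
{
	size_t i;

	for (i = 0; i < num; i++)
		if (relocs[i].cmdbuf_offset == cmdbuf_offset)
			return true;
	return false;
}

/* 0 if every write to an address register has a reloc, -1 otherwise. */
static int sketch_validate(const struct sketch_write *writes, size_t num_writes,
			   const struct sketch_reloc *relocs, size_t num_relocs)
{
	size_t i;

	for (i = 0; i < num_writes; i++) {
		if (!sketch_is_addr_reg(writes[i].reg))
			continue;	/* plain register: always acceptable */
		if (!sketch_has_reloc(relocs, num_relocs,
				      writes[i].cmdbuf_offset))
			return -1;
	}
	return 0;
}
```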

> 
>> +     /* Request a SETCLASS to this class */
>> +     u32 class;
>> +
>> +     /* Add a channel wait for previous ops to complete */
>> +     u32 serialize;
> 
> This is used in code as a boolean. Why does it need to be 32 bits?

No need, will change to bool.

> 
>> diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
> [...]
>> +struct mem_handle;
>> +struct platform_device;
>> +
>> +struct host1x_job_unpin_data {
>> +     struct mem_handle *h;
>> +     struct sg_table *mem;
>> +};
>> +
>> +enum mem_mgr_flag {
>> +     mem_mgr_flag_uncacheable = 0,
>> +     mem_mgr_flag_write_combine = 1,
>> +};
> 
> I'd like to see this use a more object-oriented approach and more common
> terminology. All of these handles are essentially buffer objects, so
> maybe something like host1x_bo would be a nice and short name.
> 
> To make this more object-oriented, I propose something like:
> 
>         struct host1x_bo_ops {
>                 int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
>                              unsigned long flags);
>                 int (*free)(struct host1x_bo *bo);
>                 ...
>         };
> 
>         struct host1x_bo {
>                 const struct host1x_bo_ops *ops;
>         };
> 
>         struct host1x_cma_bo {
>                 struct host1x_bo base;
>                 struct drm_gem_cma_object *obj;
>         };
> 
>         static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
>         {
>                 return container_of(bo, struct host1x_cma_bo, base);
>         }
> 
>         static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
>                                           unsigned long align, unsigned long flags)
>         {
>                 return bo->ops->alloc(bo, size, align, flags);
>         }
> 
>         ...
> 
> That should be easy to extend with a new type of BO once the IOMMU-based
> allocator is ready. And as I said it is much closer in terminology to
> what other drivers do.

One complexity is that we're using the same type for communicating with
user space. Each buffer carries with it a flag indicating its allocator.
We might be able to model the internal structure to be more like what
you propose, but for the API we still need the flag.
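
One possible way to reconcile the two, sketched in userspace C: keep the
allocator flag in the handle that crosses the API and translate it to an
ops table once, at the boundary. The bit layout, the type values and all
names below are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: the low two bits of a handle select the allocator. */
#define SKETCH_TYPE_MASK	0x3u
#define SKETCH_TYPE_CMA		0x0u
#define SKETCH_TYPE_IOMMU	0x1u

struct sketch_bo_ops {
	const char *name;
};

static const struct sketch_bo_ops sketch_cma_ops = { .name = "cma" };
static const struct sketch_bo_ops sketch_iommu_ops = { .name = "iommu" };

/* Translate the API-level flag into the internal ops table. */
static const struct sketch_bo_ops *sketch_ops_for_id(uint32_t id)
{
	switch (id & SKETCH_TYPE_MASK) {
	case SKETCH_TYPE_CMA:
		return &sketch_cma_ops;
	case SKETCH_TYPE_IOMMU:
		return &sketch_iommu_ops;
	default:
		return NULL;	/* unknown allocator type */
	}
}
```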

> 
>> diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
>> index b46d044..255a3a3 100644
>> --- a/drivers/gpu/host1x/syncpt.h
>> +++ b/drivers/gpu/host1x/syncpt.h
>> @@ -26,6 +26,7 @@
>>  struct host1x;
>>
>>  #define NVSYNCPT_INVALID                     (-1)
>> +#define NVSYNCPT_GRAPHICS_HOST                       0
> 
> I think these should match other naming, so:
> 
>         #define HOST1X_SYNCPT_INVALID   -1
>         #define HOST1X_SYNCPT_HOST1X     0

Sure, sounds good.

> There are a few more occurrences where platform_device is used but I
> haven't commented on them. I think all of them will work with just a
> struct device instead. Also I may not have caught all of the
> places where you should rather be using unsigned int instead of int,
> so you might want to look out for some of those.

Yes, we'll go through the code with this in mind.

> Generally I very much like where this is going. Are there any plans to
> move the userspace binary driver to this interface at some point so we
> can more actively test it? Also, is anything else blocking adding a
> gr3d device similar to gr2d from this patch series?

We're doing this in stages. I don't want to change the code base and
APIs both in one step, because big moves in both user and kernel space
tend to fail easily.

First we upstream code, and try to get feature parity. Then we
re-engineer our downstream driver delta on top of the upstream code, but
in this phase we keep the downstream kernel API.

In the next step, we'll start moving to the DRM APIs.

So, there are quite a few steps still before we're on DRM APIs, but we'll
reach it at some point. :-)

The 3D driver should work on top of this. I don't see anything blocking that.

Terje

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-02-26  9:48     ` Terje Bergström
@ 2013-02-27  8:56       ` Thierry Reding
  2013-03-08 16:16       ` Terje Bergström
  1 sibling, 0 replies; 49+ messages in thread
From: Thierry Reding @ 2013-02-27  8:56 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Tue, Feb 26, 2013 at 11:48:18AM +0200, Terje Bergström wrote:
> On 25.02.2013 17:24, Thierry Reding wrote:
> > On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote:
[...]
> >> +/*
> >> + * Start timer for a buffer submition that has completed yet.
> > 
> > "submission". And I don't understand the "that has completed yet" part.
> 
> It should become "Start timer that tracks the time spent by the job".

Yes, that's a lot better.

> >> +     if (list_empty(&cdma->sync_queue) &&
> >> +                             cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
> >> +                     signal = true;
> > 
> > This looks funny, maybe:
> > 
> >         if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
> >             list_empty(&cdma->sync_queue))
> >                 signal = true;
> > 
> > ?
> 
> Indenting at least is strange. I don't have a preference for the
> ordering of conditions, so if you like the latter order, we can just use
> that.

I just happen to find it easier to read that way. If you want to keep
the ordering that's fine, but the indentation needs to be fixed.

> >> +{
> >> +     u32 get_restart;
> > 
> > Maybe just call this "restart" or "restart_addr". get_restart sounds
> > like a function name.
> 
> Ok, how about "restart_dmaget_addr"? That indicates what we're doing
> with the restart address.

Sounds good.

> >> +     list_for_each_entry(job, &cdma->sync_queue, list) {
> >> +             if (syncpt_val < job->syncpt_end)
> >> +                     break;
> >> +
> >> +             host1x_job_dump(&dev->dev, job);
> >> +     }
> > 
> > That's potentially a lot of debug output. I wonder if it might make
> > sense to control parts of this via a module parameter. Then again, if
> > somebody really needs to debug this, maybe they really want *all* the
> > information.
> 
> host1x_job_dump() uses dev_dbg(), so it only dumps a lot if DEBUG has
> been defined in that file.

Okay, let's leave it like that then.

> >> +/*
> >> + * Destroy a cdma
> >> + */
> >> +void host1x_cdma_deinit(struct host1x_cdma *cdma)
> >> +{
> >> +     struct push_buffer *pb = &cdma->push_buffer;
> >> +     struct host1x *host1x = cdma_to_host1x(cdma);
> >> +
> >> +     if (cdma->running) {
> >> +             pr_warn("%s: CDMA still running\n",
> >> +                             __func__);
> >> +     } else {
> >> +             host1x->cdma_pb_op.destroy(pb);
> >> +             host1x->cdma_op.timeout_destroy(cdma);
> >> +     }
> >> +}
> > 
> > There's no way to recover from the situation where a cdma is still
> > running. Can this not return an error code (-EBUSY?) if the cdma can't
> > be destroyed?
> 
> It's called from close(), which cannot return an error code. It's
> actually more of a power optimization. The effect is that if there are
> no users for channel, we'll just not free up the push buffer.
> 
> I think the proper fix would actually be to check in host1x_cdma_init()
> if push buffer is already allocated and cdma->running. In that case we
> could skip most of initialization.

Yes, in that case it might be useful to do this. I still think it's
worth returning an error code to the caller, even if it can't be
propagated. That way the caller at least has the possibility to react.

I'm still not quite sure I understand the necessity for this, though.
Maybe you can give an example of when this will actually happen?
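
To make the suggestion concrete, here is a userspace sketch with
hypothetical names and fields. The deinit routine reports -EBUSY instead
of only warning, and leaves the push buffer alone in that case, so the
caller can decide whether to log, retry or ignore.

```c
#include <errno.h>
#include <stdbool.h>

struct sketch_cdma {
	bool running;		/* DMA still active */
	bool pb_allocated;	/* push buffer currently allocated */
};

/* Returns 0 on success, -EBUSY if the CDMA is still running. */
static int sketch_cdma_deinit(struct sketch_cdma *cdma)
{
	if (cdma->running)
		return -EBUSY;	/* nothing is freed in this case */

	cdma->pb_allocated = false;
	return 0;
}
```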

> >> +/*
> >> + * cdma
> >> + *
> >> + * This is in charge of a host command DMA channel.
> >> + * Sends ops to a push buffer, and takes responsibility for unpinning
> >> + * (& possibly freeing) of memory after those ops have completed.
> >> + * Producer:
> >> + *   begin
> >> + *           push - send ops to the push buffer
> >> + *   end - start command DMA and enqueue handles to be unpinned
> >> + * Consumer:
> >> + *   update - call to update sync queue and push buffer, unpin memory
> >> + */
> > 
> > I find the name to be a bit confusing. For some reason I automatically
> > think of GSM when I read CDMA. This really is more of a job queue, so
> > maybe calling it host1x_job_queue might be more appropriate. But I've
> > already requested a lot of things to be renamed, so I think I can live
> > with this being called CDMA if you don't want to change it.
> > 
> > Alternatively all of these could be moved to the struct host1x_channel
> > given that there's only one of each of the push_buffer, buffer_timeout
> > and host1x_cdma objects per channel.
> 
> I did consider merging those two at one point. That should work, as they
> both deal with channels essentially. I also saw that the resulting file
> and data structures became quite large, so I have so far preferred to
> keep them separate.
> 
> This way I can keep the "higher level" stuff (inserting setclass,
> serializing, allocating sync point ranges, etc) in one file and lower
> level stuff (write to hardware, deal with push buffer pointers, etc) in
> another.

Alright. I can live with that.

> >> +int host1x_channel_submit(struct host1x_job *job)
> >> +{
> >> +     return host1x_get_host(job->ch->dev)->channel_op.submit(job);
> >> +}
> > 
> > I'd expect a function named host1x_channel_submit() to take a struct
> > host1x_channel *. Should this perhaps be called host1x_job_submit()?
> 
> It calls into channel code directly, and the underlying op also just
> takes a job. We could add channel as a parameter, and not pass it in
> host1x_job_alloc(). but we actually need the channel data already in
> host1x_job_pin(), which comes before submit. We need it so that we pin
> the buffer to correct engine.

That's all fine. My point was that this operates on a job object, so I'd
find it more intuitive if the function name reflected that. There's
nothing wrong with submitting a job without explicitly specifying the
channel if it is tied to one channel anyway.

host1x_channel_submit() would imply "submit channel", which doesn't make
sense, so the next best alternative is "submit job to channel", but that
isn't reflected in the parameters. So host1x_job_submit() fits pretty
well. There's no reason why it has to be prefixed host1x_channel_*,
right?

> >> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
> >> +{
> >> +     struct host1x_channel *ch = NULL;
> >> +     struct host1x *host1x = host1x_get_host(pdev);
> >> +     int chindex;
> >> +     int max_channels = host1x->info.nb_channels;
> >> +     int err;
> >> +
> >> +     mutex_lock(&host1x->chlist_mutex);
> >> +
> >> +     chindex = host1x->allocated_channels;
> >> +     if (chindex > max_channels)
> >> +             goto fail;
> >> +
> >> +     ch = kzalloc(sizeof(*ch), GFP_KERNEL);
> >> +     if (ch == NULL)
> >> +             goto fail;
> >> +
> >> +     /* Link platform_device to host1x_channel */
> >> +     err = host1x->channel_op.init(ch, host1x, chindex);
> >> +     if (err < 0)
> >> +             goto fail;
> >> +
> >> +     ch->dev = pdev;
> >> +
> >> +     /* Add to channel list */
> >> +     list_add_tail(&ch->list, &host1x->chlist.list);
> >> +
> >> +     host1x->allocated_channels++;
> >> +
> >> +     mutex_unlock(&host1x->chlist_mutex);
> >> +     return ch;
> >> +
> >> +fail:
> >> +     dev_err(&pdev->dev, "failed to init channel\n");
> >> +     kfree(ch);
> >> +     mutex_unlock(&host1x->chlist_mutex);
> >> +     return NULL;
> >> +}
> > 
> > I think the critical section could be shorter here. It's probably not
> > worth the extra trouble, though, given that channels are not often
> > allocated.
> 
> Yeah, boot time isn't measured in microseconds. :-) But, if we just make
> allocated_channels an atomic, we should be able to drop chlist_mutex
> altogether and it could simplify the code.

You still need to protect the list from concurrent modification.

> >> +/* channel list operations */
> >> +void host1x_channel_list_init(struct host1x *);
> >> +void host1x_channel_for_all(struct host1x *, void *data,
> >> +     int (*fptr)(struct host1x_channel *ch, void *fdata));
> >> +
> >> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
> >> +void host1x_channel_free(struct host1x_channel *ch);
> > 
> > Is it a good idea to make host1x_channel_free() publicly available?
> > Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a
> > host1x_channel with a reference count of 1 and everybody release their
> > reference using host1x_channel_put() to make sure the channel is freed
> > only after the last reference disappears?
> > 
> > Otherwise whoever calls host1x_channel_free() will confuse everybody
> > else that's still keeping a reference.
> 
> The difference is that _put and _get are called to indicate how many
> user space processes there are for the channel. Even if there are no
> processes, we won't free the channel structure - we just freeze the channel.
> 
> _alloc and _free are different in that they actually create the channel
> structs and delete them and they follow the lifecycle of the driver.
> Perhaps we should figure out new naming, but refcounting and alloc/free
> cannot be merged here.

I understand. Perhaps better names would be host1x_channel_setup() and
host1x_channel_teardown()?
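
The two lifetimes described above could look roughly like this in a
userspace sketch. The structure, its fields and the function names are
hypothetical; setup runs once per driver lifetime, while get/put only
count users and freeze or thaw the channel.

```c
#include <stdbool.h>

struct sketch_channel {
	unsigned int users;	/* number of open handles */
	bool frozen;		/* true while nobody uses the channel */
};

/* Driver lifetime: called once when the driver binds. */
static void sketch_channel_setup(struct sketch_channel *ch)
{
	ch->users = 0;
	ch->frozen = true;
}

/* User lifetime: the first user thaws the channel. */
static void sketch_channel_get(struct sketch_channel *ch)
{
	if (ch->users++ == 0)
		ch->frozen = false;
}

/* The last user freezes the channel; the structure stays allocated. */
static void sketch_channel_put(struct sketch_channel *ch)
{
	if (ch->users == 0)
		return;		/* unbalanced put, ignored in this sketch */
	if (--ch->users == 0)
		ch->frozen = true;
}
```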

> >> +{
> >> +     struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
> >> +     struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
> >> +
> >> +     mutex_lock(struct_mutex);
> >> +     drm_gem_object_reference(&obj->base);
> >> +     mutex_unlock(struct_mutex);
> > 
> > I think it's more customary to obtain a pointer to struct drm_device and
> > then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use
> > drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't
> > exist yet, apparently. But it could be added.
> 
> I think we could take the former path - just refer to mutex in a
> different way.

You'll get extra points if you add the function =). The documentation in
Documentation/DocBook/drm.tmpl says that it exists, but it doesn't, so
you'd even be fixing a bug along the way.

> >> +int host1x_cma_pin_array_ids(struct platform_device *dev,
> >> +             long unsigned *ids,
> >> +             long unsigned id_type_mask,
> >> +             long unsigned id_type,
> >> +             u32 count,
> >> +             struct host1x_job_unpin_data *unpin_data,
> >> +             dma_addr_t *phys_addr)
> > 
> > struct device * and unsigned long please. count also doesn't need to
> > be a sized type. unsigned int will do just fine. The return value can
> > also be unsigned int if you don't expect to return any error conditions.
> 
> I think we'll need to check these. ids probably needs to be a u32 *, and
> id_type_mask and id_type should be u32. They come like that from user space.

Okay. My main point was that it's more usual to use "unsigned long" than
"long unsigned":

	linux.git $ git grep -n 'long unsigned' | wc -l
	72
	linux.git $ git grep -n 'unsigned long' | wc -l
	106575

Also the more I think about it, the more I have doubts that passing
around IDs like this (or using ID types and masks) is the right thing to
do. I'll get back to that later.

> > 
> >> +     int allocated_channels;
> > 
> > unsigned int? And maybe just "num_channels"?
> 
> num_channels could be thought of as "number of available channels", so I'd
> like to use num_allocated_channels here.

Okay.

> >> diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
> > [...]
> >> +enum host1x_class {
> >> +     NV_HOST1X_CLASS_ID              = 0x1,
> >> +     NV_GRAPHICS_2D_CLASS_ID         = 0x51,
> > 
> > This entry belongs in a later patch, right? And I find it convenient if
> > enumeration constants start with the enum name as prefix. Furthermore
> > it'd be nice to reuse the hardware module names, like so:
> > 
> >         enum host1x_class {
> >                 HOST1X_CLASS_HOST1X,
> >                 HOST1X_CLASS_GR2D,
> >                 HOST1X_CLASS_GR3D,
> >         };
> 
> The naming sounds good. We already use HOST1X_CLASS_HOST1X in code to
> insert a wait. If you'd prefer, we can move the definition of
> HOST1X_CLASS_GR2D to the later patch.

Yes, it's better to introduce it in the patch that first uses it.

> >> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> > [...]
> >> +#include <linux/slab.h>
> >> +#include <linux/scatterlist.h>
> >> +#include <linux/dma-mapping.h>
> >> +#include "cdma.h"
> >> +#include "channel.h"
> >> +#include "dev.h"
> >> +#include "memmgr.h"
> >> +
> >> +#include "cdma_hw.h"
> >> +
> >> +static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get)
> >> +{
> >> +     return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
> >> +             | HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
> >> +             | HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
> > 
> > I think it is more customary to put the | at the end of the preceding
> > line:
> > 
> >         return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
> >                HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
> >                HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
> > 
> > Also since these are all single bits, I'd prefer if you could drop the
> > _F suffix and not make them take a parameter. I think it'd even be
> > better not to have this function at all, but make the intent explicit
> > where the register is written. That is, have each call site set the bits
> > explicitly instead of calling this helper. Having a parameter list such
> > as (true, false, false) or (true, true, true) is confusing since you
> > have to keep looking up the meaning of the parameters.
> 
> The operation that the _F macros do is masking and bit shifting the
> fields correctly. Without that, we'd need to expose several macros to
> mask and shift, and I'd rather just have one macro to take care of that.
> 
> But, we can open code the function to wherever it's used if that's more
> readable.

I wasn't arguing against masking and shifting, but rather in favour of
treating these like normal bit definitions. So instead of passing a
boolean parameter to the macro, you just don't use it if the bit isn't
supposed to be set. And if you want to set the bit you OR in the value.

So:

	static inline u32 host1x_channel_dmactrl_dmastop(void)
	{
		return 1 << 0;
	}

	#define HOST1X_CHANNEL_DMACTRL_DMASTOP \
		host1x_channel_dmactrl_dmastop()
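
At a call site the difference would look something like the userspace
sketch below. Only the DMASTOP position (bit 0) comes from the example
above; the positions of the other two bits and all SKETCH_ names are
assumptions for illustration.

```c
#include <stdint.h>

#define SKETCH_DMACTRL_DMASTOP		(1u << 0)
#define SKETCH_DMACTRL_DMAGETRST	(1u << 1)	/* assumed position */
#define SKETCH_DMACTRL_DMAINITGET	(1u << 2)	/* assumed position */

/* Instead of dmactrl(true, false, false): name only the bit that is set. */
static uint32_t sketch_dmactrl_stop_only(void)
{
	return SKETCH_DMACTRL_DMASTOP;
}

/* Instead of dmactrl(true, true, true): OR together the bits that are set. */
static uint32_t sketch_dmactrl_stop_and_reset(void)
{
	return SKETCH_DMACTRL_DMASTOP |
	       SKETCH_DMACTRL_DMAGETRST |
	       SKETCH_DMACTRL_DMAINITGET;
}
```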

> >> +/*
> >> + * Similar to cdma_start(), but rather than starting from an idle
> >> + * state (where DMA GET is set to DMA PUT), on a timeout we restore
> >> + * DMA GET from an explicit value (so DMA may again be pending).
> >> + */
> >> +static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
> >> +{
> >> +     struct host1x *host1x = cdma_to_host1x(cdma);
> >> +     struct host1x_channel *ch = cdma_to_channel(cdma);
> >> +
> >> +     if (cdma->running)
> >> +             return;
> >> +
> >> +     cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
> >> +
> >> +     host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
> >> +             HOST1X_CHANNEL_DMACTRL);
> >> +
> >> +     /* set base, end pointer (all of memory) */
> >> +     host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
> >> +     host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
> > 
> > According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a
> > DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of
> > that, why set the valid range to all of physical memory? We know the
> > valid range of the push buffer, why not set the limits accordingly?
> 
> That'd make sense. Currently we use the RESTART as the barrier, but
> having hardware check against DMAEND is a good idea, too.

Any reason why DMASTART shouldn't be used to restrict the range as well?
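
A sketch of what restricting both limits might compute, in userspace C
with hypothetical names. Whether DMAEND is an inclusive or exclusive
bound, and whether extra words beyond the buffer must be covered, are
assumptions here; the TRM is the authority on that.

```c
#include <stdint.h>

struct sketch_push_buffer {
	uint32_t phys;		/* bus address of the first word */
	uint32_t size_bytes;	/* size of the buffer in bytes */
};

struct sketch_dma_window {
	uint32_t start;		/* value intended for DMASTART */
	uint32_t end;		/* value intended for DMAEND */
};

/* Derive the DMA window from the push buffer instead of using 0..0xFFFFFFFF. */
static struct sketch_dma_window
sketch_pb_window(const struct sketch_push_buffer *pb)
{
	struct sketch_dma_window w;

	w.start = pb->phys;
	w.end = pb->phys + pb->size_bytes;	/* assumed exclusive bound */
	return w;
}
```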

> >> +/*
> >> + * Kick channel DMA into action by writing its PUT offset (if it has changed)
> >> + */
> >> +static void cdma_kick(struct host1x_cdma *cdma)
> >> +{
> >> +     struct host1x *host1x = cdma_to_host1x(cdma);
> >> +     struct host1x_channel *ch = cdma_to_channel(cdma);
> >> +     u32 put;
> >> +
> >> +     put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
> >> +
> >> +     if (put != cdma->last_put) {
> >> +             host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
> >> +             cdma->last_put = put;
> >> +     }
> >> +}
> > 
> > kick() sounds unusual. Maybe flush or commit or something similar would
> > be more accurate.
> 
> We could use flush.

Great.

> >> +     host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
> >> +
> >> +     cdma->torndown = false;
> >> +     cdma_timeout_restart(cdma, getptr);
> >> +}
> > 
> > I find this a bit non-intuitive. We teardown a channel, and when we're
> > done tearing down, the torndown variable is set to false and the channel
> > is actually restarted. Maybe you could explain some more how this works
> > and what its purpose is.
> 
> Actually, teardown_begin freezes the channel, then we manipulate the
> queue, and in the end teardown_end restarts the channel. So these should
> be named freeze and resume. We could even drop the timeout from the
> names of these functions.

Sounds good.

> >> +     /* stop processing to get a clean snapshot */
> >> +     prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
> >> +     cmdproc_stop = prev_cmdproc | BIT(ch->chid);
> >> +     host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
> >> +
> >> +     dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
> >> +             prev_cmdproc, cmdproc_stop);
> >> +
> >> +     syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
> >> +
> >> +     /* has buffer actually completed? */
> >> +     if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
> >> +             dev_dbg(&host1x->dev->dev,
> >> +                      "cdma_timeout: expired, but buffer had completed\n");
> > 
> > Maybe this should really be a warning?
> 
> Not really - it's actually just a normal state. We got a timeout event,
> but before we process it, it might be that the job manages to complete.
> This can happen, and is not an error case.

Okay, I see. That's fine then.
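
The completion check in the quoted hunk depends on a signed difference
so that it keeps working when the 32-bit counter wraps. A standalone
sketch of that comparison, with a hypothetical function name (the cast
assumes the usual two's-complement conversion):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if 'current' has reached or passed 'threshold', treating both as
 * points on a 32-bit counter that may have wrapped around.
 */
static bool sketch_syncpt_reached(uint32_t current, uint32_t threshold)
{
	return (int32_t)(current - threshold) >= 0;
}
```

A plain "current >= threshold" would give the wrong answer for the pair
(2, 0xfffffffe), where the counter has wrapped past the threshold.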

> >> +     for (i = 0 ; i < job->num_gathers; i++) {
> >> +             struct host1x_job_gather *g = &job->gathers[i];
> >> +             u32 op1 = host1x_opcode_gather(g->words);
> >> +             u32 op2 = g->mem_base + g->offset;
> >> +             host1x_cdma_push_gather(&job->ch->cdma,
> >> +                             job->gathers[i].ref,
> >> +                             job->gathers[i].offset,
> >> +                             op1, op2);
> >> +     }
> >> +}
> > 
> > Perhaps inline this into channel_submit()? I'm not sure how useful it
> > really is to split off smallish functions such as this which aren't
> > reused anywhere else. I don't have any major objection though, so you
> > can keep it separate if you want.
> 
> I split these out because channel_submit() became so long that I
> couldn't understand it anymore. I'd prefer keeping separate just to keep
> myself (semi-)sane.

Okay. =)

> >> diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
> > [...]
> >>  #include "hw/host1x01.h"
> >>  #include "dev.h"
> >> +#include "channel.h"
> >>  #include "hw/host1x01_hardware.h"
> >>
> >> +#include "hw/channel_hw.c"
> >> +#include "hw/cdma_hw.c"
> >>  #include "hw/syncpt_hw.c"
> >>  #include "hw/intr_hw.c"
> >>
> >>  int host1x01_init(struct host1x *host)
> >>  {
> >> +     host->channel_op = host1x_channel_ops;
> >> +     host->cdma_op = host1x_cdma_ops;
> >> +     host->cdma_pb_op = host1x_pushbuffer_ops;
> >>       host->syncpt_op = host1x_syncpt_ops;
> >>       host->intr_op = host1x_intr_ops;
> > 
> > I think I mentioned this before, but I'd prefer not to have the .c files
> > included here, but rather reference the ops structures externally. But I
> > still think that especially CDMA and push buffer ops don't need to be in
> > separate structures since they aren't likely to change with new hardware
> > revisions.
> 
> The C files need to be included here so that they pick up the hardware
> defs for the correct SoC. Pushbuffer is probably something we can
> generalize, but channel registers can change, so they need to be per SoC.

We can do the same using extern variables, can't we? If you're concerned
about the definitions that come from the headers, we can probably make
that work by parameterizing more.

I think we can live with this way for now and clean it up later, though.

> >> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
> > [...]
> >> +#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
> >> +     host1x_channel_dmactrl_dmastop_f(v)
> > 
> > I mentioned this elsewhere already, but I think the _F suffix (and _f
> > for that matter) along with the v parameter should go away.
> 
> I'd prefer keeping so that I don't have to use two #defines to replace
> one. That IMO makes the usage harder and more error prone.

That's precisely my point. This actually makes it harder to use. If you
don't want to set the bit, just don't OR it in. It's completely
pointless to shift and mask an unset bit.

> >> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
> > [...]
> > 
> > What does the "uclass" stand for? It seems a bit useless to me.
> 
> It means host1x class, i.e. the host1x registers that can be written to
> from push buffers.

I still don't understand why we need uclass. It seems redundant.

> >> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
> > [...]
> >> +static void action_submit_complete(struct host1x_waitlist *waiter)
> >> +{
> >> +     struct host1x_channel *channel = waiter->data;
> >> +     int nr_completed = waiter->count;
> > 
> > No need for this variable.
> 
> I'm using it for tracing in a follow-up patch. It can be used in traces
> for checking the queue length at each point of time.

Any reason why it can't be introduced in the follow-up patch?

> >> +struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
> >> +             u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
> > 
> > Maybe make the parameters unsigned int instead of u32?
> 
> I'll check this, but we're getting them from user space, and that API
> has a fixed length field. That's why I'm carrying that type over.

Okay, it isn't that important.

> >> +void host1x_job_add_gather(struct host1x_job *job,
> >> +             u32 mem_id, u32 words, u32 offset)
> >> +{
> >> +     struct host1x_job_gather *cur_gather =
> >> +                     &job->gathers[job->num_gathers];
> > 
> > Should this check for overflow?
> 
> As a defensive measure we could, but this is not exploitable.

Alright then.
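
For reference, a defensive version could look like the userspace sketch
below. The max_gathers field and all names are hypothetical; the patch
as posted does not track the allocated count this way.

```c
#include <errno.h>
#include <stdint.h>

struct sketch_gather {
	uint32_t mem_id;
	uint32_t words;
	uint32_t offset;
};

struct sketch_gather_list {
	struct sketch_gather *gathers;
	unsigned int num_gathers;	/* entries in use */
	unsigned int max_gathers;	/* entries allocated (hypothetical) */
};

/* Append a gather, refusing to write past the allocated array. */
static int sketch_add_gather(struct sketch_gather_list *list, uint32_t mem_id,
			     uint32_t words, uint32_t offset)
{
	struct sketch_gather *cur;

	if (list->num_gathers >= list->max_gathers)
		return -ENOSPC;

	cur = &list->gathers[list->num_gathers++];
	cur->mem_id = mem_id;
	cur->words = words;
	cur->offset = offset;
	return 0;
}
```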

> >> +
> >> +             /* remove completed reloc from the job */
> >> +             if (i != job->num_relocs - 1) {
> >> +                     struct host1x_reloc *reloc_last =
> >> +                             &job->relocarray[job->num_relocs - 1];
> >> +                     reloc->cmdbuf_mem       = reloc_last->cmdbuf_mem;
> >> +                     reloc->cmdbuf_offset    = reloc_last->cmdbuf_offset;
> >> +                     reloc->target           = reloc_last->target;
> >> +                     reloc->target_offset    = reloc_last->target_offset;
> >> +                     reloc->shift            = reloc_last->shift;
> >> +                     job->reloc_addr_phys[i] =
> >> +                             job->reloc_addr_phys[job->num_relocs - 1];
> >> +                     job->num_relocs--;
> >> +             } else {
> >> +                     break;
> >> +             }
> >> +     }
> >> +
> >> +     if (cmdbuf_page_addr)
> >> +             host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
> >> +
> >> +     return 0;
> >> +}
> > 
> > Also the algorithm seems a bit strange and hard to follow. Instead of
> > removing relocs from the job, replacing them with the last entry and
> > decrementing job->num_relocs, how much is the penalty for always
> > iterating over all relocs? This is one of the other cases where I'd
> > argue that simplicity is key. Furthermore you need to copy quite a bit
> > of data to replace the completed relocs, so I'm not sure it buys you
> > much.
> > 
> > It could always be optimized later on by just setting a bit in the reloc
> > to mark it as completed, or keep a bitmask of completed relocations or
> > whatever.
> 
> This was done in a big optimization patch, but we'll check if we could
> remove this. Previously we just set cmdbuf_mem for the completed reloc
> to 0, and that should work in this case.

That certainly sounds simpler.
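
A rough user-space sketch of that simpler scheme, assuming that a
cmdbuf_mem of 0 is never a valid ID and can therefore serve as the
"completed" marker. The struct is a cut-down stand-in for
struct host1x_reloc and the actual patching step is elided.

```c
#include <stdint.h>

/* Cut-down stand-in for struct host1x_reloc (assumption). */
struct reloc_sketch {
	uint32_t cmdbuf_mem;	/* 0 means "already patched" in this sketch */
	uint32_t cmdbuf_offset;
};

/*
 * Handle every pending reloc that targets cmdbuf_mem and mark it done by
 * clearing its cmdbuf_mem. Always walks the whole array; nothing is moved.
 * Returns the number of relocs handled in this pass.
 */
static unsigned int patch_relocs(struct reloc_sketch *relocs,
				 unsigned int num_relocs, uint32_t cmdbuf_mem)
{
	unsigned int i, patched = 0;

	for (i = 0; i < num_relocs; i++) {
		struct reloc_sketch *reloc = &relocs[i];

		/* skip completed relocs and relocs for other buffers */
		if (reloc->cmdbuf_mem == 0 || reloc->cmdbuf_mem != cmdbuf_mem)
			continue;

		/* ... the address would be written into the cmdbuf here ... */

		reloc->cmdbuf_mem = 0;
		patched++;
	}

	return patched;
}
```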

> >> +             int err = PTR_ERR(job->gather_copy_mapped);
> >> +             job->gather_copy_mapped = NULL;
> >> +             return err;
> >> +     }
> >> +
> >> +     job->gather_copy_size = size;
> >> +
> >> +     for (i = 0; i < job->num_gathers; i++) {
> >> +             struct host1x_job_gather *g = &job->gathers[i];
> >> +             void *gather = host1x_memmgr_mmap(g->ref);
> >> +             memcpy(job->gather_copy_mapped + offset,
> >> +                             gather + g->offset,
> >> +                             g->words * sizeof(u32));
> >> +
> >> +             g->mem_base = job->gather_copy;
> >> +             g->offset = offset;
> >> +             g->mem_id = 0;
> >> +             g->ref = 0;
> >> +
> >> +             host1x_memmgr_munmap(g->ref, gather);
> >> +             offset += g->words * sizeof(u32);
> >> +     }
> >> +
> >> +     return 0;
> >> +}
> > 
> > I wonder, where's this DMA buffer actually used? I can't find any use
> > between this copy and the corresponding dma_free_writecombine() call.
> 
> We replace the gathers in host1x_job with the ones we allocate here, so
> they are used when pushing the gathers to hardware.
> 
> This is done so that user space cannot tamper with the gathers once
> they've been checked by firewall.

Oh, I had missed how g->mem_base is assigned job->gather_copy, so I had
thought the memory wasn't used anywhere. I wonder if it wouldn't be more
efficient to pre-allocate this buffer. The number of gathers is limited
by HOST1X_GATHER_QUEUE_SIZE, right? So we could allocate a buffer of the
appropriate size for each job to avoid continuously reallocating and
freeing every time the job is pinned or unpinned.

Also jobs are allocated for each submit and allocating them is quite
expensive, so eventually we may want to pool them. Which will not be
trivial though, given that it requires the number of command buffers
and relocs to match. Some clever checks can probably make this work,
though.

> >> +     /* Null kickoff prevents submit from being sent to hardware */
> >> +     bool null_kickoff;
> > 
> > I don't think this is used anywhere.
> 
> True, we can remove this as we haven't posted the code for null kickoff.

Make sure to explain what this is used for when you post. The one
comment above is a bit vague.

> >> +     /* Check if register is marked as an address reg */
> >> +     int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
> > 
> > is_addr_reg() sounds a bit unusual. Maybe match this to the name of the
> > main firewall routine, validate()?
> 
> The point of this op is to just tell if a register for a class is
> pointing to a buffer. validate then uses this information. But both
> answers (yes/no) and both types of registers are still valid, so
> validate() wouldn't be the proper name.
> 
> validation is then done by checking that there's a reloc corresponding
> to each register write to a register that can hold an address.

I just remembered that we discussed this already and I think we agreed
that a table lookup might be a better implementation. That'd get rid of
the naming issue altogether, since you can just name the table something
like address_registers, which is quite unambiguous.
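
Something along these lines, as a user-space sketch. The class ID and
register offsets are invented for illustration and do not come from the
TRM; only the table name follows the suggestion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class ID, for illustration only. */
#define SKETCH_CLASS_2D 0x51

struct addr_reg_entry {
	uint32_t class_id;
	uint32_t reg;
};

/* Registers that hold a buffer address (made-up offsets). */
static const struct addr_reg_entry address_registers[] = {
	{ SKETCH_CLASS_2D, 0x1a },
	{ SKETCH_CLASS_2D, 0x1b },
	{ SKETCH_CLASS_2D, 0x26 },
};

/* True if a write to (class_id, reg) carries an address and needs a reloc. */
static bool is_address_register(uint32_t class_id, uint32_t reg)
{
	size_t i;
	size_t n = sizeof(address_registers) / sizeof(address_registers[0]);

	for (i = 0; i < n; i++)
		if (address_registers[i].class_id == class_id &&
		    address_registers[i].reg == reg)
			return true;

	return false;
}
```

The firewall would then only need to check that every write for which
the lookup returns true has a matching reloc.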

> >> diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
> > [...]
> >> +struct mem_handle;
> >> +struct platform_device;
> >> +
> >> +struct host1x_job_unpin_data {
> >> +     struct mem_handle *h;
> >> +     struct sg_table *mem;
> >> +};
> >> +
> >> +enum mem_mgr_flag {
> >> +     mem_mgr_flag_uncacheable = 0,
> >> +     mem_mgr_flag_write_combine = 1,
> >> +};
> > 
> > I'd like to see this use a more object-oriented approach and more common
> > terminology. All of these handles are essentially buffer objects, so
> > maybe something like host1x_bo would be a nice and short name.
> > 
> > To make this more object-oriented, I propose something like:
> > 
> >         struct host1x_bo_ops {
> >                 int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
> >                              unsigned long flags);
> >                 int (*free)(struct host1x_bo *bo);
> >                 ...
> >         };
> > 
> >         struct host1x_bo {
> >                 const struct host1x_bo_ops *ops;
> >         };
> > 
> >         struct host1x_cma_bo {
> >                 struct host1x_bo base;
> >                 struct drm_gem_cma_object *obj;
> >         };
> > 
> >         static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
> >         {
> >                 return container_of(bo, struct host1x_cma_bo, base);
> >         }
> > 
> >         static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
> >                                           unsigned long align, unsigned long flags)
> >         {
> >                 return bo->ops->alloc(bo, size, align, flags);
> >         }
> > 
> >         ...
> > 
> > That should be easy to extend with a new type of BO once the IOMMU-based
> > allocator is ready. And as I said it is much closer in terminology to
> > what other drivers do.
> 
> One complexity is that we're using the same type for communicating with
> user space. Each buffer carries with it a flag indicating its allocator.
> We might be able to model the internal structure to be more like what
> you propose, but for the API we still need the flag.

I disagree. I don't see any need for passing around the type at all.
We've discussed this a few times already, and correct me if I'm wrong,
but I think we agreed that we don't want to mix handle/buffer types.

We only support CMA for now, so all buffers will be allocated from CMA.
Once the IOMMU-based allocator is ready we'll want to switch to that for
Tegra30 and later, but stick to CMA for Tegra20 since the GART isn't
very usable.

So the way I see it, the decision about which allocator to use is done
once at driver probe time. So all that's really needed is a function
that allocates a buffer object and returns the proper one for the given
Tegra SoC. Once a host1x_bo object is returned it can be used throughout
and we get rid of the additional memmgr abstraction. I think it'll make
things much simpler.
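
As a rough user-space sketch of the "decide once at probe time" idea:
all names below are hypothetical, and the IOMMU allocator is only a
placeholder since it does not exist yet.

```c
#include <stddef.h>

enum soc_sketch { SOC_TEGRA20, SOC_TEGRA30 };
enum backend_sketch { BACKEND_CMA, BACKEND_IOMMU };

struct bo_sketch {
	enum backend_sketch backend;
	size_t size;
};

typedef int (*bo_alloc_fn)(struct bo_sketch *bo, size_t size);

static int cma_alloc_sketch(struct bo_sketch *bo, size_t size)
{
	bo->backend = BACKEND_CMA;
	bo->size = size;
	return 0;
}

/* Placeholder for the future IOMMU-based allocator. */
static int iommu_alloc_sketch(struct bo_sketch *bo, size_t size)
{
	bo->backend = BACKEND_IOMMU;
	bo->size = size;
	return 0;
}

/* Called once at probe time; callers never pass an allocator type around. */
static bo_alloc_fn select_allocator(enum soc_sketch soc)
{
	return soc == SOC_TEGRA20 ? cma_alloc_sketch : iommu_alloc_sketch;
}
```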

Thierry

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-02-26  9:48     ` Terje Bergström
  2013-02-27  8:56       ` Thierry Reding
@ 2013-03-08 16:16       ` Terje Bergström
  2013-03-08 20:43         ` Thierry Reding
  1 sibling, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-03-08 16:16 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 26.02.2013 11:48, Terje Bergström wrote:
> On 25.02.2013 17:24, Thierry Reding wrote:
>> You use two different styles to indent the function parameters. You
>> might want to stick to one, preferably aligning them with the first
>> parameter on the first line.
> 
> I've generally favored "two tabs" indenting, but we'll anyway
> standardize on one.

We standardized on the convention used in tegradrm, i.e. aligning with
first parameter.

>> There's nothing in this function that requires a platform_device, so
>> passing struct device should be enough. Or maybe host1x_cdma should get
>> a struct device * field?
> 
> I think we'll just start using struct device * in general in code.
> Arto's been already fixing a lot of these, so he might've already fixed
> this.

We did a sweep in the code and now I hope everything that can, uses
struct device *. The side effect was getting rid of a lot of casting,
which is good.

>> Why don't you use any of the kernel's reference counting mechanisms?
>>
>>> +void host1x_channel_put(struct host1x_channel *ch)
>>> +{
>>> +     mutex_lock(&ch->reflock);
>>> +     if (ch->refcount == 1) {
>>> +             host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
>>> +             host1x_cdma_deinit(&ch->cdma);
>>> +     }
>>> +     ch->refcount--;
>>> +     mutex_unlock(&ch->reflock);
>>> +}
>>
>> I think you can do all of this using a kref.
> 
> I think the original reason was that there's no reason to use atomic
> kref, as we anyway have to do mutual exclusion via mutex. But, using
> kref won't be any problem, so we could use that.

Actually, we ended up with a problem with this. kref assumes that once
refcount goes to zero, it gets destroyed. In ch->refcount, going to zero
is just fine and just indicates that we need to initialize. And, we
anyway need to do locking, so we didn't do the conversion to kref.
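
To illustrate the difference, here is a small user-space sketch of a
count where zero is a legal resting state. The struct and field names
are stand-ins, and the locking that the driver does with ch->reflock is
only marked in comments.

```c
/* Stand-in for the channel; init_count only exists to make the
 * re-initialization visible in this sketch. */
struct channel_sketch {
	int refcount;
	int hw_running;
	int init_count;
};

static void channel_get_sketch(struct channel_sketch *ch)
{
	/* real code would hold ch->reflock here */
	if (ch->refcount == 0) {
		/* first user (again): bring the hardware side up */
		ch->hw_running = 1;
		ch->init_count++;
	}
	ch->refcount++;
}

static void channel_put_sketch(struct channel_sketch *ch)
{
	/* real code would hold ch->reflock here */
	if (ch->refcount == 1)
		ch->hw_running = 0;	/* last user: stop, but keep the object */
	ch->refcount--;
}
```

With kref, the release callback runs when the count hits zero and the
object is expected to be gone afterwards, so a later get on the same
object would not fit that model.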

>>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
>>> +{
(...)
>>> +}
>>
>> I think the critical section could be shorter here. It's probably not
>> worth the extra trouble, though, given that channels are not often
>> allocated.
> 
> Yeah, boot time isn't measured in microseconds. :-) But, if we just make
> allocated_channels an atomic, we should be able to drop chlist_mutex
> altogether and it could simplify the code.

There wasn't much we could have moved outside the critical section, so
we didn't touch this area.

>> Also, is it really necessary to abstract these into an ops structure? I
>> get that newer hardware revisions might require different ops for sync-
>> point handling because the register layout or number of syncpoints may
>> be different, but the CDMA and push buffer (below) concepts are pretty
>> much a software abstraction, and as such its implementation is unlikely
>> to change with some future hardware revision.
> 
> Pushbuffer ops can become generic. There's only one catch - init uses
> the restart opcode. But the opcode is not going to change, so we can
> generalize that.

We ended up keeping the init as an operation, but the rest of the push
buffer ops became generic.

>>
>>> +/*
>>> + * Push two words to the push buffer
>>> + * Caller must ensure push buffer is not full
>>> + */
>>> +static void push_buffer_push_to(struct push_buffer *pb,
>>> +             struct mem_handle *handle,
>>> +             u32 op1, u32 op2)
>>> +{
>>> +     u32 cur = pb->cur;
>>> +     u32 *p = (u32 *)((u32)pb->mapped + cur);
>>
>> You do all this extra casting to make sure to increment by bytes and not
>> 32-bit words. How about you change pb->cur to contain the word index, so
>> that you don't have to go through hoops each time around.

When we changed DMASTART and DMAEND to actually denote the push buffer
area, we noticed that DMAGET and DMAPUT are relative to DMASTART and
DMAEND. Given that, and the need to access both CPU and device virtual
addresses, changing to word indexes didn't actually simplify the code,
so we kept using byte indexes.

>>
>>> +/*
>>> + * Return the number of two word slots free in the push buffer
>>> + */
>>> +static u32 push_buffer_space(struct push_buffer *pb)
>>> +{
>>> +     return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
>>> +}
>>
>> Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
>> PUSH_BUFFER_SIZE, can it?
> 
> You're right, this function doesn't need to worry about wrapping.

Arto noticed this, but actually I was wrong - wrapping is very
possible. We just have to remember that if we're processing something at
the end of the push buffer, cur might be at the end and fence at the
beginning.
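
A quick user-space check of the wrapping case. The PUSH_BUFFER_SIZE
value here is an assumption for the example (the mask only works if the
size is a power of two), and the function takes fence and cur directly
instead of a struct push_buffer.

```c
#include <stdint.h>

/* Assumed size for the example; must be a power of two for the mask. */
#define PUSH_BUFFER_SIZE 4096u

/* Number of free two-word (8 byte) slots between cur and fence. */
static uint32_t push_buffer_space_sketch(uint32_t fence, uint32_t cur)
{
	/*
	 * If fence has wrapped to the start while cur is still near the
	 * end, fence - cur underflows; the mask folds it back into
	 * [0, PUSH_BUFFER_SIZE).
	 */
	return ((fence - cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
}
```

With cur = 4080 and fence = 16, the unsigned subtraction wraps, the mask
yields 32 bytes, and the result is 4 slots. Without the mask the same
inputs would give a huge bogus value.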

>>> diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
>> [...]
>>> +struct mem_handle;
>>> +struct platform_device;
>>> +
>>> +struct host1x_job_unpin_data {
>>> +     struct mem_handle *h;
>>> +     struct sg_table *mem;
>>> +};
>>> +
>>> +enum mem_mgr_flag {
>>> +     mem_mgr_flag_uncacheable = 0,
>>> +     mem_mgr_flag_write_combine = 1,
>>> +};
>>
>> I'd like to see this use a more object-oriented approach and more common
>> terminology. All of these handles are essentially buffer objects, so
>> maybe something like host1x_bo would be a nice and short name.

We did this a bit differently, but following pretty much the same
principles. We have host1x_mem_handle, which contains an ops pointer.
The handle gets encapsulated inside drm_gem_cma_object.

_bo structs usually seem to contain a drm_gem_object, so we thought
it's better not to reuse that term.

Please check the code and let us know what you think. This pretty much
follows what Lucas proposed a while ago, and neatly keeps the
DRM-specific parts inside the drm directory.

Other than these, we should have implemented all changes that we agreed
to include. If something's missing, it's because there were so many that
we just dropped the ball.

Terje

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-03-08 16:16       ` Terje Bergström
@ 2013-03-08 20:43         ` Thierry Reding
  2013-03-11  6:29           ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-03-08 20:43 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Fri, Mar 08, 2013 at 06:16:16PM +0200, Terje Bergström wrote:
> On 26.02.2013 11:48, Terje Bergström wrote:
> > On 25.02.2013 17:24, Thierry Reding wrote:
[...]
> >>> +struct mem_handle;
> >>> +struct platform_device;
> >>> +
> >>> +struct host1x_job_unpin_data {
> >>> +     struct mem_handle *h;
> >>> +     struct sg_table *mem;
> >>> +};
> >>> +
> >>> +enum mem_mgr_flag {
> >>> +     mem_mgr_flag_uncacheable = 0,
> >>> +     mem_mgr_flag_write_combine = 1,
> >>> +};
> >>
> >> I'd like to see this use a more object-oriented approach and more common
> >> terminology. All of these handles are essentially buffer objects, so
> >> maybe something like host1x_bo would be a nice and short name.
> 
> We did this a bit differently, but following pretty much the same
> principles. We have host1x_mem_handle, which contains an ops pointer.
> The handle gets encapsulated inside drm_gem_cma_object.
> 
> _bo structs usually seem to contain a drm_gem_object, so we thought
> it's better not to reuse that term.
> 
> Please check the code and let us know what you think. This pretty much
> follows what Lucas proposed a while ago, and keeps neatly the DRM
> specific parts inside the drm directory.

A bo is just a buffer object, so I don't see why the name shouldn't be
used. The name is in no way specific to DRM or GEM. But the point that I
was trying to make was that there is nothing to suggest that we couldn't
use drm_gem_object as the underlying scaffold to base all host1x buffer
objects on.

Furthermore I don't understand why you've chosen this approach. It is
completely different from what other drivers do and therefore makes it
more difficult to comprehend. That alone I could live with if there were
any advantages to that approach, but as far as I can tell there are
none.

Thierry

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-03-08 20:43         ` Thierry Reding
@ 2013-03-11  6:29           ` Terje Bergström
  2013-03-11  7:18             ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-03-11  6:29 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 08.03.2013 22:43, Thierry Reding wrote:
> A bo is just a buffer object, so I don't see why the name shouldn't
> be used. The name is in no way specific to DRM or GEM. But the point
> that I was trying to make was that there is nothing to suggest that
> we couldn't use drm_gem_object as the underlying scaffold to base all
> host1x buffer objects on.
> 
> Furthermore I don't understand why you've chosen this approach. It
> is completely different from what other drivers do and therefore
> makes it more difficult to comprehend. That alone I could live with
> if there were any advantages to that approach, but as far as I can
> tell there are none.

I was following the plan we agreed on earlier in email discussion with
you and Lucas:

On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which
> is really specific to how DRM works and as such should be constructed
> by tegradrm. nvhost should really just manage allocations/virtual
> address space and provide something that is able to back all the GEM
> handle operations.
> 
> nvhost has really no reason at all to even know about GEM handles.
> If you back a GEM object by a nvhost object you can just peel out
> the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> handler and queue the job to nvhost using its native handles.
> 
> This way you would also be able to construct different handles (like
> GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> that I'm not sure how useful this would be, but it seems like a
> reasonable design to me being able to do so.

With this structure, we are already prepared for non-DRM APIs. It's a
matter of familiarity of code versus future expansion. Code paths for
both are as simple/complex, so neither has a direct technical
superiority in performance.

I know other DRM drivers have opted to hard code GEM dependency
throughout the code. Then again, host1x hardware is managing much more
than graphics, so we need to think outside the DRM box, too.

Terje

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-03-11  6:29           ` Terje Bergström
@ 2013-03-11  7:18             ` Thierry Reding
  2013-03-11  9:21               ` Terje Bergström
  0 siblings, 1 reply; 49+ messages in thread
From: Thierry Reding @ 2013-03-11  7:18 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Mon, Mar 11, 2013 at 08:29:59AM +0200, Terje Bergström wrote:
> On 08.03.2013 22:43, Thierry Reding wrote:
> > A bo is just a buffer object, so I don't see why the name shouldn't
> > be used. The name is in no way specific to DRM or GEM. But the point
> > that I was trying to make was that there is nothing to suggest that
> > we couldn't use drm_gem_object as the underlying scaffold to base all
> > host1x buffer objects on.
> > 
> > Furthermore I don't understand why you've chosen this approach. It
> > is completely different from what other drivers do and therefore
> > makes it more difficult to comprehend. That alone I could live with
> > if there were any advantages to that approach, but as far as I can
> > tell there are none.
> 
> I was following the plan we agreed on earlier in email discussion with
> you and Lucas:
> 
> On 29.11.2012 11:09, Lucas Stach wrote:
> > We should aim for a clean split here. GEM handles are something which
> > is really specific to how DRM works and as such should be constructed
> > by tegradrm. nvhost should really just manage allocations/virtual
> > address space and provide something that is able to back all the GEM
> > handle operations.
> > 
> > nvhost has really no reason at all to even know about GEM handles.
> > If you back a GEM object by a nvhost object you can just peel out
> > the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> > handler and queue the job to nvhost using its native handles.
> > 
> > This way you would also be able to construct different handles (like
> > GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> > that I'm not sure how useful this would be, but it seems like a
> > reasonable design to me being able to do so.
> 
> With this structure, we are already prepared for non-DRM APIs. It's a
> matter of familiarity of code versus future expansion. Code paths for
> both are as simple/complex, so neither has a direct technical
> superiority in performance.
> 
> I know other DRM drivers have opted to hard code GEM dependency
> throughout the code. Then again, host1x hardware is managing much more
> than graphics, so we need to think outside the DRM box, too.

This sounds a bit over-engineered at this point in time. DRM is currently
the only user. Is anybody working on any non-DRM drivers that would use
this?

Even that aside, I don't think host1x_mem_handle is a good choice of
name here. The objects are much more than handles. They are in fact
buffer objects, which can optionally be attached to a handle. I also
think that using a void * to store the handle specific data isn't such a
good idea.

So how about the following proposal, which I think might satisfy both of
us:

	struct host1x_bo;

	struct host1x_bo_ops {
		struct host1x_bo *(*get)(struct host1x_bo *bo);
		void (*put)(struct host1x_bo *bo);
		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
		...
	};

	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
	void host1x_bo_put(struct host1x_bo *bo);
	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
	...

	struct host1x_bo {
		const struct host1x_bo_ops *ops;
		...
	};

	struct tegra_drm_bo {
		struct host1x_bo base;
		...
	};

That way you can get rid of the host1x_memmgr_create_handle() helper and
instead embed host1x_bo into driver-/framework-specific structures with
the necessary initialization.

It also allows you to interact directly with the objects instead of
having to go through the memmgr API. The memory manager doesn't really
exist anymore so keeping the name in the API is only confusing. Your
current proposal deals with memory handles directly already so it's
really just making the naming more consistent.
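
To make the embedding concrete, here is a user-space mock of the
proposal. It is only a sketch: dma_addr_t and container_of are redefined
locally, pin() drops the sg_table argument for brevity, and demo_drm_bo
with its fields is a made-up stand-in for whatever the DRM side would
embed the object in.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* user-space stand-in for the kernel type */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct host1x_bo;

struct host1x_bo_ops {
	struct host1x_bo *(*get)(struct host1x_bo *bo);
	void (*put)(struct host1x_bo *bo);
	dma_addr_t (*pin)(struct host1x_bo *bo);	/* sg_table omitted */
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

/* Hypothetical driver-specific object embedding the generic one. */
struct demo_drm_bo {
	struct host1x_bo base;
	int refcount;
	dma_addr_t paddr;
};

static struct demo_drm_bo *to_demo_drm_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct demo_drm_bo, base);
}

static struct host1x_bo *demo_get(struct host1x_bo *bo)
{
	to_demo_drm_bo(bo)->refcount++;
	return bo;
}

static void demo_put(struct host1x_bo *bo)
{
	to_demo_drm_bo(bo)->refcount--;
}

static dma_addr_t demo_pin(struct host1x_bo *bo)
{
	return to_demo_drm_bo(bo)->paddr;
}

static const struct host1x_bo_ops demo_drm_bo_ops = {
	.get = demo_get,
	.put = demo_put,
	.pin = demo_pin,
};

/* Generic wrappers: host1x code only ever sees struct host1x_bo. */
static struct host1x_bo *host1x_bo_get(struct host1x_bo *bo)
{
	return bo->ops->get(bo);
}

static void host1x_bo_put(struct host1x_bo *bo)
{
	bo->ops->put(bo);
}

static dma_addr_t host1x_bo_pin(struct host1x_bo *bo)
{
	return bo->ops->pin(bo);
}
```

No void * is needed for the driver-specific data; the driver recovers
its own type from the embedded base via container_of().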

Thierry

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-03-11  7:18             ` Thierry Reding
@ 2013-03-11  9:21               ` Terje Bergström
  2013-03-11  9:41                 ` Thierry Reding
  0 siblings, 1 reply; 49+ messages in thread
From: Terje Bergström @ 2013-03-11  9:21 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On 11.03.2013 09:18, Thierry Reding wrote:
> This sounds a bit over-engineered at this point in time. DRM is currently
> the only user. Is anybody working on any non-DRM drivers that would use
> this?

Well, this contains the beginning of that:

http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2

I don't want to give these guys any excuse not to port it over to host1x
code base. :-)

> Even that aside, I don't think host1x_mem_handle is a good choice of
> name here. The objects are much more than handles. They are in fact
> buffer objects, which can optionally be attached to a handle. I also
> think that using a void * to store the handle specific data isn't such a
> good idea.

Naming is not an issue for me - we can easily agree on using _bo.

> So how about the following proposal, which I think might satisfy both of
> us:
> 
> 	struct host1x_bo;
> 
> 	struct host1x_bo_ops {
> 		struct host1x_bo *(*get)(struct host1x_bo *bo);
> 		void (*put)(struct host1x_bo *bo);
> 		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
> 		...
> 	};
> 
> 	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
> 	void host1x_bo_put(struct host1x_bo *bo);
> 	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
> 	...
> 
> 	struct host1x_bo {
> 		const struct host1x_bo_ops *ops;
> 		...
> 	};
> 
> 	struct tegra_drm_bo {
> 		struct host1x_bo base;
> 		...
> 	};
> 
> That way you can get rid of the host1x_memmgr_create_handle() helper and
> instead embed host1x_bo into driver-/framework-specific structures with
> the necessary initialization.

This would make sense. We'll get back when we have enough of the
implementation done to understand it all. One consequence is that we
cannot use drm_gem_cma_create() anymore. We'll have to introduce a
function that does the same as drm_gem_cma_create(), but it takes a
pre-allocated drm_gem_cma_object pointer. That way we can allocate the
struct, and use DRM CMA just to initialize the drm_gem_cma_object.

The other way would be to just take a copy of the DRM CMA helper, but I'd
like to defer that to the next step when we implement an IOMMU-aware
allocator.

> It also allows you to interact directly with the objects instead of
> having to go through the memmgr API. The memory manager doesn't really
> exist anymore so keeping the name in the API is only confusing. Your
> current proposal deals with memory handles directly already so it's
> really just making the naming more consistent.

The memmgr APIs are currently just a shortcut wrapper to the ops, so in
that sense the memmgr does not really exist. I think it might still make
sense to keep static inline wrappers for calling the ops within, but we
could rename them to host1x_bo_somethingandother. Then it'd follow the
pattern we are using for the hw ops in the latest set.

Terje

* Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support
  2013-03-11  9:21               ` Terje Bergström
@ 2013-03-11  9:41                 ` Thierry Reding
  0 siblings, 0 replies; 49+ messages in thread
From: Thierry Reding @ 2013-03-11  9:41 UTC (permalink / raw)
  To: Terje Bergström
  Cc: Arto Merilainen, airlied, dri-devel, linux-tegra, linux-kernel

On Mon, Mar 11, 2013 at 11:21:05AM +0200, Terje Bergström wrote:
> On 11.03.2013 09:18, Thierry Reding wrote:
> > This sounds a bit over-engineered at this point in time. DRM is currently
> > the only user. Is anybody working on any non-DRM drivers that would use
> > this?
> 
> Well, this contains beginning of that:
> 
> http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2
> 
> I don't want to give these guys any excuse not to port it over to host1x
> code base. :-)

I was aware of that driver but I didn't realize it had been available
publicly. It's great to see this, though, and one more argument in
favour of not binding the host1x_bo too tightly to DRM/GEM.

> > So how about the following proposal, which I think might satisfy both of
> > us:
> > 
> > 	struct host1x_bo;
> > 
> > 	struct host1x_bo_ops {
> > 		struct host1x_bo *(*get)(struct host1x_bo *bo);
> > 		void (*put)(struct host1x_bo *bo);
> > 		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
> > 		...
> > 	};
> > 
> > 	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
> > 	void host1x_bo_put(struct host1x_bo *bo);
> > 	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
> > 	...
> > 
> > 	struct host1x_bo {
> > 		const struct host1x_bo_ops *ops;
> > 		...
> > 	};
> > 
> > 	struct tegra_drm_bo {
> > 		struct host1x_bo base;
> > 		...
> > 	};
> > 
> > That way you can get rid of the host1x_memmgr_create_handle() helper and
> > instead embed host1x_bo into driver-/framework-specific structures with
> > the necessary initialization.
> 
> This would make sense. We'll get back when we have enough of
> implementation done to understand it all. One consequence is that we
> cannot use drm_gem_cma_create() anymore. We'll have to introduce a
> function that does the same as drm_gem_cma_create(), but it takes a
> pre-allocated drm_gem_cma_object pointer. That way we can allocate the
> struct, and use DRM CMA just to initialize the drm_gem_cma_object.

I certainly think that introducing a drm_gem_cma_object_init() function
shouldn't pose a problem. If you do, make sure to update the existing
drm_gem_cma_create() to use it. Having both lets users have the choice
to use drm_gem_cma_create() if they don't need to embed it, or
drm_gem_cma_object_init() otherwise.
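
The general shape of that split, as a user-space sketch. The names
below are hypothetical and this is not the DRM CMA helper API; calloc()
merely stands in for the real backing allocation.

```c
#include <stdlib.h>

/* Stand-in for the helper's object (assumption). */
struct cma_obj_sketch {
	size_t size;
	void *vaddr;
};

/* Initialize a caller-provided object, e.g. one embedded in a larger struct. */
static int cma_obj_sketch_init(struct cma_obj_sketch *obj, size_t size)
{
	obj->vaddr = calloc(1, size);
	if (!obj->vaddr)
		return -1;

	obj->size = size;
	return 0;
}

/* Convenience path for users that don't embed: allocate, then init. */
static struct cma_obj_sketch *cma_obj_sketch_create(size_t size)
{
	struct cma_obj_sketch *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;

	if (cma_obj_sketch_init(obj, size)) {
		free(obj);
		return NULL;
	}

	return obj;
}

/* A driver object that embeds the helper object instead of pointing at it. */
struct driver_bo_sketch {
	struct cma_obj_sketch cma;
	unsigned int flags;
};
```

Because create() is built on init(), there is only one place that knows
how to set the object up.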

> Other way would be just taking a copy of DRM CMA helper, but I'd like to
> defer that to the next step when we implement IOMMU aware allocator.

I'm not sure I understand what you're saying, but if you add a function
as discussed above this shouldn't be necessary.

> > It also allows you to interact directly with the objects instead of
> > having to go through the memmgr API. The memory manager doesn't really
> > exist anymore so keeping the name in the API is only confusing. Your
> > current proposal deals with memory handles directly already so it's
> > really just making the naming more consistent.
> 
> The memmgr APIs are currently just a shortcut wrapper to the ops, so in
> that sense the memmgr does not really exist. I think it might still make
> sense to keep static inline wrappers for calling the ops within, but we
> could rename them to host1x_bo_somethingandother. Then it'd follow the
> pattern we are using for the hw ops in the latest set.

Yes, that's exactly what I had in mind in the above proposal. They could
be inline, but it's probably also okay if they're not. They aren't meant
to be used very frequently so the extra function call shouldn't matter
much. It might be easier to add some additional checks if they aren't
inlined. I'm fine either way.

Thierry

end of thread, other threads:[~2013-03-11  9:41 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-15 11:43 [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergstrom
2013-01-15 11:43 ` [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver Terje Bergstrom
2013-02-04  9:09   ` Thierry Reding
2013-02-05  3:30     ` Terje Bergström
2013-02-05  7:43       ` Thierry Reding
2013-02-06 20:13         ` Terje Bergström
2013-01-15 11:43 ` [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts Terje Bergstrom
2013-02-04 10:30   ` Thierry Reding
2013-02-05  4:29     ` Terje Bergström
2013-02-05  8:42       ` Thierry Reding
2013-02-06 20:29         ` Terje Bergström
2013-02-06 20:38           ` Thierry Reding
2013-02-06 20:41             ` Terje Bergström
2013-01-15 11:43 ` [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support Terje Bergstrom
2013-02-25 15:24   ` Thierry Reding
2013-02-26  9:48     ` Terje Bergström
2013-02-27  8:56       ` Thierry Reding
2013-03-08 16:16       ` Terje Bergström
2013-03-08 20:43         ` Thierry Reding
2013-03-11  6:29           ` Terje Bergström
2013-03-11  7:18             ` Thierry Reding
2013-03-11  9:21               ` Terje Bergström
2013-03-11  9:41                 ` Thierry Reding
2013-01-15 11:44 ` [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support Terje Bergstrom
2013-02-04 11:03   ` Thierry Reding
2013-02-05  4:41     ` Terje Bergström
2013-02-05  9:15       ` Thierry Reding
2013-02-06 20:58         ` Terje Bergström
2013-02-08  6:54           ` Thierry Reding
2013-01-15 11:44 ` [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x Terje Bergstrom
2013-02-04 11:08   ` Thierry Reding
2013-02-05  4:45     ` Terje Bergström
2013-02-05  9:26       ` Thierry Reding
2013-01-15 11:44 ` [PATCHv5,RESEND 6/8] gpu: host1x: Remove second host1x driver Terje Bergstrom
2013-02-04 11:23   ` Thierry Reding
2013-01-15 11:44 ` [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks Terje Bergstrom
2013-02-04 11:26   ` Thierry Reding
2013-02-04 17:06     ` Stephen Warren
2013-02-05  4:47     ` Terje Bergström
2013-01-15 11:44 ` [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device Terje Bergstrom
2013-02-04 12:56   ` Thierry Reding
2013-02-05  5:17     ` Terje Bergström
2013-02-05  9:54       ` Thierry Reding
2013-02-06 21:23         ` Terje Bergström
2013-02-08  7:07           ` Thierry Reding
2013-02-11  0:42             ` Terje Bergström
2013-02-11  6:44               ` Thierry Reding
2013-02-11 15:40                 ` Terje Bergström
2013-01-22  9:03 ` [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware Terje Bergström
