All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Host1x IOMMU support + VIC support
@ 2016-12-14 11:16 Mikko Perttunen
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  2016-12-14 11:16 ` [PATCH v2 6/7] arm64: tegra: Enable VIC on T210 Mikko Perttunen
  0 siblings, 2 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Mikko Perttunen

This series adds IOMMU support to Host1x and TegraDRM
and adds support for the VIC (Video Image Compositor)
host1x client. The series is available as a git repository at
git://github.com/cyndis/linux.git; branch vic-2.

A userspace test case for VIC can be found at
https://github.com/cyndis/drm/tree/work/tegra.
The testcase is in tests/tegra and is called submit_vic.
The testcase/TRM include full headers and documentation
to program the unit. The unit by itself, however, does not
readily map to existing userspace library interfaces, so
implementations for those are not provided.

The in-kernel firewall is not implemented for VIC;
therefore, either the IOMMU must be enabled or the firewall
disabled for the test to pass.

Tested with Jetson TX1 (T210). Probably works also
with Jetson TK1 (T124). Note that due to hardware changes
the testcase also needs to be changed to run properly
on T124: this can be done by removing the topmost
commit of the aforementioned test repository.

Thanks,
  Mikko.

Arto Merilainen (2):
  drm/tegra: Add falcon helper library
  drm/tegra: Add VIC support

Mikko Perttunen (5):
  drm/tegra: Add Tegra DRM allocation API
  gpu: host1x: Add IOMMU support
  dt-bindings: Add bindings for the Tegra VIC
  arm64: tegra: Enable VIC on T210
  arm64: tegra: Enable IOMMU for Host1x on Tegra210

 .../display/tegra/nvidia,tegra20-host1x.txt        |  13 +
 arch/arm64/boot/dts/nvidia/tegra210.dtsi           |  19 +-
 drivers/gpu/drm/tegra/Makefile                     |   4 +-
 drivers/gpu/drm/tegra/drm.c                        | 114 +++++-
 drivers/gpu/drm/tegra/drm.h                        |  12 +
 drivers/gpu/drm/tegra/falcon.c                     | 259 ++++++++++++++
 drivers/gpu/drm/tegra/falcon.h                     | 127 +++++++
 drivers/gpu/drm/tegra/vic.c                        | 396 +++++++++++++++++++++
 drivers/gpu/drm/tegra/vic.h                        |  31 ++
 drivers/gpu/host1x/cdma.c                          |  74 +++-
 drivers/gpu/host1x/cdma.h                          |   6 +-
 drivers/gpu/host1x/dev.c                           |  40 ++-
 drivers/gpu/host1x/dev.h                           |   6 +
 drivers/gpu/host1x/hw/cdma_hw.c                    |  16 +-
 drivers/gpu/host1x/job.c                           |  72 +++-
 drivers/gpu/host1x/job.h                           |   1 +
 include/linux/host1x.h                             |   1 +
 17 files changed, 1143 insertions(+), 48 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/falcon.c
 create mode 100644 drivers/gpu/drm/tegra/falcon.h
 create mode 100644 drivers/gpu/drm/tegra/vic.c
 create mode 100644 drivers/gpu/drm/tegra/vic.h

-- 
2.10.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 11:35     ` Lucas Stach
  2016-12-14 11:16   ` [PATCH v2 2/7] drm/tegra: Add falcon helper library Mikko Perttunen
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Mikko Perttunen

Add a new IO virtual memory allocation API to allow clients to
allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
required e.g. for loading client firmware when clients are attached
to the IOMMU domain.

The allocator allocates contiguous physical pages that are then
mapped contiguously to the IOMMU domain using the iova_domain
library provided by the kernel. Contiguous physical pages are
used so that the same allocator works also when IOMMU support
is disabled and therefore devices access physical memory directly.

Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/drm/tegra/drm.c | 111 +++++++++++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/tegra/drm.h |  11 +++++
 2 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index b8be3ee..cf714c6 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1,12 +1,13 @@
 /*
  * Copyright (C) 2012 Avionic Design GmbH
- * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (C) 2012-2016 NVIDIA CORPORATION.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
 
+#include <linux/bitops.h>
 #include <linux/host1x.h>
 #include <linux/iommu.h>
 
@@ -23,6 +24,8 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+#define CARVEOUT_SZ SZ_64M
+
 struct tegra_drm_file {
 	struct list_head contexts;
 };
@@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (iommu_present(&platform_bus_type)) {
 		struct iommu_domain_geometry *geometry;
-		u64 start, end;
+		unsigned long order;
+		u64 carveout_start, carveout_end, gem_start, gem_end;
 
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
@@ -136,12 +140,25 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		}
 
 		geometry = &tegra->domain->geometry;
-		start = geometry->aperture_start;
-		end = geometry->aperture_end;
+		gem_start = geometry->aperture_start;
+		gem_end = geometry->aperture_end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = geometry->aperture_end;
+
+		order = __ffs(tegra->domain->pgsize_bitmap);
+		init_iova_domain(&tegra->carveout.domain, 1UL << order,
+				 carveout_start >> order,
+				 carveout_end >> order);
+
+		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
+		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
+
+		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
 
-		DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
-				 start, end);
-		drm_mm_init(&tegra->mm, start, end - start + 1);
+		DRM_DEBUG("IOMMU apertures:\n");
+		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
+		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
+			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -208,6 +225,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (tegra->domain) {
 		iommu_domain_free(tegra->domain);
 		drm_mm_takedown(&tegra->mm);
+		put_iova_domain(&tegra->carveout.domain);
 	}
 free:
 	kfree(tegra);
@@ -232,6 +250,7 @@ static int tegra_drm_unload(struct drm_device *drm)
 	if (tegra->domain) {
 		iommu_domain_free(tegra->domain);
 		drm_mm_takedown(&tegra->mm);
+		put_iova_domain(&tegra->carveout.domain);
 	}
 
 	kfree(tegra);
@@ -975,6 +994,84 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 	return 0;
 }
 
+void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
+			      dma_addr_t *dma)
+{
+	struct iova *alloc;
+	void *virt;
+	gfp_t gfp;
+	int err;
+
+	if (tegra->domain)
+		size = iova_align(&tegra->carveout.domain, size);
+	else
+		size = PAGE_ALIGN(size);
+
+	gfp = GFP_KERNEL | __GFP_ZERO;
+	if (!tegra->domain) {
+		/*
+		 * Many units only support 32-bit addresses, even on 64-bit
+		 * SoCs. If there is no IOMMU to translate into a 32-bit IO
+		 * virtual address space, force allocations to be in the
+		 * lower 32-bit range.
+		 */
+		gfp |= GFP_DMA;
+	}
+
+	virt = (void *)__get_free_pages(gfp, get_order(size));
+	if (!virt)
+		return ERR_PTR(-ENOMEM);
+
+	if (!tegra->domain) {
+		/*
+		 * If IOMMU is disabled, devices address physical memory
+		 * directly.
+		 */
+		*dma = virt_to_phys(virt);
+		return virt;
+	}
+
+	alloc = alloc_iova(&tegra->carveout.domain,
+			   size >> tegra->carveout.shift,
+			   tegra->carveout.limit, true);
+	if (!alloc) {
+		err = -EBUSY;
+		goto free_pages;
+	}
+
+	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
+	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
+			size, IOMMU_READ | IOMMU_WRITE);
+	if (err < 0)
+		goto free_iova;
+
+	return virt;
+
+free_iova:
+	__free_iova(&tegra->carveout.domain, alloc);
+free_pages:
+	free_pages((unsigned long)virt, get_order(size));
+
+	return ERR_PTR(err);
+}
+
+void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
+		    dma_addr_t dma)
+{
+	if (tegra->domain)
+		size = iova_align(&tegra->carveout.domain, size);
+	else
+		size = PAGE_ALIGN(size);
+
+	if (tegra->domain) {
+		iommu_unmap(tegra->domain, dma, size);
+		free_iova(&tegra->carveout.domain,
+			  iova_pfn(&tegra->carveout.domain, dma));
+	}
+
+	free_pages((unsigned long)virt, get_order(size));
+}
+
 static int host1x_drm_probe(struct host1x_device *dev)
 {
 	struct drm_driver *driver = &tegra_drm_driver;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 0ddcce1..7351dee 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -12,6 +12,7 @@
 
 #include <uapi/drm/tegra_drm.h>
 #include <linux/host1x.h>
+#include <linux/iova.h>
 #include <linux/of_gpio.h>
 
 #include <drm/drmP.h>
@@ -43,6 +44,12 @@ struct tegra_drm {
 	struct iommu_domain *domain;
 	struct drm_mm mm;
 
+	struct {
+		struct iova_domain domain;
+		unsigned long shift;
+		unsigned long limit;
+	} carveout;
+
 	struct mutex clients_lock;
 	struct list_head clients;
 
@@ -104,6 +111,10 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 int tegra_drm_init(struct tegra_drm *tegra, struct drm_device *drm);
 int tegra_drm_exit(struct tegra_drm *tegra);
 
+void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, dma_addr_t *iova);
+void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
+		    dma_addr_t iova);
+
 struct tegra_dc_soc_info;
 struct tegra_output;
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/7] drm/tegra: Add falcon helper library
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  2016-12-14 11:16   ` [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API Mikko Perttunen
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 11:16   ` [PATCH v2 3/7] drm/tegra: Add VIC support Mikko Perttunen
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Arto Merilainen, Andrew Chew,
	Mikko Perttunen

From: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Add a set of falcon helper routines for use by the tegradrm client drivers
of the various falcon-based engines.

The falcon is a microcontroller that acts as a frontend for the rest of a
particular Tegra engine.  In order to properly utilize these engines, the
frontend must be booted before pushing any commands.

Based on work by Andrew Chew <achew-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Andrew Chew <achew-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/drm/tegra/Makefile |   3 +-
 drivers/gpu/drm/tegra/falcon.c | 259 +++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/falcon.h | 127 ++++++++++++++++++++
 3 files changed, 388 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/tegra/falcon.c
 create mode 100644 drivers/gpu/drm/tegra/falcon.h

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 2c66a8d..93e9a4a 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -13,6 +13,7 @@ tegra-drm-y := \
 	sor.o \
 	dpaux.o \
 	gr2d.o \
-	gr3d.o
+	gr3d.o \
+	falcon.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/falcon.c b/drivers/gpu/drm/tegra/falcon.c
new file mode 100644
index 0000000..f685e729
--- /dev/null
+++ b/drivers/gpu/drm/tegra/falcon.c
@@ -0,0 +1,259 @@
+/*
+ * Copyright (c) 2015, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <linux/firmware.h>
+#include <linux/pci_ids.h>
+#include <linux/iopoll.h>
+
+#include "falcon.h"
+#include "drm.h"
+
+enum falcon_memory {
+	FALCON_MEMORY_IMEM,
+	FALCON_MEMORY_DATA,
+};
+
+static void falcon_writel(struct falcon *falcon, u32 value, u32 offset)
+{
+	writel(value, falcon->regs + offset);
+}
+
+int falcon_wait_idle(struct falcon *falcon)
+{
+	u32 value;
+
+	return readl_poll_timeout(falcon->regs + FALCON_IDLESTATE, value,
+				  (value == 0), 10, 100000);
+}
+
+static int falcon_dma_wait_idle(struct falcon *falcon)
+{
+	u32 value;
+
+	return readl_poll_timeout(falcon->regs + FALCON_DMATRFCMD, value,
+				  (value & FALCON_DMATRFCMD_IDLE), 10, 100000);
+}
+
+static int falcon_copy_chunk(struct falcon *falcon,
+			     phys_addr_t base,
+			     unsigned long offset,
+			     enum falcon_memory target)
+{
+	u32 cmd = FALCON_DMATRFCMD_SIZE_256B;
+
+	if (target == FALCON_MEMORY_IMEM)
+		cmd |= FALCON_DMATRFCMD_IMEM;
+
+	falcon_writel(falcon, offset, FALCON_DMATRFMOFFS);
+	falcon_writel(falcon, base, FALCON_DMATRFFBOFFS);
+	falcon_writel(falcon, cmd, FALCON_DMATRFCMD);
+
+	return falcon_dma_wait_idle(falcon);
+}
+
+static void falcon_copy_firmware_image(struct falcon *falcon,
+				       const struct firmware *firmware)
+{
+	u32 *firmware_vaddr = falcon->firmware.vaddr;
+	dma_addr_t daddr;
+	size_t i;
+	int err;
+
+	/* copy the whole thing taking into account endianness */
+	for (i = 0; i < firmware->size / sizeof(u32); i++)
+		firmware_vaddr[i] = le32_to_cpu(((u32 *)firmware->data)[i]);
+
+	/* ensure that caches are flushed and falcon can see the firmware */
+	daddr = dma_map_single(falcon->dev, firmware_vaddr,
+			       falcon->firmware.size, DMA_TO_DEVICE);
+	err = dma_mapping_error(falcon->dev, daddr);
+	if (err) {
+		dev_err(falcon->dev, "failed to map firmware: %d\n", err);
+		return;
+	}
+	dma_sync_single_for_device(falcon->dev, daddr,
+				   falcon->firmware.size, DMA_TO_DEVICE);
+	dma_unmap_single(falcon->dev, daddr, falcon->firmware.size,
+			 DMA_TO_DEVICE);
+}
+
+static int falcon_parse_firmware_image(struct falcon *falcon)
+{
+	struct falcon_fw_bin_header_v1 *bin = (void *)falcon->firmware.vaddr;
+	struct falcon_fw_os_header_v1 *os;
+
+	/* endian problems would show up right here */
+	if (bin->magic != PCI_VENDOR_ID_NVIDIA) {
+		dev_err(falcon->dev, "incorrect firmware magic\n");
+		return -EINVAL;
+	}
+
+	/* currently only version 1 is supported */
+	if (bin->version != 1) {
+		dev_err(falcon->dev, "unsupported firmware version\n");
+		return -EINVAL;
+	}
+
+	/* check that the firmware size is consistent */
+	if (bin->size > falcon->firmware.size) {
+		dev_err(falcon->dev, "firmware image size inconsistency\n");
+		return -EINVAL;
+	}
+
+	os = falcon->firmware.vaddr + bin->os_header_offset;
+
+	falcon->firmware.bin_data.size = bin->os_size;
+	falcon->firmware.bin_data.offset = bin->os_data_offset;
+	falcon->firmware.code.offset = os->code_offset;
+	falcon->firmware.code.size = os->code_size;
+	falcon->firmware.data.offset = os->data_offset;
+	falcon->firmware.data.size = os->data_size;
+
+	return 0;
+}
+
+int falcon_read_firmware(struct falcon *falcon, const char *name)
+{
+	int err;
+
+	/* request_firmware prints error if it fails */
+	err = request_firmware(&falcon->firmware.firmware, name, falcon->dev);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+int falcon_load_firmware(struct falcon *falcon)
+{
+	const struct firmware *firmware = falcon->firmware.firmware;
+	int err;
+
+	falcon->firmware.size = firmware->size;
+
+	/* allocate iova space for the firmware */
+	falcon->firmware.vaddr = falcon->ops->alloc(falcon, firmware->size,
+						    &falcon->firmware.paddr);
+	if (!falcon->firmware.vaddr) {
+		dev_err(falcon->dev, "dma memory mapping failed\n");
+		return -ENOMEM;
+	}
+
+	/* copy firmware image into local area. this also ensures endianness */
+	falcon_copy_firmware_image(falcon, firmware);
+
+	/* parse the image data */
+	err = falcon_parse_firmware_image(falcon);
+	if (err < 0) {
+		dev_err(falcon->dev, "failed to parse firmware image\n");
+		goto err_setup_firmware_image;
+	}
+
+	release_firmware(firmware);
+	falcon->firmware.firmware = NULL;
+
+	return 0;
+
+err_setup_firmware_image:
+	falcon->ops->free(falcon, falcon->firmware.size,
+			  falcon->firmware.paddr, falcon->firmware.vaddr);
+
+	return err;
+}
+
+int falcon_init(struct falcon *falcon)
+{
+	/* check mandatory ops */
+	if (!falcon->ops || !falcon->ops->alloc || !falcon->ops->free)
+		return -EINVAL;
+
+	falcon->firmware.vaddr = NULL;
+
+	return 0;
+}
+
+void falcon_exit(struct falcon *falcon)
+{
+	if (falcon->firmware.firmware) {
+		release_firmware(falcon->firmware.firmware);
+		falcon->firmware.firmware = NULL;
+	}
+
+	if (falcon->firmware.vaddr) {
+		falcon->ops->free(falcon, falcon->firmware.size,
+				  falcon->firmware.paddr,
+				  falcon->firmware.vaddr);
+		falcon->firmware.vaddr = NULL;
+	}
+}
+
+int falcon_boot(struct falcon *falcon)
+{
+	unsigned long offset;
+	int err;
+
+	if (!falcon->firmware.vaddr)
+		return -EINVAL;
+
+	falcon_writel(falcon, 0, FALCON_DMACTL);
+
+	/* setup the address of the binary data so Falcon can access it later */
+	falcon_writel(falcon, (falcon->firmware.paddr +
+			       falcon->firmware.bin_data.offset) >> 8,
+		      FALCON_DMATRFBASE);
+
+	/* copy the data segment into Falcon internal memory */
+	for (offset = 0; offset < falcon->firmware.data.size; offset += 256)
+		falcon_copy_chunk(falcon,
+				  falcon->firmware.data.offset + offset,
+				  offset, FALCON_MEMORY_DATA);
+
+	/* copy the first code segment into Falcon internal memory */
+	falcon_copy_chunk(falcon, falcon->firmware.code.offset,
+			  0, FALCON_MEMORY_IMEM);
+
+	/* setup falcon interrupts */
+	falcon_writel(falcon, FALCON_IRQMSET_EXT(0xff) |
+			      FALCON_IRQMSET_SWGEN1 |
+			      FALCON_IRQMSET_SWGEN0 |
+			      FALCON_IRQMSET_EXTERR |
+			      FALCON_IRQMSET_HALT |
+			      FALCON_IRQMSET_WDTMR,
+		      FALCON_IRQMSET);
+	falcon_writel(falcon, FALCON_IRQDEST_EXT(0xff) |
+			      FALCON_IRQDEST_SWGEN1 |
+			      FALCON_IRQDEST_SWGEN0 |
+			      FALCON_IRQDEST_EXTERR |
+			      FALCON_IRQDEST_HALT,
+		      FALCON_IRQDEST);
+
+	/* enable interface */
+	falcon_writel(falcon, FALCON_ITFEN_MTHDEN |
+			      FALCON_ITFEN_CTXEN,
+		      FALCON_ITFEN);
+
+	/* boot falcon */
+	falcon_writel(falcon, 0x00000000, FALCON_BOOTVEC);
+	falcon_writel(falcon, FALCON_CPUCTL_STARTCPU, FALCON_CPUCTL);
+
+	err = falcon_wait_idle(falcon);
+	if (err < 0) {
+		dev_err(falcon->dev, "Falcon boot failed due to timeout\n");
+		return err;
+	}
+
+	return 0;
+}
+
+void falcon_execute_method(struct falcon *falcon, u32 method, u32 data)
+{
+	falcon_writel(falcon, method >> 2, FALCON_UCLASS_METHOD_OFFSET);
+	falcon_writel(falcon, data, FALCON_UCLASS_METHOD_DATA);
+}
diff --git a/drivers/gpu/drm/tegra/falcon.h b/drivers/gpu/drm/tegra/falcon.h
new file mode 100644
index 0000000..4504ed5
--- /dev/null
+++ b/drivers/gpu/drm/tegra/falcon.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2015, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _FALCON_H_
+#define _FALCON_H_
+
+#include <linux/types.h>
+
+#define FALCON_UCLASS_METHOD_OFFSET		0x00000040
+
+#define FALCON_UCLASS_METHOD_DATA		0x00000044
+
+#define FALCON_IRQMSET				0x00001010
+#define FALCON_IRQMSET_WDTMR			(1 << 1)
+#define FALCON_IRQMSET_HALT			(1 << 4)
+#define FALCON_IRQMSET_EXTERR			(1 << 5)
+#define FALCON_IRQMSET_SWGEN0			(1 << 6)
+#define FALCON_IRQMSET_SWGEN1			(1 << 7)
+#define FALCON_IRQMSET_EXT(v)			(((v) & 0xff) << 8)
+
+#define FALCON_IRQDEST				0x0000101c
+#define FALCON_IRQDEST_HALT			(1 << 4)
+#define FALCON_IRQDEST_EXTERR			(1 << 5)
+#define FALCON_IRQDEST_SWGEN0			(1 << 6)
+#define FALCON_IRQDEST_SWGEN1			(1 << 7)
+#define FALCON_IRQDEST_EXT(v)			(((v) & 0xff) << 8)
+
+#define FALCON_ITFEN				0x00001048
+#define FALCON_ITFEN_CTXEN			(1 << 0)
+#define FALCON_ITFEN_MTHDEN			(1 << 1)
+
+#define FALCON_IDLESTATE			0x0000104c
+
+#define FALCON_CPUCTL				0x00001100
+#define FALCON_CPUCTL_STARTCPU			(1 << 1)
+
+#define FALCON_BOOTVEC				0x00001104
+
+#define FALCON_DMACTL				0x0000110c
+#define FALCON_DMACTL_DMEM_SCRUBBING		(1 << 1)
+#define FALCON_DMACTL_IMEM_SCRUBBING		(1 << 2)
+
+#define FALCON_DMATRFBASE			0x00001110
+
+#define FALCON_DMATRFMOFFS			0x00001114
+
+#define FALCON_DMATRFCMD			0x00001118
+#define FALCON_DMATRFCMD_IDLE			(1 << 1)
+#define FALCON_DMATRFCMD_IMEM			(1 << 4)
+#define FALCON_DMATRFCMD_SIZE_256B		(6 << 8)
+
+#define FALCON_DMATRFFBOFFS			0x0000111c
+
+struct falcon_fw_bin_header_v1 {
+	u32 magic;		/* 0x10de */
+	u32 version;		/* version of bin format (1) */
+	u32 size;		/* entire image size including this header */
+	u32 os_header_offset;
+	u32 os_data_offset;
+	u32 os_size;
+};
+
+struct falcon_fw_os_app_v1 {
+	u32 offset;
+	u32 size;
+};
+
+struct falcon_fw_os_header_v1 {
+	u32 code_offset;
+	u32 code_size;
+	u32 data_offset;
+	u32 data_size;
+};
+
+struct falcon;
+
+struct falcon_ops {
+	void *(*alloc)(struct falcon *falcon, size_t size,
+		       dma_addr_t *paddr);
+	void (*free)(struct falcon *falcon, size_t size,
+		     dma_addr_t paddr, void *vaddr);
+};
+
+struct falcon_firmware_section {
+	unsigned long offset;
+	size_t size;
+};
+
+struct falcon_firmware {
+	/* Firmware after it is read but not loaded */
+	const struct firmware *firmware;
+
+	/* Raw firmware data */
+	dma_addr_t paddr;
+	void *vaddr;
+	size_t size;
+
+	/* Parsed firmware information */
+	struct falcon_firmware_section bin_data;
+	struct falcon_firmware_section data;
+	struct falcon_firmware_section code;
+};
+
+struct falcon {
+	/* Set by falcon client */
+	struct device *dev;
+	void __iomem *regs;
+	const struct falcon_ops *ops;
+	void *data;
+
+	struct falcon_firmware firmware;
+};
+
+int falcon_init(struct falcon *falcon);
+void falcon_exit(struct falcon *falcon);
+int falcon_read_firmware(struct falcon *falcon, const char *firmware_name);
+int falcon_load_firmware(struct falcon *falcon);
+int falcon_boot(struct falcon *falcon);
+void falcon_execute_method(struct falcon *falcon, u32 method, u32 data);
+int falcon_wait_idle(struct falcon *falcon);
+
+#endif /* _FALCON_H_ */
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/7] drm/tegra: Add VIC support
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  2016-12-14 11:16   ` [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API Mikko Perttunen
  2016-12-14 11:16   ` [PATCH v2 2/7] drm/tegra: Add falcon helper library Mikko Perttunen
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 11:16   ` [PATCH v2 4/7] gpu: host1x: Add IOMMU support Mikko Perttunen
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Arto Merilainen, Andrew Chew,
	Mikko Perttunen

From: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

This patch adds support for Video Image Compositor engine which
can be used for 2d operations.

Signed-off-by: Andrew Chew <achew-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Arto Merilainen <amerilainen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/drm/tegra/Makefile |   3 +-
 drivers/gpu/drm/tegra/drm.c    |   3 +
 drivers/gpu/drm/tegra/drm.h    |   1 +
 drivers/gpu/drm/tegra/vic.c    | 396 +++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/vic.h    |  31 ++++
 include/linux/host1x.h         |   1 +
 6 files changed, 434 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/tegra/vic.c
 create mode 100644 drivers/gpu/drm/tegra/vic.h

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 93e9a4a..6af3a9a 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -14,6 +14,7 @@ tegra-drm-y := \
 	dpaux.o \
 	gr2d.o \
 	gr3d.o \
-	falcon.o
+	falcon.o \
+	vic.o
 
 obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index cf714c6..f47692e 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1160,11 +1160,13 @@ static const struct of_device_id host1x_drm_subdevs[] = {
 	{ .compatible = "nvidia,tegra124-sor", },
 	{ .compatible = "nvidia,tegra124-hdmi", },
 	{ .compatible = "nvidia,tegra124-dsi", },
+	{ .compatible = "nvidia,tegra124-vic", },
 	{ .compatible = "nvidia,tegra132-dsi", },
 	{ .compatible = "nvidia,tegra210-dc", },
 	{ .compatible = "nvidia,tegra210-dsi", },
 	{ .compatible = "nvidia,tegra210-sor", },
 	{ .compatible = "nvidia,tegra210-sor1", },
+	{ .compatible = "nvidia,tegra210-vic", },
 	{ /* sentinel */ }
 };
 
@@ -1186,6 +1188,7 @@ static struct platform_driver * const drivers[] = {
 	&tegra_sor_driver,
 	&tegra_gr2d_driver,
 	&tegra_gr3d_driver,
+	&tegra_vic_driver,
 };
 
 static int __init host1x_drm_init(void)
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 7351dee..6a57e66 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -296,5 +296,6 @@ extern struct platform_driver tegra_dpaux_driver;
 extern struct platform_driver tegra_sor_driver;
 extern struct platform_driver tegra_gr2d_driver;
 extern struct platform_driver tegra_gr3d_driver;
+extern struct platform_driver tegra_vic_driver;
 
 #endif /* HOST1X_DRM_H */
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
new file mode 100644
index 0000000..cd804e4
--- /dev/null
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -0,0 +1,396 @@
+/*
+ * Copyright (c) 2015, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/reset.h>
+
+#include <soc/tegra/pmc.h>
+
+#include "drm.h"
+#include "falcon.h"
+#include "vic.h"
+
+struct vic_config {
+	const char *firmware;
+};
+
+struct vic {
+	struct falcon falcon;
+	bool booted;
+
+	void __iomem *regs;
+	struct tegra_drm_client client;
+	struct host1x_channel *channel;
+	struct iommu_domain *domain;
+	struct device *dev;
+	struct clk *clk;
+
+	/* Platform configuration */
+	const struct vic_config *config;
+};
+
+static inline struct vic *to_vic(struct tegra_drm_client *client)
+{
+	return container_of(client, struct vic, client);
+}
+
+static void vic_writel(struct vic *vic, u32 value, unsigned int offset)
+{
+	writel(value, vic->regs + offset);
+}
+
+static int vic_runtime_resume(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+
+	return clk_prepare_enable(vic->clk);
+}
+
+static int vic_runtime_suspend(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+
+	clk_disable_unprepare(vic->clk);
+
+	vic->booted = false;
+
+	return 0;
+}
+
+static int vic_boot(struct vic *vic)
+{
+	u32 fce_ucode_size, fce_bin_data_offset;
+	void *hdr;
+	int err = 0;
+
+	if (vic->booted)
+		return 0;
+
+	/* setup clockgating registers */
+	vic_writel(vic, CG_IDLE_CG_DLY_CNT(4) |
+			CG_IDLE_CG_EN |
+			CG_WAKEUP_DLY_CNT(4),
+		   NV_PVIC_MISC_PRI_VIC_CG);
+
+	err = falcon_boot(&vic->falcon);
+	if (err < 0)
+		return err;
+
+	hdr = vic->falcon.firmware.vaddr;
+	fce_bin_data_offset = *(u32 *)(hdr + VIC_UCODE_FCE_DATA_OFFSET);
+	hdr = vic->falcon.firmware.vaddr +
+		*(u32 *)(hdr + VIC_UCODE_FCE_HEADER_OFFSET);
+	fce_ucode_size = *(u32 *)(hdr + FCE_UCODE_SIZE_OFFSET);
+
+	falcon_execute_method(&vic->falcon, VIC_SET_APPLICATION_ID, 1);
+	falcon_execute_method(&vic->falcon, VIC_SET_FCE_UCODE_SIZE,
+			      fce_ucode_size);
+	falcon_execute_method(&vic->falcon, VIC_SET_FCE_UCODE_OFFSET,
+			      (vic->falcon.firmware.paddr + fce_bin_data_offset)
+				>> 8);
+
+	err = falcon_wait_idle(&vic->falcon);
+	if (err < 0) {
+		dev_err(vic->dev,
+			"failed to set application ID and FCE base\n");
+		return err;
+	}
+
+	vic->booted = true;
+
+	return 0;
+}
+
+static void *vic_falcon_alloc(struct falcon *falcon, size_t size,
+			       dma_addr_t *iova)
+{
+	struct tegra_drm *tegra = falcon->data;
+
+	return tegra_drm_alloc(tegra, size, iova);
+}
+
+static void vic_falcon_free(struct falcon *falcon, size_t size,
+			    dma_addr_t iova, void *va)
+{
+	struct tegra_drm *tegra = falcon->data;
+
+	return tegra_drm_free(tegra, size, va, iova);
+}
+
+static const struct falcon_ops vic_falcon_ops = {
+	.alloc = vic_falcon_alloc,
+	.free = vic_falcon_free
+};
+
+static int vic_init(struct host1x_client *client)
+{
+	struct tegra_drm_client *drm = host1x_to_drm_client(client);
+	struct drm_device *dev = dev_get_drvdata(client->parent);
+	struct tegra_drm *tegra = dev->dev_private;
+	struct vic *vic = to_vic(drm);
+	int err;
+
+	if (tegra->domain) {
+		err = iommu_attach_device(tegra->domain, vic->dev);
+		if (err < 0) {
+			dev_err(vic->dev, "failed to attach to domain: %d\n",
+				err);
+			return err;
+		}
+
+		vic->domain = tegra->domain;
+	}
+
+	if (!vic->falcon.data) {
+		vic->falcon.data = tegra;
+		err = falcon_load_firmware(&vic->falcon);
+		if (err < 0)
+			goto detach_device;
+	}
+
+	vic->channel = host1x_channel_request(client->dev);
+	if (!vic->channel) {
+		err = -ENOMEM;
+		goto detach_device;
+	}
+
+	client->syncpts[0] = host1x_syncpt_request(client->dev, 0);
+	if (!client->syncpts[0]) {
+		err = -ENOMEM;
+		goto free_channel;
+	}
+
+	err = tegra_drm_register_client(tegra, drm);
+	if (err < 0)
+		goto free_syncpt;
+
+	return 0;
+
+free_syncpt:
+	host1x_syncpt_free(client->syncpts[0]);
+free_channel:
+	host1x_channel_free(vic->channel);
+detach_device:
+	if (tegra->domain)
+		iommu_detach_device(tegra->domain, vic->dev);
+
+	return err;
+}
+
+static int vic_exit(struct host1x_client *client)
+{
+	struct tegra_drm_client *drm = host1x_to_drm_client(client);
+	struct drm_device *dev = dev_get_drvdata(client->parent);
+	struct tegra_drm *tegra = dev->dev_private;
+	struct vic *vic = to_vic(drm);
+	int err;
+
+	err = tegra_drm_unregister_client(tegra, drm);
+	if (err < 0)
+		return err;
+
+	host1x_syncpt_free(client->syncpts[0]);
+	host1x_channel_free(vic->channel);
+
+	if (vic->domain) {
+		iommu_detach_device(vic->domain, vic->dev);
+		vic->domain = NULL;
+	}
+
+	return 0;
+}
+
+static const struct host1x_client_ops vic_client_ops = {
+	.init = vic_init,
+	.exit = vic_exit,
+};
+
+static int vic_open_channel(struct tegra_drm_client *client,
+			    struct tegra_drm_context *context)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = pm_runtime_get_sync(vic->dev);
+	if (err < 0)
+		return err;
+
+	err = vic_boot(vic);
+	if (err < 0) {
+		pm_runtime_put(vic->dev);
+		return err;
+	}
+
+	context->channel = host1x_channel_get(vic->channel);
+	if (!context->channel) {
+		pm_runtime_put(vic->dev);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void vic_close_channel(struct tegra_drm_context *context)
+{
+	struct vic *vic = to_vic(context->client);
+
+	host1x_channel_put(context->channel);
+
+	pm_runtime_put(vic->dev);
+}
+
+static const struct tegra_drm_client_ops vic_ops = {
+	.open_channel = vic_open_channel,
+	.close_channel = vic_close_channel,
+	.submit = tegra_drm_submit,
+};
+
+static const struct vic_config vic_t124_config = {
+	.firmware = "nvidia/tegra124/vic03_ucode.bin",
+};
+
+static const struct vic_config vic_t210_config = {
+	.firmware = "nvidia/tegra210/vic04_ucode.bin",
+};
+
+static const struct of_device_id vic_match[] = {
+	{ .compatible = "nvidia,tegra124-vic", .data = &vic_t124_config },
+	{ .compatible = "nvidia,tegra210-vic", .data = &vic_t210_config },
+	{ },
+};
+
+static int vic_probe(struct platform_device *pdev)
+{
+	struct vic_config *vic_config = NULL;
+	struct device *dev = &pdev->dev;
+	struct host1x_syncpt **syncpts;
+	struct resource *regs;
+	const struct of_device_id *match;
+	struct vic *vic;
+	int err;
+
+	match = of_match_device(vic_match, dev);
+	vic_config = (struct vic_config *)match->data;
+
+	vic = devm_kzalloc(dev, sizeof(*vic), GFP_KERNEL);
+	if (!vic)
+		return -ENOMEM;
+
+	syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);
+	if (!syncpts)
+		return -ENOMEM;
+
+	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!regs) {
+		dev_err(&pdev->dev, "failed to get registers\n");
+		return -ENXIO;
+	}
+
+	vic->regs = devm_ioremap_resource(dev, regs);
+	if (IS_ERR(vic->regs))
+		return PTR_ERR(vic->regs);
+
+	vic->clk = devm_clk_get(dev, NULL);
+	if (IS_ERR(vic->clk)) {
+		dev_err(&pdev->dev, "failed to get clock\n");
+		return PTR_ERR(vic->clk);
+	}
+
+	vic->falcon.dev = dev;
+	vic->falcon.regs = vic->regs;
+	vic->falcon.ops = &vic_falcon_ops;
+
+	err = falcon_init(&vic->falcon);
+	if (err < 0)
+		return err;
+
+	err = falcon_read_firmware(&vic->falcon, vic_config->firmware);
+	if (err < 0)
+		goto exit_falcon;
+
+	platform_set_drvdata(pdev, vic);
+
+	INIT_LIST_HEAD(&vic->client.base.list);
+	vic->client.base.ops = &vic_client_ops;
+	vic->client.base.dev = dev;
+	vic->client.base.class = HOST1X_CLASS_VIC;
+	vic->client.base.syncpts = syncpts;
+	vic->client.base.num_syncpts = 1;
+	vic->dev = dev;
+	vic->config = vic_config;
+
+	INIT_LIST_HEAD(&vic->client.list);
+	vic->client.ops = &vic_ops;
+
+	err = host1x_client_register(&vic->client.base);
+	if (err < 0) {
+		dev_err(dev, "failed to register host1x client: %d\n", err);
+		platform_set_drvdata(pdev, NULL);
+		goto exit_falcon;
+	}
+
+	pm_runtime_enable(&pdev->dev);
+	if (!pm_runtime_enabled(&pdev->dev)) {
+		err = vic_runtime_resume(&pdev->dev);
+		if (err < 0)
+			goto unregister_client;
+	}
+
+	return 0;
+
+unregister_client:
+	host1x_client_unregister(&vic->client.base);
+exit_falcon:
+	falcon_exit(&vic->falcon);
+
+	return err;
+}
+
+static int vic_remove(struct platform_device *pdev)
+{
+	struct vic *vic = platform_get_drvdata(pdev);
+	int err;
+
+	err = host1x_client_unregister(&vic->client.base);
+	if (err < 0) {
+		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+			err);
+		return err;
+	}
+
+	if (pm_runtime_enabled(&pdev->dev))
+		pm_runtime_disable(&pdev->dev);
+	else
+		vic_runtime_suspend(&pdev->dev);
+
+	falcon_exit(&vic->falcon);
+
+	return 0;
+}
+
+static const struct dev_pm_ops vic_pm_ops = {
+	SET_RUNTIME_PM_OPS(vic_runtime_suspend, vic_runtime_resume, NULL)
+};
+
+struct platform_driver tegra_vic_driver = {
+	.driver = {
+		.name = "tegra-vic",
+		.of_match_table = vic_match,
+		.pm = &vic_pm_ops
+	},
+	.probe = vic_probe,
+	.remove = vic_remove,
+};
diff --git a/drivers/gpu/drm/tegra/vic.h b/drivers/gpu/drm/tegra/vic.h
new file mode 100644
index 0000000..2184481
--- /dev/null
+++ b/drivers/gpu/drm/tegra/vic.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2015, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef TEGRA_VIC_H
+#define TEGRA_VIC_H
+
+/* VIC methods */
+
+#define VIC_SET_APPLICATION_ID			0x00000200
+#define VIC_SET_FCE_UCODE_SIZE			0x0000071C
+#define VIC_SET_FCE_UCODE_OFFSET		0x0000072C
+
+/* VIC registers */
+
+#define NV_PVIC_MISC_PRI_VIC_CG			0x000016d0
+#define CG_IDLE_CG_DLY_CNT(val)			((val & 0x3f) << 0)
+#define CG_IDLE_CG_EN				(1 << 6)
+#define CG_WAKEUP_DLY_CNT(val)			((val & 0xf) << 16)
+
+/* Firmware offsets */
+
+#define VIC_UCODE_FCE_HEADER_OFFSET		(6*4)
+#define VIC_UCODE_FCE_DATA_OFFSET		(7*4)
+#define FCE_UCODE_SIZE_OFFSET			(2*4)
+
+#endif /* TEGRA_VIC_H */
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 1ffbf2a..3d04aa1 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -26,6 +26,7 @@ enum host1x_class {
 	HOST1X_CLASS_HOST1X = 0x1,
 	HOST1X_CLASS_GR2D = 0x51,
 	HOST1X_CLASS_GR2D_SB = 0x52,
+	HOST1X_CLASS_VIC = 0x5D,
 	HOST1X_CLASS_GR3D = 0x60,
 };
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/7] gpu: host1x: Add IOMMU support
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-12-14 11:16   ` [PATCH v2 3/7] drm/tegra: Add VIC support Mikko Perttunen
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 11:16   ` [PATCH v2 5/7] dt-bindings: Add bindings for the Tegra VIC Mikko Perttunen
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Mikko Perttunen

Add support for the Host1x unit to be located behind
an IOMMU. This is required when gather buffers may be
allocated non-contiguously in physical memory, as can
be the case when TegraDRM is also using the IOMMU.

Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/gpu/host1x/cdma.c       | 74 ++++++++++++++++++++++++++++++++---------
 drivers/gpu/host1x/cdma.h       |  6 ++--
 drivers/gpu/host1x/dev.c        | 40 ++++++++++++++++++++--
 drivers/gpu/host1x/dev.h        |  6 ++++
 drivers/gpu/host1x/hw/cdma_hw.c | 16 ++++-----
 drivers/gpu/host1x/job.c        | 72 +++++++++++++++++++++++++++++++++------
 drivers/gpu/host1x/job.h        |  1 +
 7 files changed, 176 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index c5d82a8..28541b2 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -51,9 +51,15 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
 	struct host1x_cdma *cdma = pb_to_cdma(pb);
 	struct host1x *host1x = cdma_to_host1x(cdma);
 
-	if (pb->phys != 0)
-		dma_free_wc(host1x->dev, pb->size_bytes + 4, pb->mapped,
-			    pb->phys);
+	if (!pb->phys)
+		return;
+
+	if (host1x->domain) {
+		iommu_unmap(host1x->domain, pb->dma, pb->alloc_size);
+		free_iova(&host1x->iova, iova_pfn(&host1x->iova, pb->dma));
+	}
+
+	dma_free_wc(host1x->dev, pb->alloc_size, pb->mapped, pb->phys);
 
 	pb->mapped = NULL;
 	pb->phys = 0;
@@ -66,28 +72,64 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 {
 	struct host1x_cdma *cdma = pb_to_cdma(pb);
 	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct iova *alloc;
+	u32 size;
+	int err;
 
 	pb->mapped = NULL;
 	pb->phys = 0;
-	pb->size_bytes = HOST1X_PUSHBUFFER_SLOTS * 8;
+	pb->size = HOST1X_PUSHBUFFER_SLOTS * 8;
+
+	size = pb->size + 4;
 
 	/* initialize buffer pointers */
-	pb->fence = pb->size_bytes - 8;
+	pb->fence = pb->size - 8;
 	pb->pos = 0;
 
-	/* allocate and map pushbuffer memory */
-	pb->mapped = dma_alloc_wc(host1x->dev, pb->size_bytes + 4, &pb->phys,
-				  GFP_KERNEL);
-	if (!pb->mapped)
-		goto fail;
+	if (host1x->domain) {
+		unsigned long shift;
+
+		size = iova_align(&host1x->iova, size);
+
+		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
+					  GFP_KERNEL);
+		if (!pb->mapped)
+			return -ENOMEM;
+
+		shift = iova_shift(&host1x->iova);
+		alloc = alloc_iova(&host1x->iova, size >> shift,
+				   host1x->iova_end >> shift, true);
+		if (!alloc) {
+			err = -ENOMEM;
+			goto iommu_free_mem;
+		}
+
+		pb->dma = iova_dma_addr(&host1x->iova, alloc);
+		err = iommu_map(host1x->domain, pb->dma, pb->phys, size,
+				IOMMU_READ);
+		if (err)
+			goto iommu_free_iova;
+	} else {
+		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
+					  GFP_KERNEL);
+		if (!pb->mapped)
+			return -ENOMEM;
+
+		pb->dma = pb->phys;
+	}
+
+	pb->alloc_size = size;
 
 	host1x_hw_pushbuffer_init(host1x, pb);
 
 	return 0;
 
-fail:
-	host1x_pushbuffer_destroy(pb);
-	return -ENOMEM;
+iommu_free_iova:
+	__free_iova(&host1x->iova, alloc);
+iommu_free_mem:
+	dma_free_wc(host1x->dev, pb->alloc_size, pb->mapped, pb->phys);
+
+	return err;
 }
 
 /*
@@ -101,7 +143,7 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
 	WARN_ON(pb->pos == pb->fence);
 	*(p++) = op1;
 	*(p++) = op2;
-	pb->pos = (pb->pos + 8) & (pb->size_bytes - 1);
+	pb->pos = (pb->pos + 8) & (pb->size - 1);
 }
 
 /*
@@ -111,7 +153,7 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
 static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
 {
 	/* Advance the next write position */
-	pb->fence = (pb->fence + slots * 8) & (pb->size_bytes - 1);
+	pb->fence = (pb->fence + slots * 8) & (pb->size - 1);
 }
 
 /*
@@ -119,7 +161,7 @@ static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
  */
 static u32 host1x_pushbuffer_space(struct push_buffer *pb)
 {
-	return ((pb->fence - pb->pos) & (pb->size_bytes - 1)) / 8;
+	return ((pb->fence - pb->pos) & (pb->size - 1)) / 8;
 }
 
 /*
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
index 470087a..ec170a7 100644
--- a/drivers/gpu/host1x/cdma.h
+++ b/drivers/gpu/host1x/cdma.h
@@ -43,10 +43,12 @@ struct host1x_job;
 
 struct push_buffer {
 	void *mapped;			/* mapped pushbuffer memory */
-	dma_addr_t phys;		/* physical address of pushbuffer */
+	dma_addr_t dma;			/* device address of pushbuffer */
+	phys_addr_t phys;		/* physical address of pushbuffer */
 	u32 fence;			/* index we've written */
 	u32 pos;			/* index to write to */
-	u32 size_bytes;
+	u32 size;
+	u32 alloc_size;
 };
 
 struct buffer_timeout {
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index a62317a..a8234bb 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -168,16 +168,37 @@ static int host1x_probe(struct platform_device *pdev)
 		return err;
 	}
 
+	if (iommu_present(&platform_bus_type)) {
+		struct iommu_domain_geometry *geometry;
+		unsigned long order;
+
+		host->domain = iommu_domain_alloc(&platform_bus_type);
+		if (!host->domain)
+			return -ENOMEM;
+
+		err = iommu_attach_device(host->domain, &pdev->dev);
+		if (err)
+			goto fail_free_domain;
+
+		geometry = &host->domain->geometry;
+
+		order = __ffs(host->domain->pgsize_bitmap);
+		init_iova_domain(&host->iova, 1UL << order,
+				 geometry->aperture_start >> order,
+				 geometry->aperture_end >> order);
+		host->iova_end = geometry->aperture_end;
+	}
+
 	err = host1x_channel_list_init(host);
 	if (err) {
 		dev_err(&pdev->dev, "failed to initialize channel list\n");
-		return err;
+		goto fail_detach_device;
 	}
 
 	err = clk_prepare_enable(host->clk);
 	if (err < 0) {
 		dev_err(&pdev->dev, "failed to enable clock\n");
-		return err;
+		goto fail_detach_device;
 	}
 
 	err = host1x_syncpt_init(host);
@@ -206,6 +227,15 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_syncpt_deinit(host);
 fail_unprepare_disable:
 	clk_disable_unprepare(host->clk);
+fail_detach_device:
+	if (host->domain) {
+		put_iova_domain(&host->iova);
+		iommu_detach_device(host->domain, &pdev->dev);
+	}
+fail_free_domain:
+	if (host->domain)
+		iommu_domain_free(host->domain);
+
 	return err;
 }
 
@@ -218,6 +248,12 @@ static int host1x_remove(struct platform_device *pdev)
 	host1x_syncpt_deinit(host);
 	clk_disable_unprepare(host->clk);
 
+	if (host->domain) {
+		put_iova_domain(&host->iova);
+		iommu_detach_device(host->domain, &pdev->dev);
+		iommu_domain_free(host->domain);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 5220510..d81eb92 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -19,6 +19,8 @@
 
 #include <linux/platform_device.h>
 #include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
 
 #include "channel.h"
 #include "syncpt.h"
@@ -108,6 +110,10 @@ struct host1x {
 	struct device *dev;
 	struct clk *clk;
 
+	struct iommu_domain *domain;
+	struct iova_domain iova;
+	dma_addr_t iova_end;
+
 	struct mutex intr_mutex;
 	int intr_syncpt_irq;
 
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 659c1bb..6b23111 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -30,7 +30,7 @@
  */
 static void push_buffer_init(struct push_buffer *pb)
 {
-	*(u32 *)(pb->mapped + pb->size_bytes) = host1x_opcode_restart(0);
+	*(u32 *)(pb->mapped + pb->size) = host1x_opcode_restart(0);
 }
 
 /*
@@ -55,8 +55,8 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
 		*(p++) = HOST1X_OPCODE_NOP;
 		*(p++) = HOST1X_OPCODE_NOP;
 		dev_dbg(host1x->dev, "%s: NOP at %pad+%#x\n", __func__,
-			&pb->phys, getptr);
-		getptr = (getptr + 8) & (pb->size_bytes - 1);
+			&pb->dma, getptr);
+		getptr = (getptr + 8) & (pb->size - 1);
 	}
 
 	wmb();
@@ -78,10 +78,9 @@ static void cdma_start(struct host1x_cdma *cdma)
 			 HOST1X_CHANNEL_DMACTRL);
 
 	/* set base, put and end pointer */
-	host1x_ch_writel(ch, cdma->push_buffer.phys, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
 	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
-	host1x_ch_writel(ch, cdma->push_buffer.phys +
-			 cdma->push_buffer.size_bytes + 4,
+	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
 			 HOST1X_CHANNEL_DMAEND);
 
 	/* reset GET */
@@ -115,9 +114,8 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
 			 HOST1X_CHANNEL_DMACTRL);
 
 	/* set base, end pointer (all of memory) */
-	host1x_ch_writel(ch, cdma->push_buffer.phys, HOST1X_CHANNEL_DMASTART);
-	host1x_ch_writel(ch, cdma->push_buffer.phys +
-			 cdma->push_buffer.size_bytes,
+	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size,
 			 HOST1X_CHANNEL_DMAEND);
 
 	/* set GET, by loading the value in PUT (then reset GET) */
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index a91b7c4..5434be9 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -174,9 +174,10 @@ static int do_waitchks(struct host1x_job *job, struct host1x *host,
 	return 0;
 }
 
-static unsigned int pin_job(struct host1x_job *job)
+static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	unsigned int i;
+	int err;
 
 	job->num_unpins = 0;
 
@@ -186,12 +187,16 @@ static unsigned int pin_job(struct host1x_job *job)
 		dma_addr_t phys_addr;
 
 		reloc->target.bo = host1x_bo_get(reloc->target.bo);
-		if (!reloc->target.bo)
+		if (!reloc->target.bo) {
+			err = -EINVAL;
 			goto unpin;
+		}
 
 		phys_addr = host1x_bo_pin(reloc->target.bo, &sgt);
-		if (!phys_addr)
+		if (!phys_addr) {
+			err = -EINVAL;
 			goto unpin;
+		}
 
 		job->addr_phys[job->num_unpins] = phys_addr;
 		job->unpins[job->num_unpins].bo = reloc->target.bo;
@@ -201,28 +206,67 @@ static unsigned int pin_job(struct host1x_job *job)
 
 	for (i = 0; i < job->num_gathers; i++) {
 		struct host1x_job_gather *g = &job->gathers[i];
+		size_t gather_size = 0;
+		struct scatterlist *sg;
 		struct sg_table *sgt;
 		dma_addr_t phys_addr;
+		unsigned long shift;
+		struct iova *alloc;
+		unsigned int j;
 
 		g->bo = host1x_bo_get(g->bo);
-		if (!g->bo)
+		if (!g->bo) {
+			err = -EINVAL;
 			goto unpin;
+		}
 
 		phys_addr = host1x_bo_pin(g->bo, &sgt);
-		if (!phys_addr)
+		if (!phys_addr) {
+			err = -EINVAL;
 			goto unpin;
+		}
+
+		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) {
+			for_each_sg(sgt->sgl, sg, sgt->nents, j)
+				gather_size += sg->length;
+			gather_size = iova_align(&host->iova, gather_size);
+
+			shift = iova_shift(&host->iova);
+			alloc = alloc_iova(&host->iova, gather_size >> shift,
+					   host->iova_end >> shift, true);
+			if (!alloc) {
+				err = -ENOMEM;
+				goto unpin;
+			}
+
+			err = iommu_map_sg(host->domain,
+					iova_dma_addr(&host->iova, alloc),
+					sgt->sgl, sgt->nents, IOMMU_READ);
+			if (err == 0) {
+				__free_iova(&host->iova, alloc);
+				err = -EINVAL;
+				goto unpin;
+			}
+
+			job->addr_phys[job->num_unpins] =
+				iova_dma_addr(&host->iova, alloc);
+			job->unpins[job->num_unpins].size = gather_size;
+		} else {
+			job->addr_phys[job->num_unpins] = phys_addr;
+		}
+
+		job->gather_addr_phys[i] = job->addr_phys[job->num_unpins];
 
-		job->addr_phys[job->num_unpins] = phys_addr;
 		job->unpins[job->num_unpins].bo = g->bo;
 		job->unpins[job->num_unpins].sgt = sgt;
 		job->num_unpins++;
 	}
 
-	return job->num_unpins;
+	return 0;
 
 unpin:
 	host1x_job_unpin(job);
-	return 0;
+	return err;
 }
 
 static int do_relocs(struct host1x_job *job, struct host1x_bo *cmdbuf)
@@ -525,8 +569,8 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 		host1x_syncpt_load(host->syncpt + i);
 
 	/* pin memory */
-	err = pin_job(job);
-	if (!err)
+	err = pin_job(host, job);
+	if (err)
 		goto out;
 
 	/* patch gathers */
@@ -569,11 +613,19 @@ EXPORT_SYMBOL(host1x_job_pin);
 
 void host1x_job_unpin(struct host1x_job *job)
 {
+	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
 	unsigned int i;
 
 	for (i = 0; i < job->num_unpins; i++) {
 		struct host1x_job_unpin_data *unpin = &job->unpins[i];
 
+		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) {
+			iommu_unmap(host->domain, job->addr_phys[i],
+				    unpin->size);
+			free_iova(&host->iova,
+				iova_pfn(&host->iova, job->addr_phys[i]));
+		}
+
 		host1x_bo_unpin(unpin->bo, unpin->sgt);
 		host1x_bo_put(unpin->bo);
 	}
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 8b3c15d..878239c4 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -44,6 +44,7 @@ struct host1x_waitchk {
 struct host1x_job_unpin_data {
 	struct host1x_bo *bo;
 	struct sg_table *sgt;
+	size_t size;
 };
 
 /*
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/7] dt-bindings: Add bindings for the Tegra VIC
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-12-14 11:16   ` [PATCH v2 4/7] gpu: host1x: Add IOMMU support Mikko Perttunen
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 11:16   ` [PATCH v2 7/7] arm64: tegra: Enable IOMMU for Host1x on Tegra210 Mikko Perttunen
  2016-12-14 12:30   ` [PATCH v2 0/7] Host1x IOMMU support + VIC support Daniel Vetter
  6 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Mikko Perttunen

The VIC (Video Image Compositor) is a Host1x client unit
that can do various 2D composition and transform operations.

Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 .../bindings/display/tegra/nvidia,tegra20-host1x.txt        | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
index 0fad7ed..74e1e8a 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
@@ -249,6 +249,19 @@ of the following host1x client modules:
   See ../pinctrl/nvidia,tegra124-dpaux-padctl.txt for information
   regarding the DPAUX pad controller bindings.
 
+- vic: Video Image Compositor
+  - compatible : "nvidia,tegra<chip>-vic"
+  - reg: Physical base address and length of the controller's registers.
+  - interrupts: The interrupt outputs from the controller.
+  - clocks: Must contain an entry for each entry in clock-names.
+    See ../clocks/clock-bindings.txt for details.
+  - clock-names: Must include the following entries:
+    - vic: clock input for the VIC hardware
+  - resets: Must contain an entry for each entry in reset-names.
+    See ../reset/reset.txt for details.
+  - reset-names: Must include the following entries:
+    - vic
+
 Example:
 
 / {
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 6/7] arm64: tegra: Enable VIC on T210
  2016-12-14 11:16 [PATCH v2 0/7] Host1x IOMMU support + VIC support Mikko Perttunen
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2016-12-14 11:16 ` Mikko Perttunen
  1 sibling, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

Enable the VIC (Video Image Compositor) host1x
unit on Tegra210 systems.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra210.dtsi | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index 46045fe..e12390f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -122,7 +122,14 @@
 		vic@54340000 {
 			compatible = "nvidia,tegra210-vic";
 			reg = <0x0 0x54340000 0x0 0x00040000>;
-			status = "disabled";
+			interrupts = <GIC_SPI 72 IRQ_TYPE_LEVEL_HIGH>;
+			clocks = <&tegra_car TEGRA210_CLK_VIC03>;
+			clock-names = "vic";
+			resets = <&tegra_car 178>;
+			reset-names = "vic";
+
+			iommus = <&mc TEGRA_SWGROUP_VIC>;
+			power-domains = <&pd_vic>;
 		};
 
 		nvjpg@54380000 {
@@ -692,6 +699,14 @@
 				resets = <&tegra_car TEGRA210_CLK_XUSB_HOST>;
 				#power-domain-cells = <0>;
 			};
+
+			pd_vic: vic {
+				clocks = <&tegra_car TEGRA210_CLK_VIC03>;
+				clock-names = "vic";
+				resets = <&tegra_car TEGRA210_CLK_VIC03>;
+				reset-names = "vic";
+				#power-domain-cells = <0>;
+			};
 		};
 	};
 
-- 
2.10.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 7/7] arm64: tegra: Enable IOMMU for Host1x on Tegra210
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-12-14 11:16   ` [PATCH v2 5/7] dt-bindings: Add bindings for the Tegra VIC Mikko Perttunen
@ 2016-12-14 11:16   ` Mikko Perttunen
  2016-12-14 12:30   ` [PATCH v2 0/7] Host1x IOMMU support + VIC support Daniel Vetter
  6 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 11:16 UTC (permalink / raw)
  To: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk, Mikko Perttunen

The Host1x driver now supports operation behind an IOMMU,
so add its IOMMU domain to the device tree.

Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 arch/arm64/boot/dts/nvidia/tegra210.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index e12390f..8d229c7 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -26,6 +26,8 @@
 
 		ranges = <0x0 0x54000000 0x0 0x54000000 0x0 0x01000000>;
 
+		iommus = <&mc TEGRA_SWGROUP_HC>;
+
 		dpaux1: dpaux@54040000 {
 			compatible = "nvidia,tegra210-dpaux";
 			reg = <0x0 0x54040000 0x0 0x00040000>;
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
  2016-12-14 11:16   ` [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API Mikko Perttunen
@ 2016-12-14 11:35     ` Lucas Stach
       [not found]       ` <1481715331.2313.28.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Lucas Stach @ 2016-12-14 11:35 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: linux-tegra, dri-devel

Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
> Add a new IO virtual memory allocation API to allow clients to
> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
> required e.g. for loading client firmware when clients are attached
> to the IOMMU domain.
> 
> The allocator allocates contiguous physical pages that are then
> mapped contiguously to the IOMMU domain using the iova_domain
> library provided by the kernel. Contiguous physical pages are
> used so that the same allocator works also when IOMMU support
> is disabled and therefore devices access physical memory directly.
> 
Why is this needed? If you use the DMA API for those buffers you should
end up with CMA memory in the !IOMMU case and normal paged memory with
IOMMU enabled. From my understanding this should match the requirements.

Also how big can those firmware images get? Will __get_free_pages be
able to provide this much contig memory in a long running system with
fragmented memory space? Isn't CMA the better option here?

Regards,
Lucas

> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 111 +++++++++++++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/tegra/drm.h |  11 +++++
>  2 files changed, 115 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index b8be3ee..cf714c6 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -1,12 +1,13 @@
>  /*
>   * Copyright (C) 2012 Avionic Design GmbH
> - * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
> + * Copyright (C) 2012-2016 NVIDIA CORPORATION.  All rights reserved.
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
>  
> +#include <linux/bitops.h>
>  #include <linux/host1x.h>
>  #include <linux/iommu.h>
>  
> @@ -23,6 +24,8 @@
>  #define DRIVER_MINOR 0
>  #define DRIVER_PATCHLEVEL 0
>  
> +#define CARVEOUT_SZ SZ_64M
> +
>  struct tegra_drm_file {
>  	struct list_head contexts;
>  };
> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  
>  	if (iommu_present(&platform_bus_type)) {
>  		struct iommu_domain_geometry *geometry;
> -		u64 start, end;
> +		unsigned long order;
> +		u64 carveout_start, carveout_end, gem_start, gem_end;
>  
>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
>  		if (!tegra->domain) {
> @@ -136,12 +140,25 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  		}
>  
>  		geometry = &tegra->domain->geometry;
> -		start = geometry->aperture_start;
> -		end = geometry->aperture_end;
> +		gem_start = geometry->aperture_start;
> +		gem_end = geometry->aperture_end - CARVEOUT_SZ;
> +		carveout_start = gem_end + 1;
> +		carveout_end = geometry->aperture_end;
> +
> +		order = __ffs(tegra->domain->pgsize_bitmap);
> +		init_iova_domain(&tegra->carveout.domain, 1UL << order,
> +				 carveout_start >> order,
> +				 carveout_end >> order);
> +
> +		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
> +		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
> +
> +		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
>  
> -		DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
> -				 start, end);
> -		drm_mm_init(&tegra->mm, start, end - start + 1);
> +		DRM_DEBUG("IOMMU apertures:\n");
> +		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
> +		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
> +			  carveout_end);
>  	}
>  
>  	mutex_init(&tegra->clients_lock);
> @@ -208,6 +225,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  	if (tegra->domain) {
>  		iommu_domain_free(tegra->domain);
>  		drm_mm_takedown(&tegra->mm);
> +		put_iova_domain(&tegra->carveout.domain);
>  	}
>  free:
>  	kfree(tegra);
> @@ -232,6 +250,7 @@ static int tegra_drm_unload(struct drm_device *drm)
>  	if (tegra->domain) {
>  		iommu_domain_free(tegra->domain);
>  		drm_mm_takedown(&tegra->mm);
> +		put_iova_domain(&tegra->carveout.domain);
>  	}
>  
>  	kfree(tegra);
> @@ -975,6 +994,84 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>  	return 0;
>  }
>  
> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
> +			      dma_addr_t *dma)
> +{
> +	struct iova *alloc;
> +	void *virt;
> +	gfp_t gfp;
> +	int err;
> +
> +	if (tegra->domain)
> +		size = iova_align(&tegra->carveout.domain, size);
> +	else
> +		size = PAGE_ALIGN(size);
> +
> +	gfp = GFP_KERNEL | __GFP_ZERO;
> +	if (!tegra->domain) {
> +		/*
> +		 * Many units only support 32-bit addresses, even on 64-bit
> +		 * SoCs. If there is no IOMMU to translate into a 32-bit IO
> +		 * virtual address space, force allocations to be in the
> +		 * lower 32-bit range.
> +		 */
> +		gfp |= GFP_DMA;
> +	}
> +
> +	virt = (void *)__get_free_pages(gfp, get_order(size));
> +	if (!virt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (!tegra->domain) {
> +		/*
> +		 * If IOMMU is disabled, devices address physical memory
> +		 * directly.
> +		 */
> +		*dma = virt_to_phys(virt);
> +		return virt;
> +	}
> +
> +	alloc = alloc_iova(&tegra->carveout.domain,
> +			   size >> tegra->carveout.shift,
> +			   tegra->carveout.limit, true);
> +	if (!alloc) {
> +		err = -EBUSY;
> +		goto free_pages;
> +	}
> +
> +	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
> +	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
> +			size, IOMMU_READ | IOMMU_WRITE);
> +	if (err < 0)
> +		goto free_iova;
> +
> +	return virt;
> +
> +free_iova:
> +	__free_iova(&tegra->carveout.domain, alloc);
> +free_pages:
> +	free_pages((unsigned long)virt, get_order(size));
> +
> +	return ERR_PTR(err);
> +}
> +
> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
> +		    dma_addr_t dma)
> +{
> +	if (tegra->domain)
> +		size = iova_align(&tegra->carveout.domain, size);
> +	else
> +		size = PAGE_ALIGN(size);
> +
> +	if (tegra->domain) {
> +		iommu_unmap(tegra->domain, dma, size);
> +		free_iova(&tegra->carveout.domain,
> +			  iova_pfn(&tegra->carveout.domain, dma));
> +	}
> +
> +	free_pages((unsigned long)virt, get_order(size));
> +}
> +
>  static int host1x_drm_probe(struct host1x_device *dev)
>  {
>  	struct drm_driver *driver = &tegra_drm_driver;
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 0ddcce1..7351dee 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -12,6 +12,7 @@
>  
>  #include <uapi/drm/tegra_drm.h>
>  #include <linux/host1x.h>
> +#include <linux/iova.h>
>  #include <linux/of_gpio.h>
>  
>  #include <drm/drmP.h>
> @@ -43,6 +44,12 @@ struct tegra_drm {
>  	struct iommu_domain *domain;
>  	struct drm_mm mm;
>  
> +	struct {
> +		struct iova_domain domain;
> +		unsigned long shift;
> +		unsigned long limit;
> +	} carveout;
> +
>  	struct mutex clients_lock;
>  	struct list_head clients;
>  
> @@ -104,6 +111,10 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>  int tegra_drm_init(struct tegra_drm *tegra, struct drm_device *drm);
>  int tegra_drm_exit(struct tegra_drm *tegra);
>  
> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, dma_addr_t *iova);
> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
> +		    dma_addr_t iova);
> +
>  struct tegra_dc_soc_info;
>  struct tegra_output;
>  


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-12-14 11:16   ` [PATCH v2 7/7] arm64: tegra: Enable IOMMU for Host1x on Tegra210 Mikko Perttunen
@ 2016-12-14 12:30   ` Daniel Vetter
       [not found]     ` <20161214123004.urnvlqarcnp5rp4a-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2016-12-14 18:26     ` Emil Velikov
  6 siblings, 2 replies; 28+ messages in thread
From: Daniel Vetter @ 2016-12-14 12:30 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c,
	daniel-/w4YWyX8dFk

On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
> This series adds IOMMU support to Host1x and TegraDRM
> and adds support for the VIC (Video Image Compositor)
> host1x client. The series is available as a git repository at
> git://github.com/cyndis/linux.git; branch vic-2.
> 
> A userspace test case for VIC can be found at
> https://github.com/cyndis/drm/tree/work/tegra.
> The testcase is in tests/tegra and is called submit_vic.
> The testcase/TRM include full headers and documentation
> to program the unit. The unit by itself, however, does not
> readily map to existing userspace library interfaces, so
> implementations for those are not provided.

Afaik libva has an entire pile of post-processing support. Pretty sure
other video transcode libraries have similar interfaces, so should all be
possible to implement this.

Until that exists I really think that the VIC part (and only that, since
tk1/tx1 in general seems to work with nouveau and all that) should stay
out of tree.
-Daniel

> 
> The in-kernel firewall is not implemented for VIC;
> therefore, either the IOMMU must be enabled or the firewall
> disabled for the test to pass.
> 
> Tested with Jetson TX1 (T210). Probably works also
> with Jetson TK1 (T124). Note that due to hardware changes
> the testcase also needs to be changed to run properly
> on T124: this can be done by removing the topmost
> commit of the aforementioned test repository.
> 
> Thanks,
>   Mikko.
> 
> Arto Merilainen (2):
>   drm/tegra: Add falcon helper library
>   drm/tegra: Add VIC support
> 
> Mikko Perttunen (5):
>   drm/tegra: Add Tegra DRM allocation API
>   gpu: host1x: Add IOMMU support
>   dt-bindings: Add bindings for the Tegra VIC
>   arm64: tegra: Enable VIC on T210
>   arm64: tegra: Enable IOMMU for Host1x on Tegra210
> 
>  .../display/tegra/nvidia,tegra20-host1x.txt        |  13 +
>  arch/arm64/boot/dts/nvidia/tegra210.dtsi           |  19 +-
>  drivers/gpu/drm/tegra/Makefile                     |   4 +-
>  drivers/gpu/drm/tegra/drm.c                        | 114 +++++-
>  drivers/gpu/drm/tegra/drm.h                        |  12 +
>  drivers/gpu/drm/tegra/falcon.c                     | 259 ++++++++++++++
>  drivers/gpu/drm/tegra/falcon.h                     | 127 +++++++
>  drivers/gpu/drm/tegra/vic.c                        | 396 +++++++++++++++++++++
>  drivers/gpu/drm/tegra/vic.h                        |  31 ++
>  drivers/gpu/host1x/cdma.c                          |  74 +++-
>  drivers/gpu/host1x/cdma.h                          |   6 +-
>  drivers/gpu/host1x/dev.c                           |  40 ++-
>  drivers/gpu/host1x/dev.h                           |   6 +
>  drivers/gpu/host1x/hw/cdma_hw.c                    |  16 +-
>  drivers/gpu/host1x/job.c                           |  72 +++-
>  drivers/gpu/host1x/job.h                           |   1 +
>  include/linux/host1x.h                             |   1 +
>  17 files changed, 1143 insertions(+), 48 deletions(-)
>  create mode 100644 drivers/gpu/drm/tegra/falcon.c
>  create mode 100644 drivers/gpu/drm/tegra/falcon.h
>  create mode 100644 drivers/gpu/drm/tegra/vic.c
>  create mode 100644 drivers/gpu/drm/tegra/vic.h
> 
> -- 
> 2.10.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]       ` <1481715331.2313.28.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2016-12-14 12:36         ` Mikko Perttunen
  2016-12-14 12:56           ` Lucas Stach
  2016-12-14 13:48         ` Thierry Reding
  1 sibling, 1 reply; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 12:36 UTC (permalink / raw)
  To: Lucas Stach, Mikko Perttunen
  Cc: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 14.12.2016 13:35, Lucas Stach wrote:
> Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
>> Add a new IO virtual memory allocation API to allow clients to
>> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
>> required e.g. for loading client firmware when clients are attached
>> to the IOMMU domain.
>>
>> The allocator allocates contiguous physical pages that are then
>> mapped contiguously to the IOMMU domain using the iova_domain
>> library provided by the kernel. Contiguous physical pages are
>> used so that the same allocator works also when IOMMU support
>> is disabled and therefore devices access physical memory directly.
>>
> Why is this needed? If you use the DMA API for those buffers you should
> end up with CMA memory in the !IOMMU case and normal paged memory with
> IOMMU enabled. From my understanding this should match the requirements.

Yes, memory allocated with the DMA API should work as well, but would 
also require passing the device pointer for the device that is 
allocating the memory, which isn't a problem, but this way we don't need 
that.

>
> Also how big can those firmware images get? Will __get_free_pages be
> able to provide this much contig memory in a long running system with
> fragmented memory space? Isn't CMA the better option here?

The largest is about 140 kilobytes. The space is also allocated when the 
device is initialized which usually happens during boot.

>
> Regards,
> Lucas

Thanks,
Mikko.

>
>> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>> ---
>>  drivers/gpu/drm/tegra/drm.c | 111 +++++++++++++++++++++++++++++++++++++++++---
>>  drivers/gpu/drm/tegra/drm.h |  11 +++++
>>  2 files changed, 115 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index b8be3ee..cf714c6 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -1,12 +1,13 @@
>>  /*
>>   * Copyright (C) 2012 Avionic Design GmbH
>> - * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
>> + * Copyright (C) 2012-2016 NVIDIA CORPORATION.  All rights reserved.
>>   *
>>   * This program is free software; you can redistribute it and/or modify
>>   * it under the terms of the GNU General Public License version 2 as
>>   * published by the Free Software Foundation.
>>   */
>>
>> +#include <linux/bitops.h>
>>  #include <linux/host1x.h>
>>  #include <linux/iommu.h>
>>
>> @@ -23,6 +24,8 @@
>>  #define DRIVER_MINOR 0
>>  #define DRIVER_PATCHLEVEL 0
>>
>> +#define CARVEOUT_SZ SZ_64M
>> +
>>  struct tegra_drm_file {
>>  	struct list_head contexts;
>>  };
>> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>
>>  	if (iommu_present(&platform_bus_type)) {
>>  		struct iommu_domain_geometry *geometry;
>> -		u64 start, end;
>> +		unsigned long order;
>> +		u64 carveout_start, carveout_end, gem_start, gem_end;
>>
>>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
>>  		if (!tegra->domain) {
>> @@ -136,12 +140,25 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>  		}
>>
>>  		geometry = &tegra->domain->geometry;
>> -		start = geometry->aperture_start;
>> -		end = geometry->aperture_end;
>> +		gem_start = geometry->aperture_start;
>> +		gem_end = geometry->aperture_end - CARVEOUT_SZ;
>> +		carveout_start = gem_end + 1;
>> +		carveout_end = geometry->aperture_end;
>> +
>> +		order = __ffs(tegra->domain->pgsize_bitmap);
>> +		init_iova_domain(&tegra->carveout.domain, 1UL << order,
>> +				 carveout_start >> order,
>> +				 carveout_end >> order);
>> +
>> +		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
>> +		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
>> +
>> +		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
>>
>> -		DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
>> -				 start, end);
>> -		drm_mm_init(&tegra->mm, start, end - start + 1);
>> +		DRM_DEBUG("IOMMU apertures:\n");
>> +		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
>> +		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
>> +			  carveout_end);
>>  	}
>>
>>  	mutex_init(&tegra->clients_lock);
>> @@ -208,6 +225,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>  	if (tegra->domain) {
>>  		iommu_domain_free(tegra->domain);
>>  		drm_mm_takedown(&tegra->mm);
>> +		put_iova_domain(&tegra->carveout.domain);
>>  	}
>>  free:
>>  	kfree(tegra);
>> @@ -232,6 +250,7 @@ static int tegra_drm_unload(struct drm_device *drm)
>>  	if (tegra->domain) {
>>  		iommu_domain_free(tegra->domain);
>>  		drm_mm_takedown(&tegra->mm);
>> +		put_iova_domain(&tegra->carveout.domain);
>>  	}
>>
>>  	kfree(tegra);
>> @@ -975,6 +994,84 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>  	return 0;
>>  }
>>
>> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
>> +			      dma_addr_t *dma)
>> +{
>> +	struct iova *alloc;
>> +	void *virt;
>> +	gfp_t gfp;
>> +	int err;
>> +
>> +	if (tegra->domain)
>> +		size = iova_align(&tegra->carveout.domain, size);
>> +	else
>> +		size = PAGE_ALIGN(size);
>> +
>> +	gfp = GFP_KERNEL | __GFP_ZERO;
>> +	if (!tegra->domain) {
>> +		/*
>> +		 * Many units only support 32-bit addresses, even on 64-bit
>> +		 * SoCs. If there is no IOMMU to translate into a 32-bit IO
>> +		 * virtual address space, force allocations to be in the
>> +		 * lower 32-bit range.
>> +		 */
>> +		gfp |= GFP_DMA;
>> +	}
>> +
>> +	virt = (void *)__get_free_pages(gfp, get_order(size));
>> +	if (!virt)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	if (!tegra->domain) {
>> +		/*
>> +		 * If IOMMU is disabled, devices address physical memory
>> +		 * directly.
>> +		 */
>> +		*dma = virt_to_phys(virt);
>> +		return virt;
>> +	}
>> +
>> +	alloc = alloc_iova(&tegra->carveout.domain,
>> +			   size >> tegra->carveout.shift,
>> +			   tegra->carveout.limit, true);
>> +	if (!alloc) {
>> +		err = -EBUSY;
>> +		goto free_pages;
>> +	}
>> +
>> +	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
>> +	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
>> +			size, IOMMU_READ | IOMMU_WRITE);
>> +	if (err < 0)
>> +		goto free_iova;
>> +
>> +	return virt;
>> +
>> +free_iova:
>> +	__free_iova(&tegra->carveout.domain, alloc);
>> +free_pages:
>> +	free_pages((unsigned long)virt, get_order(size));
>> +
>> +	return ERR_PTR(err);
>> +}
>> +
>> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
>> +		    dma_addr_t dma)
>> +{
>> +	if (tegra->domain)
>> +		size = iova_align(&tegra->carveout.domain, size);
>> +	else
>> +		size = PAGE_ALIGN(size);
>> +
>> +	if (tegra->domain) {
>> +		iommu_unmap(tegra->domain, dma, size);
>> +		free_iova(&tegra->carveout.domain,
>> +			  iova_pfn(&tegra->carveout.domain, dma));
>> +	}
>> +
>> +	free_pages((unsigned long)virt, get_order(size));
>> +}
>> +
>>  static int host1x_drm_probe(struct host1x_device *dev)
>>  {
>>  	struct drm_driver *driver = &tegra_drm_driver;
>> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
>> index 0ddcce1..7351dee 100644
>> --- a/drivers/gpu/drm/tegra/drm.h
>> +++ b/drivers/gpu/drm/tegra/drm.h
>> @@ -12,6 +12,7 @@
>>
>>  #include <uapi/drm/tegra_drm.h>
>>  #include <linux/host1x.h>
>> +#include <linux/iova.h>
>>  #include <linux/of_gpio.h>
>>
>>  #include <drm/drmP.h>
>> @@ -43,6 +44,12 @@ struct tegra_drm {
>>  	struct iommu_domain *domain;
>>  	struct drm_mm mm;
>>
>> +	struct {
>> +		struct iova_domain domain;
>> +		unsigned long shift;
>> +		unsigned long limit;
>> +	} carveout;
>> +
>>  	struct mutex clients_lock;
>>  	struct list_head clients;
>>
>> @@ -104,6 +111,10 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>  int tegra_drm_init(struct tegra_drm *tegra, struct drm_device *drm);
>>  int tegra_drm_exit(struct tegra_drm *tegra);
>>
>> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, dma_addr_t *iova);
>> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
>> +		    dma_addr_t iova);
>> +
>>  struct tegra_dc_soc_info;
>>  struct tegra_output;
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]     ` <20161214123004.urnvlqarcnp5rp4a-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2016-12-14 12:41       ` Mikko Perttunen
  2016-12-14 13:05         ` Daniel Vetter
  2016-12-14 14:18       ` Thierry Reding
  1 sibling, 1 reply; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 12:41 UTC (permalink / raw)
  To: Daniel Vetter, Mikko Perttunen
  Cc: thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c

On 14.12.2016 14:30, Daniel Vetter wrote:
> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>> This series adds IOMMU support to Host1x and TegraDRM
>> and adds support for the VIC (Video Image Compositor)
>> host1x client. The series is available as a git repository at
>> git://github.com/cyndis/linux.git; branch vic-2.
>>
>> A userspace test case for VIC can be found at
>> https://github.com/cyndis/drm/tree/work/tegra.
>> The testcase is in tests/tegra and is called submit_vic.
>> The testcase/TRM include full headers and documentation
>> to program the unit. The unit by itself, however, does not
>> readily map to existing userspace library interfaces, so
>> implementations for those are not provided.
>
> Afaik libva has an entire pile of post-processing support. Pretty sure
> other video transcode libraries have similar interfaces, so should all be
> possible to implement this.

We don't have any actual video transcoding support though, so unless 
it's possible to just implement a part of libva and defer the rest to 
some CPU implementation, I don't see how this is useful. I suppose I 
could implement a GStreamer plugin for colorspace conversion or 
resizing, since those are very modular.

>
> Until that exists I really think that the VIC part (and only that, since
> tk1/tx1 in general seems to work with nouveau and all that) should stay
> out of tree.

Ok. I'll try to whip up something.

> -Daniel

Thanks,
Mikko.

>
>>
>> The in-kernel firewall is not implemented for VIC;
>> therefore, either the IOMMU must be enabled or the firewall
>> disabled for the test to pass.
>>
>> Tested with Jetson TX1 (T210). Probably works also
>> with Jetson TK1 (T124). Note that due to hardware changes
>> the testcase also needs to be changed to run properly
>> on T124: this can be done by removing the topmost
>> commit of the aforementioned test repository.
>>
>> Thanks,
>>   Mikko.
>>
>> Arto Merilainen (2):
>>   drm/tegra: Add falcon helper library
>>   drm/tegra: Add VIC support
>>
>> Mikko Perttunen (5):
>>   drm/tegra: Add Tegra DRM allocation API
>>   gpu: host1x: Add IOMMU support
>>   dt-bindings: Add bindings for the Tegra VIC
>>   arm64: tegra: Enable VIC on T210
>>   arm64: tegra: Enable IOMMU for Host1x on Tegra210
>>
>>  .../display/tegra/nvidia,tegra20-host1x.txt        |  13 +
>>  arch/arm64/boot/dts/nvidia/tegra210.dtsi           |  19 +-
>>  drivers/gpu/drm/tegra/Makefile                     |   4 +-
>>  drivers/gpu/drm/tegra/drm.c                        | 114 +++++-
>>  drivers/gpu/drm/tegra/drm.h                        |  12 +
>>  drivers/gpu/drm/tegra/falcon.c                     | 259 ++++++++++++++
>>  drivers/gpu/drm/tegra/falcon.h                     | 127 +++++++
>>  drivers/gpu/drm/tegra/vic.c                        | 396 +++++++++++++++++++++
>>  drivers/gpu/drm/tegra/vic.h                        |  31 ++
>>  drivers/gpu/host1x/cdma.c                          |  74 +++-
>>  drivers/gpu/host1x/cdma.h                          |   6 +-
>>  drivers/gpu/host1x/dev.c                           |  40 ++-
>>  drivers/gpu/host1x/dev.h                           |   6 +
>>  drivers/gpu/host1x/hw/cdma_hw.c                    |  16 +-
>>  drivers/gpu/host1x/job.c                           |  72 +++-
>>  drivers/gpu/host1x/job.h                           |   1 +
>>  include/linux/host1x.h                             |   1 +
>>  17 files changed, 1143 insertions(+), 48 deletions(-)
>>  create mode 100644 drivers/gpu/drm/tegra/falcon.c
>>  create mode 100644 drivers/gpu/drm/tegra/falcon.h
>>  create mode 100644 drivers/gpu/drm/tegra/vic.c
>>  create mode 100644 drivers/gpu/drm/tegra/vic.h
>>
>> --
>> 2.10.2
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
  2016-12-14 12:36         ` Mikko Perttunen
@ 2016-12-14 12:56           ` Lucas Stach
       [not found]             ` <1481720206.2313.32.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Lucas Stach @ 2016-12-14 12:56 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: linux-tegra, dri-devel, Mikko Perttunen

Am Mittwoch, den 14.12.2016, 14:36 +0200 schrieb Mikko Perttunen:
> On 14.12.2016 13:35, Lucas Stach wrote:
> > Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
> >> Add a new IO virtual memory allocation API to allow clients to
> >> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
> >> required e.g. for loading client firmware when clients are attached
> >> to the IOMMU domain.
> >>
> >> The allocator allocates contiguous physical pages that are then
> >> mapped contiguously to the IOMMU domain using the iova_domain
> >> library provided by the kernel. Contiguous physical pages are
> >> used so that the same allocator works also when IOMMU support
> >> is disabled and therefore devices access physical memory directly.
> >>
> > Why is this needed? If you use the DMA API for those buffers you should
> > end up with CMA memory in the !IOMMU case and normal paged memory with
> > IOMMU enabled. From my understanding this should match the requirements.
> 
> Yes, memory allocated with the DMA API should work as well, but would 
> also require passing the device pointer for the device that is 
> allocating the memory, which isn't a problem, but this way we don't need 
> that.
> 
Which in turn would allow you to properly describe the DMA capabilities
of engines that can address more than 32 bits without the IOMMU, which
was one of the comments Thierry had the last time around IIRC.
 
> >
> > Also how big can those firmware images get? Will __get_free_pages be
> > able to provide this much contig memory in a long running system with
> > fragmented memory space? Isn't CMA the better option here?
> 
> The largest is about 140 kilobytes. The space is also allocated when the 
> device is initialized which usually happens during boot.
> 
Which is an order 6 allocation, something you almost certainly don't get
in a fragmented system. I know it may work most of the time, as you are
allocating those buffers early, but is most of the time really enough?
What happens on module unload/load at a later time?

> >
> > Regards,
> > Lucas
> 
> Thanks,
> Mikko.
> 
> >
> >> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> >> ---
> >>  drivers/gpu/drm/tegra/drm.c | 111 +++++++++++++++++++++++++++++++++++++++++---
> >>  drivers/gpu/drm/tegra/drm.h |  11 +++++
> >>  2 files changed, 115 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> >> index b8be3ee..cf714c6 100644
> >> --- a/drivers/gpu/drm/tegra/drm.c
> >> +++ b/drivers/gpu/drm/tegra/drm.c
> >> @@ -1,12 +1,13 @@
> >>  /*
> >>   * Copyright (C) 2012 Avionic Design GmbH
> >> - * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
> >> + * Copyright (C) 2012-2016 NVIDIA CORPORATION.  All rights reserved.
> >>   *
> >>   * This program is free software; you can redistribute it and/or modify
> >>   * it under the terms of the GNU General Public License version 2 as
> >>   * published by the Free Software Foundation.
> >>   */
> >>
> >> +#include <linux/bitops.h>
> >>  #include <linux/host1x.h>
> >>  #include <linux/iommu.h>
> >>
> >> @@ -23,6 +24,8 @@
> >>  #define DRIVER_MINOR 0
> >>  #define DRIVER_PATCHLEVEL 0
> >>
> >> +#define CARVEOUT_SZ SZ_64M
> >> +
> >>  struct tegra_drm_file {
> >>  	struct list_head contexts;
> >>  };
> >> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
> >>
> >>  	if (iommu_present(&platform_bus_type)) {
> >>  		struct iommu_domain_geometry *geometry;
> >> -		u64 start, end;
> >> +		unsigned long order;
> >> +		u64 carveout_start, carveout_end, gem_start, gem_end;
> >>
> >>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
> >>  		if (!tegra->domain) {
> >> @@ -136,12 +140,25 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
> >>  		}
> >>
> >>  		geometry = &tegra->domain->geometry;
> >> -		start = geometry->aperture_start;
> >> -		end = geometry->aperture_end;
> >> +		gem_start = geometry->aperture_start;
> >> +		gem_end = geometry->aperture_end - CARVEOUT_SZ;
> >> +		carveout_start = gem_end + 1;
> >> +		carveout_end = geometry->aperture_end;
> >> +
> >> +		order = __ffs(tegra->domain->pgsize_bitmap);
> >> +		init_iova_domain(&tegra->carveout.domain, 1UL << order,
> >> +				 carveout_start >> order,
> >> +				 carveout_end >> order);
> >> +
> >> +		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
> >> +		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
> >> +
> >> +		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
> >>
> >> -		DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
> >> -				 start, end);
> >> -		drm_mm_init(&tegra->mm, start, end - start + 1);
> >> +		DRM_DEBUG("IOMMU apertures:\n");
> >> +		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
> >> +		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
> >> +			  carveout_end);
> >>  	}
> >>
> >>  	mutex_init(&tegra->clients_lock);
> >> @@ -208,6 +225,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
> >>  	if (tegra->domain) {
> >>  		iommu_domain_free(tegra->domain);
> >>  		drm_mm_takedown(&tegra->mm);
> >> +		put_iova_domain(&tegra->carveout.domain);
> >>  	}
> >>  free:
> >>  	kfree(tegra);
> >> @@ -232,6 +250,7 @@ static int tegra_drm_unload(struct drm_device *drm)
> >>  	if (tegra->domain) {
> >>  		iommu_domain_free(tegra->domain);
> >>  		drm_mm_takedown(&tegra->mm);
> >> +		put_iova_domain(&tegra->carveout.domain);
> >>  	}
> >>
> >>  	kfree(tegra);
> >> @@ -975,6 +994,84 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
> >>  	return 0;
> >>  }
> >>
> >> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
> >> +			      dma_addr_t *dma)
> >> +{
> >> +	struct iova *alloc;
> >> +	void *virt;
> >> +	gfp_t gfp;
> >> +	int err;
> >> +
> >> +	if (tegra->domain)
> >> +		size = iova_align(&tegra->carveout.domain, size);
> >> +	else
> >> +		size = PAGE_ALIGN(size);
> >> +
> >> +	gfp = GFP_KERNEL | __GFP_ZERO;
> >> +	if (!tegra->domain) {
> >> +		/*
> >> +		 * Many units only support 32-bit addresses, even on 64-bit
> >> +		 * SoCs. If there is no IOMMU to translate into a 32-bit IO
> >> +		 * virtual address space, force allocations to be in the
> >> +		 * lower 32-bit range.
> >> +		 */
> >> +		gfp |= GFP_DMA;
> >> +	}
> >> +
> >> +	virt = (void *)__get_free_pages(gfp, get_order(size));
> >> +	if (!virt)
> >> +		return ERR_PTR(-ENOMEM);
> >> +
> >> +	if (!tegra->domain) {
> >> +		/*
> >> +		 * If IOMMU is disabled, devices address physical memory
> >> +		 * directly.
> >> +		 */
> >> +		*dma = virt_to_phys(virt);
> >> +		return virt;
> >> +	}
> >> +
> >> +	alloc = alloc_iova(&tegra->carveout.domain,
> >> +			   size >> tegra->carveout.shift,
> >> +			   tegra->carveout.limit, true);
> >> +	if (!alloc) {
> >> +		err = -EBUSY;
> >> +		goto free_pages;
> >> +	}
> >> +
> >> +	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
> >> +	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
> >> +			size, IOMMU_READ | IOMMU_WRITE);
> >> +	if (err < 0)
> >> +		goto free_iova;
> >> +
> >> +	return virt;
> >> +
> >> +free_iova:
> >> +	__free_iova(&tegra->carveout.domain, alloc);
> >> +free_pages:
> >> +	free_pages((unsigned long)virt, get_order(size));
> >> +
> >> +	return ERR_PTR(err);
> >> +}
> >> +
> >> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
> >> +		    dma_addr_t dma)
> >> +{
> >> +	if (tegra->domain)
> >> +		size = iova_align(&tegra->carveout.domain, size);
> >> +	else
> >> +		size = PAGE_ALIGN(size);
> >> +
> >> +	if (tegra->domain) {
> >> +		iommu_unmap(tegra->domain, dma, size);
> >> +		free_iova(&tegra->carveout.domain,
> >> +			  iova_pfn(&tegra->carveout.domain, dma));
> >> +	}
> >> +
> >> +	free_pages((unsigned long)virt, get_order(size));
> >> +}
> >> +
> >>  static int host1x_drm_probe(struct host1x_device *dev)
> >>  {
> >>  	struct drm_driver *driver = &tegra_drm_driver;
> >> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> >> index 0ddcce1..7351dee 100644
> >> --- a/drivers/gpu/drm/tegra/drm.h
> >> +++ b/drivers/gpu/drm/tegra/drm.h
> >> @@ -12,6 +12,7 @@
> >>
> >>  #include <uapi/drm/tegra_drm.h>
> >>  #include <linux/host1x.h>
> >> +#include <linux/iova.h>
> >>  #include <linux/of_gpio.h>
> >>
> >>  #include <drm/drmP.h>
> >> @@ -43,6 +44,12 @@ struct tegra_drm {
> >>  	struct iommu_domain *domain;
> >>  	struct drm_mm mm;
> >>
> >> +	struct {
> >> +		struct iova_domain domain;
> >> +		unsigned long shift;
> >> +		unsigned long limit;
> >> +	} carveout;
> >> +
> >>  	struct mutex clients_lock;
> >>  	struct list_head clients;
> >>
> >> @@ -104,6 +111,10 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
> >>  int tegra_drm_init(struct tegra_drm *tegra, struct drm_device *drm);
> >>  int tegra_drm_exit(struct tegra_drm *tegra);
> >>
> >> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, dma_addr_t *iova);
> >> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
> >> +		    dma_addr_t iova);
> >> +
> >>  struct tegra_dc_soc_info;
> >>  struct tegra_output;
> >>
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
  2016-12-14 12:41       ` Mikko Perttunen
@ 2016-12-14 13:05         ` Daniel Vetter
       [not found]           ` <20161214130545.gfpkosyixr5lfkkf-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2016-12-14 13:05 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel, Mikko Perttunen, linux-tegra

On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
> On 14.12.2016 14:30, Daniel Vetter wrote:
> > On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
> > > This series adds IOMMU support to Host1x and TegraDRM
> > > and adds support for the VIC (Video Image Compositor)
> > > host1x client. The series is available as a git repository at
> > > git://github.com/cyndis/linux.git; branch vic-2.
> > > 
> > > A userspace test case for VIC can be found at
> > > https://github.com/cyndis/drm/tree/work/tegra.
> > > The testcase is in tests/tegra and is called submit_vic.
> > > The testcase/TRM include full headers and documentation
> > > to program the unit. The unit by itself, however, does not
> > > readily map to existing userspace library interfaces, so
> > > implementations for those are not provided.
> > 
> > Afaik libva has an entire pile of post-processing support. Pretty sure
> > other video transcode libraries have similar interfaces, so should all be
> > possible to implement this.
> 
> We don't have any actual video transcoding support though, so unless it's
> possible to just implement a part of libva and defer the rest to some CPU
> implementation, I don't see how this is useful. I suppose I could implement
> a GStreamer plugin for colorspace conversion or resizing, since those are
> very modular.

Hm, I guess the question then is, how did that get enabled?

> > Until that exists I really think that the VIC part (and only that, since
> > tk1/tx1 in general seems to work with nouveau and all that) should stay
> > out of tree.
> 
> Ok. I'll try to whip up something.

But yeah some gstreamer thing might be doable, but not sure they take that
(I don't have any contacts there). Video is a bit a mess, there's no clear
go-to place for dropping some new hw support.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]             ` <1481720206.2313.32.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2016-12-14 13:17               ` Mikko Perttunen
  0 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 13:17 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 14.12.2016 14:56, Lucas Stach wrote:
> Am Mittwoch, den 14.12.2016, 14:36 +0200 schrieb Mikko Perttunen:
>> On 14.12.2016 13:35, Lucas Stach wrote:
>>> Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
>>>> Add a new IO virtual memory allocation API to allow clients to
>>>> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
>>>> required e.g. for loading client firmware when clients are attached
>>>> to the IOMMU domain.
>>>>
>>>> The allocator allocates contiguous physical pages that are then
>>>> mapped contiguously to the IOMMU domain using the iova_domain
>>>> library provided by the kernel. Contiguous physical pages are
>>>> used so that the same allocator works also when IOMMU support
>>>> is disabled and therefore devices access physical memory directly.
>>>>
>>> Why is this needed? If you use the DMA API for those buffers you should
>>> end up with CMA memory in the !IOMMU case and normal paged memory with
>>> IOMMU enabled. From my understanding this should match the requirements.
>>
>> Yes, memory allocated with the DMA API should work as well, but would
>> also require passing the device pointer for the device that is
>> allocating the memory, which isn't a problem, but this way we don't need
>> that.
>>
> Which in turn would allow you to properly describe the DMA capabilities
> of engines that can address more than 32 bits without the IOMMU, which
> was one of the comments Thierry had the last time around IIRC.

In this case, just the device's DMA mask wouldn't allow that, since 
chips post Tegra132 do have units that support more than 32 bits 
generally but the firmware must still be in the lower 4G.

I guess this means that we would need to set the DMA mask to the lowest 
common denominator which is 32 bits, even though we want to have the 
full 34 bits for the GEM objects. However, this should only be an issue 
when the IOMMU is disabled so it isn't really that much of a problem.

>
>>>
>>> Also how big can those firmware images get? Will __get_free_pages be
>>> able to provide this much contig memory in a long running system with
>>> fragmented memory space? Isn't CMA the better option here?
>>
>> The largest is about 140 kilobytes. The space is also allocated when the
>> device is initialized which usually happens during boot.
>>
> Which is an order 6 allocation, something you almost certainly don't get
> in a fragmented system. I know it may work most of the time, as you are
> allocating those buffers early, but is most of the time really enough?
> What happens on module unload/load at a later time?

Yeah, it's probably worth fixing.

Thanks,
Mikko.

>
>>>
>>> Regards,
>>> Lucas
>>
>> Thanks,
>> Mikko.
>>
>>>
>>>> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  drivers/gpu/drm/tegra/drm.c | 111 +++++++++++++++++++++++++++++++++++++++++---
>>>>  drivers/gpu/drm/tegra/drm.h |  11 +++++
>>>>  2 files changed, 115 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>>> index b8be3ee..cf714c6 100644
>>>> --- a/drivers/gpu/drm/tegra/drm.c
>>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>>> @@ -1,12 +1,13 @@
>>>>  /*
>>>>   * Copyright (C) 2012 Avionic Design GmbH
>>>> - * Copyright (C) 2012-2013 NVIDIA CORPORATION.  All rights reserved.
>>>> + * Copyright (C) 2012-2016 NVIDIA CORPORATION.  All rights reserved.
>>>>   *
>>>>   * This program is free software; you can redistribute it and/or modify
>>>>   * it under the terms of the GNU General Public License version 2 as
>>>>   * published by the Free Software Foundation.
>>>>   */
>>>>
>>>> +#include <linux/bitops.h>
>>>>  #include <linux/host1x.h>
>>>>  #include <linux/iommu.h>
>>>>
>>>> @@ -23,6 +24,8 @@
>>>>  #define DRIVER_MINOR 0
>>>>  #define DRIVER_PATCHLEVEL 0
>>>>
>>>> +#define CARVEOUT_SZ SZ_64M
>>>> +
>>>>  struct tegra_drm_file {
>>>>  	struct list_head contexts;
>>>>  };
>>>> @@ -127,7 +130,8 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>
>>>>  	if (iommu_present(&platform_bus_type)) {
>>>>  		struct iommu_domain_geometry *geometry;
>>>> -		u64 start, end;
>>>> +		unsigned long order;
>>>> +		u64 carveout_start, carveout_end, gem_start, gem_end;
>>>>
>>>>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
>>>>  		if (!tegra->domain) {
>>>> @@ -136,12 +140,25 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>  		}
>>>>
>>>>  		geometry = &tegra->domain->geometry;
>>>> -		start = geometry->aperture_start;
>>>> -		end = geometry->aperture_end;
>>>> +		gem_start = geometry->aperture_start;
>>>> +		gem_end = geometry->aperture_end - CARVEOUT_SZ;
>>>> +		carveout_start = gem_end + 1;
>>>> +		carveout_end = geometry->aperture_end;
>>>> +
>>>> +		order = __ffs(tegra->domain->pgsize_bitmap);
>>>> +		init_iova_domain(&tegra->carveout.domain, 1UL << order,
>>>> +				 carveout_start >> order,
>>>> +				 carveout_end >> order);
>>>> +
>>>> +		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
>>>> +		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
>>>> +
>>>> +		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
>>>>
>>>> -		DRM_DEBUG_DRIVER("IOMMU aperture initialized (%#llx-%#llx)\n",
>>>> -				 start, end);
>>>> -		drm_mm_init(&tegra->mm, start, end - start + 1);
>>>> +		DRM_DEBUG("IOMMU apertures:\n");
>>>> +		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
>>>> +		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
>>>> +			  carveout_end);
>>>>  	}
>>>>
>>>>  	mutex_init(&tegra->clients_lock);
>>>> @@ -208,6 +225,7 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>>>>  	if (tegra->domain) {
>>>>  		iommu_domain_free(tegra->domain);
>>>>  		drm_mm_takedown(&tegra->mm);
>>>> +		put_iova_domain(&tegra->carveout.domain);
>>>>  	}
>>>>  free:
>>>>  	kfree(tegra);
>>>> @@ -232,6 +250,7 @@ static int tegra_drm_unload(struct drm_device *drm)
>>>>  	if (tegra->domain) {
>>>>  		iommu_domain_free(tegra->domain);
>>>>  		drm_mm_takedown(&tegra->mm);
>>>> +		put_iova_domain(&tegra->carveout.domain);
>>>>  	}
>>>>
>>>>  	kfree(tegra);
>>>> @@ -975,6 +994,84 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>>>  	return 0;
>>>>  }
>>>>
>>>> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size,
>>>> +			      dma_addr_t *dma)
>>>> +{
>>>> +	struct iova *alloc;
>>>> +	void *virt;
>>>> +	gfp_t gfp;
>>>> +	int err;
>>>> +
>>>> +	if (tegra->domain)
>>>> +		size = iova_align(&tegra->carveout.domain, size);
>>>> +	else
>>>> +		size = PAGE_ALIGN(size);
>>>> +
>>>> +	gfp = GFP_KERNEL | __GFP_ZERO;
>>>> +	if (!tegra->domain) {
>>>> +		/*
>>>> +		 * Many units only support 32-bit addresses, even on 64-bit
>>>> +		 * SoCs. If there is no IOMMU to translate into a 32-bit IO
>>>> +		 * virtual address space, force allocations to be in the
>>>> +		 * lower 32-bit range.
>>>> +		 */
>>>> +		gfp |= GFP_DMA;
>>>> +	}
>>>> +
>>>> +	virt = (void *)__get_free_pages(gfp, get_order(size));
>>>> +	if (!virt)
>>>> +		return ERR_PTR(-ENOMEM);
>>>> +
>>>> +	if (!tegra->domain) {
>>>> +		/*
>>>> +		 * If IOMMU is disabled, devices address physical memory
>>>> +		 * directly.
>>>> +		 */
>>>> +		*dma = virt_to_phys(virt);
>>>> +		return virt;
>>>> +	}
>>>> +
>>>> +	alloc = alloc_iova(&tegra->carveout.domain,
>>>> +			   size >> tegra->carveout.shift,
>>>> +			   tegra->carveout.limit, true);
>>>> +	if (!alloc) {
>>>> +		err = -EBUSY;
>>>> +		goto free_pages;
>>>> +	}
>>>> +
>>>> +	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);
>>>> +	err = iommu_map(tegra->domain, *dma, virt_to_phys(virt),
>>>> +			size, IOMMU_READ | IOMMU_WRITE);
>>>> +	if (err < 0)
>>>> +		goto free_iova;
>>>> +
>>>> +	return virt;
>>>> +
>>>> +free_iova:
>>>> +	__free_iova(&tegra->carveout.domain, alloc);
>>>> +free_pages:
>>>> +	free_pages((unsigned long)virt, get_order(size));
>>>> +
>>>> +	return ERR_PTR(err);
>>>> +}
>>>> +
>>>> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
>>>> +		    dma_addr_t dma)
>>>> +{
>>>> +	if (tegra->domain)
>>>> +		size = iova_align(&tegra->carveout.domain, size);
>>>> +	else
>>>> +		size = PAGE_ALIGN(size);
>>>> +
>>>> +	if (tegra->domain) {
>>>> +		iommu_unmap(tegra->domain, dma, size);
>>>> +		free_iova(&tegra->carveout.domain,
>>>> +			  iova_pfn(&tegra->carveout.domain, dma));
>>>> +	}
>>>> +
>>>> +	free_pages((unsigned long)virt, get_order(size));
>>>> +}
>>>> +
>>>>  static int host1x_drm_probe(struct host1x_device *dev)
>>>>  {
>>>>  	struct drm_driver *driver = &tegra_drm_driver;
>>>> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
>>>> index 0ddcce1..7351dee 100644
>>>> --- a/drivers/gpu/drm/tegra/drm.h
>>>> +++ b/drivers/gpu/drm/tegra/drm.h
>>>> @@ -12,6 +12,7 @@
>>>>
>>>>  #include <uapi/drm/tegra_drm.h>
>>>>  #include <linux/host1x.h>
>>>> +#include <linux/iova.h>
>>>>  #include <linux/of_gpio.h>
>>>>
>>>>  #include <drm/drmP.h>
>>>> @@ -43,6 +44,12 @@ struct tegra_drm {
>>>>  	struct iommu_domain *domain;
>>>>  	struct drm_mm mm;
>>>>
>>>> +	struct {
>>>> +		struct iova_domain domain;
>>>> +		unsigned long shift;
>>>> +		unsigned long limit;
>>>> +	} carveout;
>>>> +
>>>>  	struct mutex clients_lock;
>>>>  	struct list_head clients;
>>>>
>>>> @@ -104,6 +111,10 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>>>  int tegra_drm_init(struct tegra_drm *tegra, struct drm_device *drm);
>>>>  int tegra_drm_exit(struct tegra_drm *tegra);
>>>>
>>>> +void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, dma_addr_t *iova);
>>>> +void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt,
>>>> +		    dma_addr_t iova);
>>>> +
>>>>  struct tegra_dc_soc_info;
>>>>  struct tegra_output;
>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]           ` <20161214130545.gfpkosyixr5lfkkf-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2016-12-14 13:32             ` Mikko Perttunen
  2016-12-14 14:11               ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 13:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c

On 14.12.2016 15:05, Daniel Vetter wrote:
> On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
>> On 14.12.2016 14:30, Daniel Vetter wrote:
>>> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>>>> This series adds IOMMU support to Host1x and TegraDRM
>>>> and adds support for the VIC (Video Image Compositor)
>>>> host1x client. The series is available as a git repository at
>>>> git://github.com/cyndis/linux.git; branch vic-2.
>>>>
>>>> A userspace test case for VIC can be found at
>>>> https://github.com/cyndis/drm/tree/work/tegra.
>>>> The testcase is in tests/tegra and is called submit_vic.
>>>> The testcase/TRM include full headers and documentation
>>>> to program the unit. The unit by itself, however, does not
>>>> readily map to existing userspace library interfaces, so
>>>> implementations for those are not provided.
>>>
>>> Afaik libva has an entire pile of post-processing support. Pretty sure
>>> other video transcode libraries have similar interfaces, so should all be
>>> possible to implement this.
>>
>> We don't have any actual video transcoding support though, so unless it's
>> possible to just implement a part of libva and defer the rest to some CPU
>> implementation, I don't see how this is useful. I suppose I could implement
>> a GStreamer plugin for colorspace conversion or resizing, since those are
>> very modular.
>
> Hm, I guess the question then is, how did that get enabled?

What is "that"? I'm not exactly sure.

Our architecture is such that there's the VIC that handles colorspace 
conversion, rescaling, blitting and can do some 2d postprocessing 
effects as well.

Then there's the separate NVDEC that is a video bitstream decoder. 
There's no support for that at the moment. I am working on the IP side 
of that.

The video processing pipeline is then such that NVDEC is fed the 
bitstream; NVDEC outputs a YUV picture in a specific format; VIC takes 
that YUV picture and converts/rescales it into the desired format. Or if 
we are encoding video, VIC takes your RGB image, converts it into a 
format that NVENC understands, and so on.

So with just VIC support, I could implement some simple 2D things. I 
don't know if anyone would want to specifically use the VIC for those 
since applications already have fast CPU algorithms. For the video 
pipeline using VIC is nice since these units can synchronize work 
without CPU involvement and when you're already using NVDEC or NVENC 
it's barely any extra effort to involve VIC as well. It can also be 
useful in power usage sensitive situations, but we aren't really fit for 
those situations with the upstream kernel anyway :)

Thanks;
Mikko.

>
>>> Until that exists I really think that the VIC part (and only that, since
>>> tk1/tx1 in general seems to work with nouveau and all that) should stay
>>> out of tree.
>>
>> Ok. I'll try to whip up something.
>
> But yeah some gstreamer thing might be doable, but not sure they take that
> (I don't have any contacts there). Video is a bit a mess, there's no clear
> go-to place for dropping some new hw support.
> -Daniel
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]       ` <1481715331.2313.28.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  2016-12-14 12:36         ` Mikko Perttunen
@ 2016-12-14 13:48         ` Thierry Reding
       [not found]           ` <20161214134833.GA29981-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
  1 sibling, 1 reply; 28+ messages in thread
From: Thierry Reding @ 2016-12-14 13:48 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Mikko Perttunen, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

On Wed, Dec 14, 2016 at 12:35:31PM +0100, Lucas Stach wrote:
> Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
> > Add a new IO virtual memory allocation API to allow clients to
> > allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
> > required e.g. for loading client firmware when clients are attached
> > to the IOMMU domain.
> > 
> > The allocator allocates contiguous physical pages that are then
> > mapped contiguously to the IOMMU domain using the iova_domain
> > library provided by the kernel. Contiguous physical pages are
> > used so that the same allocator works also when IOMMU support
> > is disabled and therefore devices access physical memory directly.
> > 
> Why is this needed? If you use the DMA API for those buffers you should
> end up with CMA memory in the !IOMMU case and normal paged memory with
> IOMMU enabled. From my understanding this should match the requirements.

We can't currently use the DMA API for these allocations because it
doesn't allow (or at least didn't back when this was first implemented)
us to share a mapping between two devices.

The reason why we need these patches is that when IOMMU is enabled, then
the units' falcons will read memory through the IOMMU, so we must have
allocations for GEM buffers and the firmware go through the same
mechanism.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]           ` <20161214134833.GA29981-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
@ 2016-12-14 14:01             ` Lucas Stach
       [not found]               ` <1481724116.2313.39.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Lucas Stach @ 2016-12-14 14:01 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mikko Perttunen, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am Mittwoch, den 14.12.2016, 14:48 +0100 schrieb Thierry Reding:
> On Wed, Dec 14, 2016 at 12:35:31PM +0100, Lucas Stach wrote:
> > Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
> > > Add a new IO virtual memory allocation API to allow clients to
> > > allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
> > > required e.g. for loading client firmware when clients are attached
> > > to the IOMMU domain.
> > > 
> > > The allocator allocates contiguous physical pages that are then
> > > mapped contiguously to the IOMMU domain using the iova_domain
> > > library provided by the kernel. Contiguous physical pages are
> > > used so that the same allocator works also when IOMMU support
> > > is disabled and therefore devices access physical memory directly.
> > > 
> > Why is this needed? If you use the DMA API for those buffers you should
> > end up with CMA memory in the !IOMMU case and normal paged memory with
> > IOMMU enabled. From my understanding this should match the requirements.
> 
> We can't currently use the DMA API for these allocations because it
> doesn't allow (or at least didn't back when this was first implemented)
> us to share a mapping between two devices.

Hm, maybe I'm overlooking something, but isn't this just a matter of
allocating on one device, then constructing a SG list (dma_get_sgtable)
from the buffer you got and use that to dma_map_sg() the buffer on the
other device?

Maybe doing the firmware buffer allocation on host1x (with a 4GB upper
bound) and then sharing the SG list to the devices?

> The reason why we need these patches is that when IOMMU is enabled, then
> the units' falcons will read memory through the IOMMU, so we must have
> allocations for GEM buffers and the firmware go through the same
> mechanism.

Sorry for maybe dumb questions.

Do you have any engines other than the GPU that can handle SG
themselves? Wouldn't you want the GEM objects to be backed by CMA in
the !MMU case? How are ordinary GEM objects different from the falcon
firmware?

Regards,
Lucas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
  2016-12-14 13:32             ` Mikko Perttunen
@ 2016-12-14 14:11               ` Daniel Vetter
       [not found]                 ` <20161214141142.jmrofknylonhprbk-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2016-12-14 14:11 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel, Mikko Perttunen, linux-tegra

On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
> On 14.12.2016 15:05, Daniel Vetter wrote:
> > On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
> > > On 14.12.2016 14:30, Daniel Vetter wrote:
> > > > On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
> > > > > This series adds IOMMU support to Host1x and TegraDRM
> > > > > and adds support for the VIC (Video Image Compositor)
> > > > > host1x client. The series is available as a git repository at
> > > > > git://github.com/cyndis/linux.git; branch vic-2.
> > > > > 
> > > > > A userspace test case for VIC can be found at
> > > > > https://github.com/cyndis/drm/tree/work/tegra.
> > > > > The testcase is in tests/tegra and is called submit_vic.
> > > > > The testcase/TRM include full headers and documentation
> > > > > to program the unit. The unit by itself, however, does not
> > > > > readily map to existing userspace library interfaces, so
> > > > > implementations for those are not provided.
> > > > 
> > > > Afaik libva has an entire pile of post-processing support. Pretty sure
> > > > other video transcode libraries have similar interfaces, so should all be
> > > > possible to implement this.
> > > 
> > > We don't have any actual video transcoding support though, so unless it's
> > > possible to just implement a part of libva and defer the rest to some CPU
> > > implementation, I don't see how this is useful. I suppose I could implement
> > > a GStreamer plugin for colorspace conversion or resizing, since those are
> > > very modular.
> > 
> > Hm, I guess the question then is, how did that get enabled?
> 
> What is "that"? I'm not exactly sure.
> 
> Our architecture is such that there's the VIC that handles colorspace
> conversion, rescaling, blitting and can do some 2d postprocessing effects as
> well.
> 
> Then there's the separate NVDEC that is a video bitstream decoder. There's
> no support for that at the moment. I am working on the IP side of that.
> 
> The video processing pipeline is then such that NVDEC is fed the bitstream;
> NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
> and converts/rescales it into the desired format. Or if we are encoding
> video, VIC takes your RGB image, converts it into a format that NVENC
> understands, and so on.
> 
> So with just VIC support, I could implement some simple 2D things. I don't
> know if anyone would want to specifically use the VIC for those since
> applications already have fast CPU algorithms. For the video pipeline using
> VIC is nice since these units can synchronize work without CPU involvement
> and when you're already using NVDEC or NVENC it's barely any extra effort to
> involve VIC as well. It can also be useful in power usage sensitive
> situations, but we aren't really fit for those situations with the upstream
> kernel anyway :)

Ah I thought the nvdec was already enabled, since for i915 that's how we
went about things (we have a pretty much exactly matching split in the
various video related engines). But if that's not there yet then no
worries, all fine.

Since you do seem to plan to enable everything anyway, might be worth it
to go directly with something like libva or libvdpau or whatever the cool
thing is. libva is my recommendation since it works on non-X11 too afaik,
but I have 0 clue. And might be worth it to check out whether you can't do
a super-basic libva driver that only does the post processing stuff. With
libva you can import/export images, so it might be possible even ... And
directly doing the full video engine support instead of a one-off in
gstreamer sounds more sensible to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]     ` <20161214123004.urnvlqarcnp5rp4a-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2016-12-14 12:41       ` Mikko Perttunen
@ 2016-12-14 14:18       ` Thierry Reding
  1 sibling, 0 replies; 28+ messages in thread
From: Thierry Reding @ 2016-12-14 14:18 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mikko Perttunen, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c

[-- Attachment #1: Type: text/plain, Size: 3344 bytes --]

On Wed, Dec 14, 2016 at 01:30:04PM +0100, Daniel Vetter wrote:
> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
> > This series adds IOMMU support to Host1x and TegraDRM
> > and adds support for the VIC (Video Image Compositor)
> > host1x client. The series is available as a git repository at
> > git://github.com/cyndis/linux.git; branch vic-2.
> > 
> > A userspace test case for VIC can be found at
> > https://github.com/cyndis/drm/tree/work/tegra.
> > The testcase is in tests/tegra and is called submit_vic.
> > The testcase/TRM include full headers and documentation
> > to program the unit. The unit by itself, however, does not
> > readily map to existing userspace library interfaces, so
> > implementations for those are not provided.
> 
> Afaik libva has an entire pile of post-processing support. Pretty sure
> other video transcode libraries have similar interfaces, so should all be
> possible to implement this.
> 
> Until that exists I really think that the VIC part (and only that, since
> tk1/tx1 in general seems to work with nouveau and all that) should stay
> out of tree.

I think the VIC is somewhat special with regard to userspace ABI. I
understand and completely agree with the requirements the community
has established. However, as Mikko already said, there's no use for
standalone VIC at all. There aren't any APIs that map to what the
hardware does (except maybe parts of X or perhaps pixman, neither
of which are likely to see implementations using VIC). I'm also not
aware of any of the proprietary driver stack using only VIC for any
use cases.

The primary reason that I want VIC support upstream is because it
makes it easier to test host1x support. Currently the only units that we
can test against are gr2d and gr3d on Tegra20 through Tegra114. With
Tegra124 gr2d and gr3d went away and got replaced by the gk20a GPU which
doesn't use the host1x infrastructure. So on Tegra124 and later we
currently have no way of exercising that code. Now, Tegra124 and
Tegra210 are by far the most popular generations at this point and there
are very few people running Tegra20 or Tegra30 with an upstream kernel
(there are some still, and I think they're all awesome) so the host1x
infrastructure is effectively untested and therefore I get very nervous
whenever I have to apply patches against it because we can only validate
on very old chips.

Tegra124 is also somewhat of a special case. While it has a GPU that's
driven by Nouveau, it doesn't yet have NVDEC and NVENC support that the
newer Tegra210 has. The video encoding, if I remember correctly, is a
precursor to NVENC (MSENC?) but video decoding is the "legacy" hardware
that we had on earlier chips. That involved a fairly complicated setup
with firmware that needs to be loaded to a fixed address in system
memory (or mapped using the IOMMU). The bottom line is that it's
unlikely that we'll ever see an open-source implementation for video
decoding on Tegra124 (unless somebody reverse-engineers it), and hence
we'd essentially be left without any way of testisc host1x support on
TK1.

All of that said, I could carry these patches out of tree and use them
for testing until we've got a complete userspace ready. It's just
something that I had hoped I could avoid.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]                 ` <20161214141142.jmrofknylonhprbk-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2016-12-14 14:29                   ` Mikko Perttunen
  2016-12-14 14:48                   ` Daniel Vetter
  2017-01-17 16:22                   ` Mikko Perttunen
  2 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2016-12-14 14:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c



On 14.12.2016 16:11, Daniel Vetter wrote:
> On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
>> On 14.12.2016 15:05, Daniel Vetter wrote:
>>> On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
>>>> On 14.12.2016 14:30, Daniel Vetter wrote:
>>>>> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>>>>>> This series adds IOMMU support to Host1x and TegraDRM
>>>>>> and adds support for the VIC (Video Image Compositor)
>>>>>> host1x client. The series is available as a git repository at
>>>>>> git://github.com/cyndis/linux.git; branch vic-2.
>>>>>>
>>>>>> A userspace test case for VIC can be found at
>>>>>> https://github.com/cyndis/drm/tree/work/tegra.
>>>>>> The testcase is in tests/tegra and is called submit_vic.
>>>>>> The testcase/TRM include full headers and documentation
>>>>>> to program the unit. The unit by itself, however, does not
>>>>>> readily map to existing userspace library interfaces, so
>>>>>> implementations for those are not provided.
>>>>>
>>>>> Afaik libva has an entire pile of post-processing support. Pretty sure
>>>>> other video transcode libraries have similar interfaces, so should all be
>>>>> possible to implement this.
>>>>
>>>> We don't have any actual video transcoding support though, so unless it's
>>>> possible to just implement a part of libva and defer the rest to some CPU
>>>> implementation, I don't see how this is useful. I suppose I could implement
>>>> a GStreamer plugin for colorspace conversion or resizing, since those are
>>>> very modular.
>>>
>>> Hm, I guess the question then is, how did that get enabled?
>>
>> What is "that"? I'm not exactly sure.
>>
>> Our architecture is such that there's the VIC that handles colorspace
>> conversion, rescaling, blitting and can do some 2d postprocessing effects as
>> well.
>>
>> Then there's the separate NVDEC that is a video bitstream decoder. There's
>> no support for that at the moment. I am working on the IP side of that.
>>
>> The video processing pipeline is then such that NVDEC is fed the bitstream;
>> NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
>> and converts/rescales it into the desired format. Or if we are encoding
>> video, VIC takes your RGB image, converts it into a format that NVENC
>> understands, and so on.
>>
>> So with just VIC support, I could implement some simple 2D things. I don't
>> know if anyone would want to specifically use the VIC for those since
>> applications already have fast CPU algorithms. For the video pipeline using
>> VIC is nice since these units can synchronize work without CPU involvement
>> and when you're already using NVDEC or NVENC it's barely any extra effort to
>> involve VIC as well. It can also be useful in power usage sensitive
>> situations, but we aren't really fit for those situations with the upstream
>> kernel anyway :)
>
> Ah I thought the nvdec was already enabled, since for i915 that's how we
> went about things (we have a pretty much exactly matching split in the
> various video related engines). But if that's not there yet then no
> worries, all fine.

Ah, I see :)

Yes, perhaps I should have described the situation and hardware in a bit 
more detail initially - sometimes I forget I'm writing to other people 
than Thierry as well.. something to keep in mind for future patches.

>
> Since you do seem to plan to enable everything anyway, might be worth it
> to go directly with something like libva or libvdpau or whatever the cool
> thing is. libva is my recommendation since it works on non-X11 too afaik,
> but I have 0 clue. And might be worth it to check out whether you can't do
> a super-basic libva driver that only does the post processing stuff. With
> libva you can import/export images, so it might be possible even ... And
> directly doing the full video engine support instead of a one-off in
> gstreamer sounds more sensible to me.
> -Daniel
>

I'll take a look at libva.

Thanks,
Mikko.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]               ` <1481724116.2313.39.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2016-12-14 14:39                 ` Thierry Reding
       [not found]                   ` <20161214143912.GA30280-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Thierry Reding @ 2016-12-14 14:39 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Mikko Perttunen, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 4063 bytes --]

On Wed, Dec 14, 2016 at 03:01:56PM +0100, Lucas Stach wrote:
> Am Mittwoch, den 14.12.2016, 14:48 +0100 schrieb Thierry Reding:
> > On Wed, Dec 14, 2016 at 12:35:31PM +0100, Lucas Stach wrote:
> > > Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
> > > > Add a new IO virtual memory allocation API to allow clients to
> > > > allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
> > > > required e.g. for loading client firmware when clients are attached
> > > > to the IOMMU domain.
> > > > 
> > > > The allocator allocates contiguous physical pages that are then
> > > > mapped contiguously to the IOMMU domain using the iova_domain
> > > > library provided by the kernel. Contiguous physical pages are
> > > > used so that the same allocator works also when IOMMU support
> > > > is disabled and therefore devices access physical memory directly.
> > > > 
> > > Why is this needed? If you use the DMA API for those buffers you should
> > > end up with CMA memory in the !IOMMU case and normal paged memory with
> > > IOMMU enabled. From my understanding this should match the requirements.
> > 
> > We can't currently use the DMA API for these allocations because it
> > doesn't allow (or at least didn't back when this was first implemented)
> > us to share a mapping between two devices.
> 
> Hm, maybe I'm overlooking something, but isn't this just a matter of
> allocating on one device, then constructing a SG list (dma_get_sgtable)
> from the buffer you got and use that to dma_map_sg() the buffer on the
> other device?

Yes, that would work. What I was referring to is sharing framebuffers
between multiple CRTCs. Back at the time when IOMMU support was first
added, I tried to use the DMA API. However, the problem with that was
that we would've had to effectively dma_map_sg() on every page-flip
since the buffer is imported into the DRM device, but there's no call
that would import it for each CRTC only once. So when the framebuffer
is added to a plane, you have to map it to the corresponding display
controller. And the dma_map_sg() was, if I recall correctly, on the
order of 5-10 ms, which is prohibitively expensive to do per frame.

It's also completely unnecessary because all CRTCs in a DRM device can
simply share the same IOMMU domain. I can't think of a reason why you
would want or need to use separate domains.

Back at the time this was something that the DMA API couldn't do, it
would simply assign a separate IOMMU domain per device. It's possible
that this has changed now given that many others must've run into the
same problem meanwhile.

> Maybe doing the firmware buffer allocation on host1x (with a 4GB upper
> bound) and then sharing the SG list to the devices?

That's pretty much what this API is doing. Only it's the other way
around: we don't share the SG list with other devices for mapping, we
simply reuse the same mapping across multiple devices, since they're
all in the same IOMMU domain.

> > The reason why we need these patches is that when IOMMU is enabled, then
> > the units' falcons will read memory through the IOMMU, so we must have
> > allocations for GEM buffers and the firmware go through the same
> > mechanism.
> 
> Sorry for maybe dumb questions.
> 
> Do you have any engines other than the GPU that can handle SG
> themselves?

No, I don't think so.

> Wouldn't you want the GEM objects to be backed by CMA in the !MMU
> case?

That's exactly what's happening already. If no IOMMU is available we
allocate buffer objects backing store with dma_alloc_wc().

> How are ordinary GEM objects different from the falcon firmware?

They're not. I think we could probably reuse more of the BO allocation
functions for the firmware as well. I think Mikko already agreed to look
into that. We might have to add some special cases, or split up the
helpers a little differently to avoid creating GEM objects from the
firmware buffers. We wouldn't want userspace to start mmap()'ing those.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]                 ` <20161214141142.jmrofknylonhprbk-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2016-12-14 14:29                   ` Mikko Perttunen
@ 2016-12-14 14:48                   ` Daniel Vetter
       [not found]                     ` <CAKMK7uGw3XwbPU798YJ-9eLbwhMWs7z_F-ro238RfFN8LFURtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-01-17 16:22                   ` Mikko Perttunen
  2 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2016-12-14 14:48 UTC (permalink / raw)
  To: Mikko Perttunen, Sean V Kelley
  Cc: Dave Airlie, dri-devel, Mikko Perttunen, Thierry Reding,
	libva-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Wed, Dec 14, 2016 at 3:11 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
>> On 14.12.2016 15:05, Daniel Vetter wrote:
>> > On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
>> > > On 14.12.2016 14:30, Daniel Vetter wrote:
>> > > > On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>> > > > > This series adds IOMMU support to Host1x and TegraDRM
>> > > > > and adds support for the VIC (Video Image Compositor)
>> > > > > host1x client. The series is available as a git repository at
>> > > > > git://github.com/cyndis/linux.git; branch vic-2.
>> > > > >
>> > > > > A userspace test case for VIC can be found at
>> > > > > https://github.com/cyndis/drm/tree/work/tegra.
>> > > > > The testcase is in tests/tegra and is called submit_vic.
>> > > > > The testcase/TRM include full headers and documentation
>> > > > > to program the unit. The unit by itself, however, does not
>> > > > > readily map to existing userspace library interfaces, so
>> > > > > implementations for those are not provided.
>> > > >
>> > > > Afaik libva has an entire pile of post-processing support. Pretty sure
>> > > > other video transcode libraries have similar interfaces, so should all be
>> > > > possible to implement this.
>> > >
>> > > We don't have any actual video transcoding support though, so unless it's
>> > > possible to just implement a part of libva and defer the rest to some CPU
>> > > implementation, I don't see how this is useful. I suppose I could implement
>> > > a GStreamer plugin for colorspace conversion or resizing, since those are
>> > > very modular.
>> >
>> > Hm, I guess the question then is, how did that get enabled?
>>
>> What is "that"? I'm not exactly sure.
>>
>> Our architecture is such that there's the VIC that handles colorspace
>> conversion, rescaling, blitting and can do some 2d postprocessing effects as
>> well.
>>
>> Then there's the separate NVDEC that is a video bitstream decoder. There's
>> no support for that at the moment. I am working on the IP side of that.
>>
>> The video processing pipeline is then such that NVDEC is fed the bitstream;
>> NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
>> and converts/rescales it into the desired format. Or if we are encoding
>> video, VIC takes your RGB image, converts it into a format that NVENC
>> understands, and so on.
>>
>> So with just VIC support, I could implement some simple 2D things. I don't
>> know if anyone would want to specifically use the VIC for those since
>> applications already have fast CPU algorithms. For the video pipeline using
>> VIC is nice since these units can synchronize work without CPU involvement
>> and when you're already using NVDEC or NVENC it's barely any extra effort to
>> involve VIC as well. It can also be useful in power usage sensitive
>> situations, but we aren't really fit for those situations with the upstream
>> kernel anyway :)
>
> Ah I thought the nvdec was already enabled, since for i915 that's how we
> went about things (we have a pretty much exactly matching split in the
> various video related engines). But if that's not there yet then no
> worries, all fine.
>
> Since you do seem to plan to enable everything anyway, might be worth it
> to go directly with something like libva or libvdpau or whatever the cool
> thing is. libva is my recommendation since it works on non-X11 too afaik,
> but I have 0 clue. And might be worth it to check out whether you can't do
> a super-basic libva driver that only does the post processing stuff. With
> libva you can import/export images, so it might be possible even ... And
> directly doing the full video engine support instead of a one-off in
> gstreamer sounds more sensible to me.

Silly me forgot to add the experts, i.e. Sean (current libva
maintainer) and libva mailing lists.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Libva mailing list
Libva@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libva

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
  2016-12-14 12:30   ` [PATCH v2 0/7] Host1x IOMMU support + VIC support Daniel Vetter
       [not found]     ` <20161214123004.urnvlqarcnp5rp4a-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2016-12-14 18:26     ` Emil Velikov
  1 sibling, 0 replies; 28+ messages in thread
From: Emil Velikov @ 2016-12-14 18:26 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: linux-tegra, ML dri-devel, Mikko Perttunen

On 14 December 2016 at 12:30, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>> This series adds IOMMU support to Host1x and TegraDRM
>> and adds support for the VIC (Video Image Compositor)
>> host1x client. The series is available as a git repository at
>> git://github.com/cyndis/linux.git; branch vic-2.
>>
>> A userspace test case for VIC can be found at
>> https://github.com/cyndis/drm/tree/work/tegra.
>> The testcase is in tests/tegra and is called submit_vic.
>> The testcase/TRM include full headers and documentation
>> to program the unit. The unit by itself, however, does not
>> readily map to existing userspace library interfaces, so
>> implementations for those are not provided.
>
> Afaik libva has an entire pile of post-processing support. Pretty sure
> other video transcode libraries have similar interfaces, so should all be
> possible to implement this.
>
> Until that exists I really think that the VIC part (and only that, since
> tk1/tx1 in general seems to work with nouveau and all that) should stay
> out of tree.
Related:
With render-only about to be merged soon{tm) and Alexandre already
looking at tegra version of it, one should be able to wire this
functionality and use it in mesa/gallium.

Note: this is not an objection against the work - be that about from
quality or inclusion POV.

Emil
P.S. Goes to check on the renderonly topic.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]                     ` <CAKMK7uGw3XwbPU798YJ-9eLbwhMWs7z_F-ro238RfFN8LFURtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-12-15  2:52                       ` Sean V Kelley
  0 siblings, 0 replies; 28+ messages in thread
From: Sean V Kelley @ 2016-12-15  2:52 UTC (permalink / raw)
  To: Daniel Vetter (daniel-/w4YWyX8dFk@public.gmane.org)
  Cc: Mikko Perttunen, Dave Airlie, DRI mailing list, Mikko Perttunen,
	Thierry Reding, libva-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

Hi,

Comments below:

> On Dec 14, 2016, at 6:48 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> On Wed, Dec 14, 2016 at 3:11 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
>>> On 14.12.2016 15:05, Daniel Vetter wrote:
>>>> On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
>>>>> On 14.12.2016 14:30, Daniel Vetter wrote:
>>>>>> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>>>>>>> This series adds IOMMU support to Host1x and TegraDRM
>>>>>>> and adds support for the VIC (Video Image Compositor)
>>>>>>> host1x client. The series is available as a git repository at
>>>>>>> git://github.com/cyndis/linux.git; branch vic-2.
>>>>>>> 
>>>>>>> A userspace test case for VIC can be found at
>>>>>>> https://github.com/cyndis/drm/tree/work/tegra.
>>>>>>> The testcase is in tests/tegra and is called submit_vic.
>>>>>>> The testcase/TRM include full headers and documentation
>>>>>>> to program the unit. The unit by itself, however, does not
>>>>>>> readily map to existing userspace library interfaces, so
>>>>>>> implementations for those are not provided.
>>>>>> 
>>>>>> Afaik libva has an entire pile of post-processing support. Pretty sure
>>>>>> other video transcode libraries have similar interfaces, so should all be
>>>>>> possible to implement this.
>>>>> 
>>>>> We don't have any actual video transcoding support though, so unless it's
>>>>> possible to just implement a part of libva and defer the rest to some CPU
>>>>> implementation, I don't see how this is useful. I suppose I could implement
>>>>> a GStreamer plugin for colorspace conversion or resizing, since those are
>>>>> very modular.
>>>> 
>>>> Hm, I guess the question then is, how did that get enabled?
>>> 
>>> What is "that"? I'm not exactly sure.
>>> 
>>> Our architecture is such that there's the VIC that handles colorspace
>>> conversion, rescaling, blitting and can do some 2d postprocessing effects as
>>> well.
>>> 
>>> Then there's the separate NVDEC that is a video bitstream decoder. There's
>>> no support for that at the moment. I am working on the IP side of that.
>>> 
>>> The video processing pipeline is then such that NVDEC is fed the bitstream;
>>> NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
>>> and converts/rescales it into the desired format. Or if we are encoding
>>> video, VIC takes your RGB image, converts it into a format that NVENC
>>> understands, and so on.
>>> 
>>> So with just VIC support, I could implement some simple 2D things. I don't
>>> know if anyone would want to specifically use the VIC for those since
>>> applications already have fast CPU algorithms. For the video pipeline using
>>> VIC is nice since these units can synchronize work without CPU involvement
>>> and when you're already using NVDEC or NVENC it's barely any extra effort to
>>> involve VIC as well. It can also be useful in power usage sensitive
>>> situations, but we aren't really fit for those situations with the upstream
>>> kernel anyway :)
>> 
>> Ah I thought the nvdec was already enabled, since for i915 that's how we
>> went about things (we have a pretty much exactly matching split in the
>> various video related engines). But if that's not there yet then no
>> worries, all fine.
>> 
>> Since you do seem to plan to enable everything anyway, might be worth it
>> to go directly with something like libva or libvdpau or whatever the cool
>> thing is. libva is my recommendation since it works on non-X11 too afaik,
>> but I have 0 clue. And might be worth it to check out whether you can't do
>> a super-basic libva driver that only does the post processing stuff. With
>> libva you can import/export images, so it might be possible even ... And
>> directly doing the full video engine support instead of a one-off in
>> gstreamer sounds more sensible to me.
> 
> Silly me forgot to add the experts, i.e. Sean (current libva
> maintainer) and libva mailing lists.
> -Daniel


From a purely VPP standpoint libva is something that would give VIC a readily available API in user space.  You don’t have to have a transcoding use case to leverage VPP with a surface with VA-API.  On top of that we do support DMA but import and export to provide flexibility in sharing the memory.  It’s entirely possible to support a subset of entry points (say VPP only) for VA-API through Libva and share that information in a query on the config say for your applications.  We do that all the time with various backends having feature subsets such as our hybrid Intel driver, or the IMG based PVR driver.  I believe there is also a Rock-Chip project experimenting with Libva for their media driver.

I’ll take a look at your link.

Sean

Intel, Corp.




> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Libva mailing list
> Libva@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/libva

_______________________________________________
Libva mailing list
Libva@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libva

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
       [not found]                 ` <20161214141142.jmrofknylonhprbk-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  2016-12-14 14:29                   ` Mikko Perttunen
  2016-12-14 14:48                   ` Daniel Vetter
@ 2017-01-17 16:22                   ` Mikko Perttunen
  2017-01-23  7:42                     ` Daniel Vetter
  2 siblings, 1 reply; 28+ messages in thread
From: Mikko Perttunen @ 2017-01-17 16:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, airlied-cv59FeDIM0c

On 12/14/2016 04:11 PM, Daniel Vetter wrote:
> On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
>> On 14.12.2016 15:05, Daniel Vetter wrote:
>>> On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
>>>> On 14.12.2016 14:30, Daniel Vetter wrote:
>>>>> On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
>>>>>> This series adds IOMMU support to Host1x and TegraDRM
>>>>>> and adds support for the VIC (Video Image Compositor)
>>>>>> host1x client. The series is available as a git repository at
>>>>>> git://github.com/cyndis/linux.git; branch vic-2.
>>>>>>
>>>>>> A userspace test case for VIC can be found at
>>>>>> https://github.com/cyndis/drm/tree/work/tegra.
>>>>>> The testcase is in tests/tegra and is called submit_vic.
>>>>>> The testcase/TRM include full headers and documentation
>>>>>> to program the unit. The unit by itself, however, does not
>>>>>> readily map to existing userspace library interfaces, so
>>>>>> implementations for those are not provided.
>>>>>
>>>>> Afaik libva has an entire pile of post-processing support. Pretty sure
>>>>> other video transcode libraries have similar interfaces, so should all be
>>>>> possible to implement this.
>>>>
>>>> We don't have any actual video transcoding support though, so unless it's
>>>> possible to just implement a part of libva and defer the rest to some CPU
>>>> implementation, I don't see how this is useful. I suppose I could implement
>>>> a GStreamer plugin for colorspace conversion or resizing, since those are
>>>> very modular.
>>>
>>> Hm, I guess the question then is, how did that get enabled?
>>
>> What is "that"? I'm not exactly sure.
>>
>> Our architecture is such that there's the VIC that handles colorspace
>> conversion, rescaling, blitting and can do some 2d postprocessing effects as
>> well.
>>
>> Then there's the separate NVDEC that is a video bitstream decoder. There's
>> no support for that at the moment. I am working on the IP side of that.
>>
>> The video processing pipeline is then such that NVDEC is fed the bitstream;
>> NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
>> and converts/rescales it into the desired format. Or if we are encoding
>> video, VIC takes your RGB image, converts it into a format that NVENC
>> understands, and so on.
>>
>> So with just VIC support, I could implement some simple 2D things. I don't
>> know if anyone would want to specifically use the VIC for those since
>> applications already have fast CPU algorithms. For the video pipeline using
>> VIC is nice since these units can synchronize work without CPU involvement
>> and when you're already using NVDEC or NVENC it's barely any extra effort to
>> involve VIC as well. It can also be useful in power usage sensitive
>> situations, but we aren't really fit for those situations with the upstream
>> kernel anyway :)
>
> Ah I thought the nvdec was already enabled, since for i915 that's how we
> went about things (we have a pretty much exactly matching split in the
> various video related engines). But if that's not there yet then no
> worries, all fine.
>
> Since you do seem to plan to enable everything anyway, might be worth it
> to go directly with something like libva or libvdpau or whatever the cool
> thing is. libva is my recommendation since it works on non-X11 too afaik,
> but I have 0 clue. And might be worth it to check out whether you can't do
> a super-basic libva driver that only does the post processing stuff. With
> libva you can import/export images, so it might be possible even ... And
> directly doing the full video engine support instead of a one-off in
> gstreamer sounds more sensible to me.
> -Daniel
>

It took a while, but I now have a libva backend to go with this kernel 
driver: https://github.com/cyndis/vaapi-tegra-driver.

It is far from complete, but the libva putsurface testcase runs and its 
output looks correct when compared to the intel backend. Would this be 
suitable or should something more be implemented before merging the 
kernel driver?

Thanks,
Mikko.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/7] Host1x IOMMU support + VIC support
  2017-01-17 16:22                   ` Mikko Perttunen
@ 2017-01-23  7:42                     ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2017-01-23  7:42 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel, Mikko Perttunen, linux-tegra

On Tue, Jan 17, 2017 at 06:22:36PM +0200, Mikko Perttunen wrote:
> On 12/14/2016 04:11 PM, Daniel Vetter wrote:
> > On Wed, Dec 14, 2016 at 03:32:16PM +0200, Mikko Perttunen wrote:
> > > On 14.12.2016 15:05, Daniel Vetter wrote:
> > > > On Wed, Dec 14, 2016 at 02:41:28PM +0200, Mikko Perttunen wrote:
> > > > > On 14.12.2016 14:30, Daniel Vetter wrote:
> > > > > > On Wed, Dec 14, 2016 at 01:16:10PM +0200, Mikko Perttunen wrote:
> > > > > > > This series adds IOMMU support to Host1x and TegraDRM
> > > > > > > and adds support for the VIC (Video Image Compositor)
> > > > > > > host1x client. The series is available as a git repository at
> > > > > > > git://github.com/cyndis/linux.git; branch vic-2.
> > > > > > > 
> > > > > > > A userspace test case for VIC can be found at
> > > > > > > https://github.com/cyndis/drm/tree/work/tegra.
> > > > > > > The testcase is in tests/tegra and is called submit_vic.
> > > > > > > The testcase/TRM include full headers and documentation
> > > > > > > to program the unit. The unit by itself, however, does not
> > > > > > > readily map to existing userspace library interfaces, so
> > > > > > > implementations for those are not provided.
> > > > > > 
> > > > > > Afaik libva has an entire pile of post-processing support. Pretty sure
> > > > > > other video transcode libraries have similar interfaces, so should all be
> > > > > > possible to implement this.
> > > > > 
> > > > > We don't have any actual video transcoding support though, so unless it's
> > > > > possible to just implement a part of libva and defer the rest to some CPU
> > > > > implementation, I don't see how this is useful. I suppose I could implement
> > > > > a GStreamer plugin for colorspace conversion or resizing, since those are
> > > > > very modular.
> > > > 
> > > > Hm, I guess the question then is, how did that get enabled?
> > > 
> > > What is "that"? I'm not exactly sure.
> > > 
> > > Our architecture is such that there's the VIC that handles colorspace
> > > conversion, rescaling, blitting and can do some 2d postprocessing effects as
> > > well.
> > > 
> > > Then there's the separate NVDEC that is a video bitstream decoder. There's
> > > no support for that at the moment. I am working on the IP side of that.
> > > 
> > > The video processing pipeline is then such that NVDEC is fed the bitstream;
> > > NVDEC outputs a YUV picture in a specific format; VIC takes that YUV picture
> > > and converts/rescales it into the desired format. Or if we are encoding
> > > video, VIC takes your RGB image, converts it into a format that NVENC
> > > understands, and so on.
> > > 
> > > So with just VIC support, I could implement some simple 2D things. I don't
> > > know if anyone would want to specifically use the VIC for those since
> > > applications already have fast CPU algorithms. For the video pipeline using
> > > VIC is nice since these units can synchronize work without CPU involvement
> > > and when you're already using NVDEC or NVENC it's barely any extra effort to
> > > involve VIC as well. It can also be useful in power usage sensitive
> > > situations, but we aren't really fit for those situations with the upstream
> > > kernel anyway :)
> > 
> > Ah I thought the nvdec was already enabled, since for i915 that's how we
> > went about things (we have a pretty much exactly matching split in the
> > various video related engines). But if that's not there yet then no
> > worries, all fine.
> > 
> > Since you do seem to plan to enable everything anyway, might be worth it
> > to go directly with something like libva or libvdpau or whatever the cool
> > thing is. libva is my recommendation since it works on non-X11 too afaik,
> > but I have 0 clue. And might be worth it to check out whether you can't do
> > a super-basic libva driver that only does the post processing stuff. With
> > libva you can import/export images, so it might be possible even ... And
> > directly doing the full video engine support instead of a one-off in
> > gstreamer sounds more sensible to me.
> > -Daniel
> > 
> 
> It took a while, but I now have a libva backend to go with this kernel
> driver: https://github.com/cyndis/vaapi-tegra-driver.
> 
> It is far from complete, but the libva putsurface testcase runs and its
> output looks correct when compared to the intel backend. Would this be
> suitable or should something more be implemented before merging the kernel
> driver?

Seems all reasonable to move ahead with this. Thanks for doing it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API
       [not found]                   ` <20161214143912.GA30280-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
@ 2017-01-27 22:45                     ` Mikko Perttunen
  0 siblings, 0 replies; 28+ messages in thread
From: Mikko Perttunen @ 2017-01-27 22:45 UTC (permalink / raw)
  To: Thierry Reding, Lucas Stach
  Cc: Mikko Perttunen, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

So with the userspace question resolved, only this question of memory 
allocation remains. I see the following options:

- Keep the current __get_free_pages-based allocation. This means that 
firmware loading may fail when memory is fragmented. The function can be 
augmented by adding vmalloc support when IOMMU is enabled, which would 
eliminate those failures.

- Move to doing CMA allocation using the DMA API. This way allocations 
would succeed more likely at least if the user has enough CMA memory.
In this case we'll need to pass an additional struct device * to the 
alloc/free functions. This also requires that the CMA memory is always 
allocated from the lower 4GB physical address range, as firmware can 
only be loaded from there (other operations support 34 bits); I don't 
know how the default CMA region is allocated, maybe I should find out.

This option would also require adding a separate path for when the IOMMU 
is enabled and CMA not available - AIUI, on Tegra, the DMA API always 
allocates from CMA, whether IOMMU is enabled or not. (Or, we could make 
CMA mandatory.)

For me, either option works. Also, sorry if I made some stupid mistake, 
my brain is currently too tired to handle multiple address spaces at the 
same time :)

Cheers,
Mikko.

On 12/14/2016 04:39 PM, Thierry Reding wrote:
> On Wed, Dec 14, 2016 at 03:01:56PM +0100, Lucas Stach wrote:
>> Am Mittwoch, den 14.12.2016, 14:48 +0100 schrieb Thierry Reding:
>>> On Wed, Dec 14, 2016 at 12:35:31PM +0100, Lucas Stach wrote:
>>>> Am Mittwoch, den 14.12.2016, 13:16 +0200 schrieb Mikko Perttunen:
>>>>> Add a new IO virtual memory allocation API to allow clients to
>>>>> allocate non-GEM memory in the Tegra DRM IOMMU domain. This is
>>>>> required e.g. for loading client firmware when clients are attached
>>>>> to the IOMMU domain.
>>>>>
>>>>> The allocator allocates contiguous physical pages that are then
>>>>> mapped contiguously to the IOMMU domain using the iova_domain
>>>>> library provided by the kernel. Contiguous physical pages are
>>>>> used so that the same allocator works also when IOMMU support
>>>>> is disabled and therefore devices access physical memory directly.
>>>>>
>>>> Why is this needed? If you use the DMA API for those buffers you should
>>>> end up with CMA memory in the !IOMMU case and normal paged memory with
>>>> IOMMU enabled. From my understanding this should match the requirements.
>>>
>>> We can't currently use the DMA API for these allocations because it
>>> doesn't allow (or at least didn't back when this was first implemented)
>>> us to share a mapping between two devices.
>>
>> Hm, maybe I'm overlooking something, but isn't this just a matter of
>> allocating on one device, then constructing a SG list (dma_get_sgtable)
>> from the buffer you got and use that to dma_map_sg() the buffer on the
>> other device?
>
> Yes, that would work. What I was referring to is sharing framebuffers
> between multiple CRTCs. Back at the time when IOMMU support was first
> added, I tried to use the DMA API. However, the problem with that was
> that we would've had to effectively dma_map_sg() on every page-flip
> since the buffer is imported into the DRM device, but there's no call
> that would import it for each CRTC only once. So when the framebuffer
> is added to a plane, you have to map it to the corresponding display
> controller. And the dma_map_sg() was, if I recall correctly, on the
> order of 5-10 ms, which is prohibitively expensive to do per frame.
>
> It's also completely unnecessary because all CRTCs in a DRM device can
> simply share the same IOMMU domain. I can't think of a reason why you
> would want or need to use separate domains.
>
> Back at the time this was something that the DMA API couldn't do, it
> would simply assign a separate IOMMU domain per device. It's possible
> that this has changed now given that many others must've run into the
> same problem meanwhile.
>
>> Maybe doing the firmware buffer allocation on host1x (with a 4GB upper
>> bound) and then sharing the SG list to the devices?
>
> That's pretty much what this API is doing. Only it's the other way
> around: we don't share the SG list with other devices for mapping, we
> simply reuse the same mapping across multiple devices, since they're
> all in the same IOMMU domain.
>
>>> The reason why we need these patches is that when IOMMU is enabled, then
>>> the units' falcons will read memory through the IOMMU, so we must have
>>> allocations for GEM buffers and the firmware go through the same
>>> mechanism.
>>
>> Sorry for maybe dumb questions.
>>
>> Do you have any engines other than the GPU that can handle SG
>> themselves?
>
> No, I don't think so.
>
>> Wouldn't you want the GEM objects to be backed by CMA in the !MMU
>> case?
>
> That's exactly what's happening already. If no IOMMU is available we
> allocate buffer objects backing store with dma_alloc_wc().
>
>> How are ordinary GEM objects different from the falcon firmware?
>
> They're not. I think we could probably reuse more of the BO allocation
> functions for the firmware as well. I think Mikko already agreed to look
> into that. We might have to add some special cases, or split up the
> helpers a little differently to avoid creating GEM objects from the
> firmware buffers. We wouldn't want userspace to start mmap()'ing those.
>
> Thierry
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-01-27 22:45 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-14 11:16 [PATCH v2 0/7] Host1x IOMMU support + VIC support Mikko Perttunen
     [not found] ` <20161214111617.24548-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2016-12-14 11:16   ` [PATCH v2 1/7] drm/tegra: Add Tegra DRM allocation API Mikko Perttunen
2016-12-14 11:35     ` Lucas Stach
     [not found]       ` <1481715331.2313.28.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2016-12-14 12:36         ` Mikko Perttunen
2016-12-14 12:56           ` Lucas Stach
     [not found]             ` <1481720206.2313.32.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2016-12-14 13:17               ` Mikko Perttunen
2016-12-14 13:48         ` Thierry Reding
     [not found]           ` <20161214134833.GA29981-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
2016-12-14 14:01             ` Lucas Stach
     [not found]               ` <1481724116.2313.39.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2016-12-14 14:39                 ` Thierry Reding
     [not found]                   ` <20161214143912.GA30280-EkSeR96xj6Pcmrwk2tT4+A@public.gmane.org>
2017-01-27 22:45                     ` Mikko Perttunen
2016-12-14 11:16   ` [PATCH v2 2/7] drm/tegra: Add falcon helper library Mikko Perttunen
2016-12-14 11:16   ` [PATCH v2 3/7] drm/tegra: Add VIC support Mikko Perttunen
2016-12-14 11:16   ` [PATCH v2 4/7] gpu: host1x: Add IOMMU support Mikko Perttunen
2016-12-14 11:16   ` [PATCH v2 5/7] dt-bindings: Add bindings for the Tegra VIC Mikko Perttunen
2016-12-14 11:16   ` [PATCH v2 7/7] arm64: tegra: Enable IOMMU for Host1x on Tegra210 Mikko Perttunen
2016-12-14 12:30   ` [PATCH v2 0/7] Host1x IOMMU support + VIC support Daniel Vetter
     [not found]     ` <20161214123004.urnvlqarcnp5rp4a-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2016-12-14 12:41       ` Mikko Perttunen
2016-12-14 13:05         ` Daniel Vetter
     [not found]           ` <20161214130545.gfpkosyixr5lfkkf-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2016-12-14 13:32             ` Mikko Perttunen
2016-12-14 14:11               ` Daniel Vetter
     [not found]                 ` <20161214141142.jmrofknylonhprbk-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2016-12-14 14:29                   ` Mikko Perttunen
2016-12-14 14:48                   ` Daniel Vetter
     [not found]                     ` <CAKMK7uGw3XwbPU798YJ-9eLbwhMWs7z_F-ro238RfFN8LFURtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-15  2:52                       ` Sean V Kelley
2017-01-17 16:22                   ` Mikko Perttunen
2017-01-23  7:42                     ` Daniel Vetter
2016-12-14 14:18       ` Thierry Reding
2016-12-14 18:26     ` Emil Velikov
2016-12-14 11:16 ` [PATCH v2 6/7] arm64: tegra: Enable VIC on T210 Mikko Perttunen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.