All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping
@ 2015-07-31 17:18 ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Hi all,

Here's an update following Catalin's feedback on v4[1].

Changes this round:
- Rebased onto linux-next
  - IOVA alignment fix applied already
  - iommu_iova_cache_init() is now iova_cache_get()
- Tidied up iommu_dma_alloc()
  - Simplified pgprot handling
  - Removed redundant memset
  - Skip coherent page-flushing in a simpler way
- Spotted a bug in iommu_dma_init_domain() where the checks for
  reinitialising an existing domain were backwards.

If it is going to be down to me to tackle all the driver fixes and
conversion of arch/arm dma_ops, I'd still much rather have this
code merged first as a stable base to work with (and un-block arm64
in the meantime). Have we decided yet whether this should go via the
IOMMU tree or the arm64 tree?

Thanks,
Robin.

[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/10181

Robin Murphy (3):
  iommu: Implement common IOMMU ops for DMA mapping
  arm64: Add IOMMU dma_ops
  arm64: Hook up IOMMU dma_ops

 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/dma-mapping.h |  15 +-
 arch/arm64/mm/dma-mapping.c          | 449 +++++++++++++++++++++++++++++
 drivers/iommu/Kconfig                |   7 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/dma-iommu.c            | 534 +++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h            |  84 ++++++
 include/linux/iommu.h                |   1 +
 8 files changed, 1084 insertions(+), 8 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping
@ 2015-07-31 17:18 ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

Here's an update following Catalin's feedback on v4[1].

Changes this round:
- Rebased onto linux-next
  - IOVA alignment fix applied already
  - iommu_iova_cache_init() is now iova_cache_get()
- Tidied up iommu_dma_alloc()
  - Simplified pgprot handling
  - Removed redundant memset
  - Skip coherent page-flushing in a simpler way
- Spotted a bug in iommu_dma_init_domain() where the checks for
  reinitialising an existing domain were backwards.

If it is going to be down to me to tackle all the driver fixes and
conversion of arch/arm dma_ops, I'd still much rather have this
code merged first as a stable base to work with (and un-block arm64
in the meantime). Have we decided yet whether this should go via the
IOMMU tree or the arm64 tree?

Thanks,
Robin.

[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/10181

Robin Murphy (3):
  iommu: Implement common IOMMU ops for DMA mapping
  arm64: Add IOMMU dma_ops
  arm64: Hook up IOMMU dma_ops

 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/dma-mapping.h |  15 +-
 arch/arm64/mm/dma-mapping.c          | 449 +++++++++++++++++++++++++++++
 drivers/iommu/Kconfig                |   7 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/dma-iommu.c            | 534 +++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h            |  84 ++++++
 include/linux/iommu.h                |   1 +
 8 files changed, 1084 insertions(+), 8 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-31 17:18 ` Robin Murphy
@ 2015-07-31 17:18     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  84 ++++++++
 include/linux/iommu.h     |   1 +
 5 files changed, 627 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8a1bc38..4996dc3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+	bool
+	depends on NEED_SG_DMA_LENGTH
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index dc6f511..45efa2a 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 0000000..f34fd46
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,534 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+	return iova_cache_get();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad;
+
+	if (domain->dma_api_cookie)
+		return -EEXIST;
+
+	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+	domain->dma_api_cookie = iovad;
+
+	return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+
+	if (!iovad)
+		return;
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+	domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises. If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long order, base_pfn, end_pfn;
+
+	if (!iovad)
+		return -ENODEV;
+
+	/* Use the smallest supported page size for IOVA granularity */
+	order = __ffs(domain->ops->pgsize_bitmap);
+	base_pfn = max_t(unsigned long, 1, base >> order);
+	end_pfn = (base + size - 1) >> order;
+
+	/* Check the domain allows at least some access to the device... */
+	if (domain->geometry.force_aperture) {
+		if (base > domain->geometry.aperture_end ||
+		    base + size <= domain->geometry.aperture_start) {
+			pr_warn("specified DMA range outside IOMMU capability\n");
+			return -EFAULT;
+		}
+		/* ...then finally give it a kicking to make sure it fits */
+		base_pfn = max_t(unsigned long, base_pfn,
+				domain->geometry.aperture_start >> order);
+		end_pfn = min_t(unsigned long, end_pfn,
+				domain->geometry.aperture_end >> order);
+	}
+
+	/* All we can safely do with an existing domain is enlarge it */
+	if (iovad->start_pfn) {
+		if (1UL << order != iovad->granule ||
+		    base_pfn != iovad->start_pfn ||
+		    end_pfn < iovad->dma_32bit_pfn) {
+			pr_warn("Incompatible range for DMA domain\n");
+			return -EFAULT;
+		}
+		iovad->dma_32bit_pfn = end_pfn;
+	} else {
+		init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(iommu_dma_init_domain);
+
+/**
+ * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * @dir: Direction of DMA transfer
+ * @coherent: Is the DMA master cache-coherent?
+ *
+ * Return: corresponding IOMMU API page protection flags
+ */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+{
+	int prot = coherent ? IOMMU_CACHE : 0;
+
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		return prot | IOMMU_READ | IOMMU_WRITE;
+	case DMA_TO_DEVICE:
+		return prot | IOMMU_READ;
+	case DMA_FROM_DEVICE:
+		return prot | IOMMU_WRITE;
+	default:
+		return 0;
+	}
+}
+
+static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
+		dma_addr_t dma_limit)
+{
+	unsigned long shift = iova_shift(iovad);
+	unsigned long length = iova_align(iovad, size) >> shift;
+
+	/*
+	 * Enforce size-alignment to be safe - there should probably be
+	 * an attribute to control this per-device, or at least per-domain...
+	 */
+	return alloc_iova(iovad, length, dma_limit >> shift, true);
+}
+
+/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
+static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long shift = iova_shift(iovad);
+	unsigned long pfn = dma_addr >> shift;
+	struct iova *iova = find_iova(iovad, pfn);
+	size_t size = iova_size(iova) << shift;
+
+	/* ...and if we can't, then something is horribly, horribly wrong */
+	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
+	__free_iova(iovad, iova);
+}
+
+static void __iommu_dma_free_pages(struct page **pages, int count)
+{
+	while (count--)
+		__free_page(pages[count]);
+	kvfree(pages);
+}
+
+static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
+{
+	struct page **pages;
+	unsigned int i = 0, array_size = count * sizeof(*pages);
+
+	if (array_size <= PAGE_SIZE)
+		pages = kzalloc(array_size, GFP_KERNEL);
+	else
+		pages = vzalloc(array_size);
+	if (!pages)
+		return NULL;
+
+	/* IOMMU can map any pages, so himem can also be used here */
+	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+
+	while (count) {
+		struct page *page = NULL;
+		int j, order = __fls(count);
+
+		/*
+		 * Higher-order allocations are a convenience rather
+		 * than a necessity, hence using __GFP_NORETRY until
+		 * falling back to single-page allocations.
+		 */
+		for (order = min(order, MAX_ORDER); order > 0; order--) {
+			page = alloc_pages(gfp | __GFP_NORETRY, order);
+			if (!page)
+				continue;
+			if (PageCompound(page)) {
+				if (!split_huge_page(page))
+					break;
+				__free_pages(page, order);
+			} else {
+				split_page(page, order);
+				break;
+			}
+		}
+		if (!page)
+			page = alloc_page(gfp);
+		if (!page) {
+			__iommu_dma_free_pages(pages, i);
+			return NULL;
+		}
+		j = 1 << order;
+		count -= j;
+		while (j--)
+			pages[i++] = page++;
+	}
+	return pages;
+}
+
+/**
+ * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc()
+ * @dev: Device which owns this buffer
+ * @pages: Array of buffer pages as returned by iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @handle: DMA address of buffer
+ *
+ * Frees both the pages associated with the buffer, and the array
+ * describing them
+ */
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle);
+	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+	*handle = DMA_ERROR_CODE;
+}
+
+/**
+ * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * @dev: Device to allocate memory for. Must be a real device
+ *	 attached to an iommu_dma_domain
+ * @size: Size of buffer in bytes
+ * @gfp: Allocation flags
+ * @prot: IOMMU mapping flags
+ * @handle: Out argument for allocated DMA handle
+ * @flush_page: Arch callback which must ensure PAGE_SIZE bytes from the
+ *		given VA/PA are visible to the given non-coherent device.
+ *
+ * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+ * but an IOMMU which supports smaller pages might not map the whole thing.
+ *
+ * Return: Array of struct page pointers describing the buffer,
+ *	   or NULL on failure.
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(struct device *, const void *, phys_addr_t))
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct page **pages;
+	struct sg_table sgt;
+	dma_addr_t dma_addr;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	*handle = DMA_ERROR_CODE;
+
+	pages = __iommu_dma_alloc_pages(count, gfp);
+	if (!pages)
+		return NULL;
+
+	iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
+	if (!iova)
+		goto out_free_pages;
+
+	size = iova_align(iovad, size);
+	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+		goto out_free_iova;
+
+	if (!(prot & IOMMU_CACHE)) {
+		struct sg_mapping_iter miter;
+		/*
+		 * The CPU-centric flushing implied by SG_MITER_TO_SG isn't
+		 * sufficient here, so skip it by using the "wrong" direction.
+		 */
+		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
+		while (sg_miter_next(&miter))
+			flush_page(dev, miter.addr, page_to_phys(miter.page));
+		sg_miter_stop(&miter);
+	}
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot)
+			< size)
+		goto out_free_sg;
+
+	*handle = dma_addr;
+	sg_free_table(&sgt);
+	return pages;
+
+out_free_sg:
+	sg_free_table(&sgt);
+out_free_iova:
+	__free_iova(iovad, iova);
+out_free_pages:
+	__iommu_dma_free_pages(pages, count);
+	return NULL;
+}
+
+/**
+ * iommu_dma_mmap - Map a buffer into provided user VMA
+ * @pages: Array representing buffer from iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @vma: VMA describing requested userspace mapping
+ *
+ * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * for verifying the correct size and protection of @vma beforehand.
+ */
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
+{
+	unsigned long uaddr = vma->vm_start;
+	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	int ret = -ENXIO;
+
+	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
+		ret = vm_insert_page(vma, uaddr, pages[i]);
+		if (ret)
+			break;
+		uaddr += PAGE_SIZE;
+	}
+	return ret;
+}
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot)
+{
+	dma_addr_t dma_addr;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	phys_addr_t phys = page_to_phys(page) + offset;
+	size_t iova_off = iova_offset(iovad, phys);
+	size_t len = iova_align(iovad, size + iova_off);
+	struct iova *iova = __alloc_iova(iovad, len, dma_get_mask(dev));
+
+	if (!iova)
+		return DMA_ERROR_CODE;
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map(domain, dma_addr, phys - iova_off, len, prot)) {
+		__free_iova(iovad, iova);
+		return DMA_ERROR_CODE;
+	}
+	return dma_addr + iova_off;
+}
+
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle);
+}
+
+/*
+ * Go and look at iommu_dma_map_sg first; It's OK, I'll wait...
+ *
+ * ...right, now that the scatterlist pages are all contiguous from the
+ * device's viewpoint, we can collapse any buffer segments which run
+ * together (subject to the device's segment limitations), filling in
+ * the DMA fields at the same time as we run through the list.
+ */
+static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
+		dma_addr_t dma_addr)
+{
+	struct scatterlist *s, *seg = sg;
+	unsigned long seg_mask = dma_get_seg_boundary(dev);
+	unsigned int max_len = dma_get_max_seg_size(dev);
+	unsigned int seg_len = 0, seg_dma = 0;
+	int i, count = 1;
+
+	for_each_sg(sg, s, nents, i) {
+		/* Un-swizzling the fields here, hence the naming mismatch */
+		unsigned int s_offset = sg_dma_address(s);
+		unsigned int s_length = sg_dma_len(s);
+		unsigned int s_dma_len = s->length;
+
+		s->offset = s_offset;
+		s->length = s_length;
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+
+		/*
+		 * This ensures any concatenation we do doesn't exceed the
+		 * dma_parms limits, but it also won't fail if any segments
+		 * were out of spec to begin with - they'll just stay as-is.
+		 */
+		if (seg_len && (seg_dma + seg_len == dma_addr + s_offset) &&
+		    (seg_len + s_dma_len <= max_len) &&
+		    ((seg_dma & seg_mask) <= seg_mask - (seg_len + s_length))
+		   ) {
+			sg_dma_len(seg) += s_dma_len;
+		} else {
+			if (seg_len) {
+				seg = sg_next(seg);
+				count++;
+			}
+			sg_dma_len(seg) = s_dma_len - s_offset;
+			sg_dma_address(seg) = dma_addr + s_offset;
+
+			seg_len = s_offset;
+			seg_dma = dma_addr + s_offset;
+		}
+		seg_len += s_length;
+		dma_addr += s_dma_len;
+	}
+	return count;
+}
+
+/*
+ * If mapping failed, then just restore the original list,
+ * but making sure the DMA fields are invalidated.
+ */
+static void __invalidate_sg(struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		if (sg_dma_address(s) != DMA_ERROR_CODE)
+			s->offset = sg_dma_address(s);
+		if (sg_dma_len(s))
+			s->length = sg_dma_len(s);
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+	}
+}
+
+/*
+ * The DMA API client is passing in a scatterlist which could describe
+ * any old buffer layout, but the IOMMU API requires everything to be
+ * aligned to IOMMU pages. Hence the need for this complicated bit of
+ * impedance-matching, to be able to hand off a suitably-aligned list,
+ * but still preserve the original offsets and sizes for the caller.
+ */
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct scatterlist *s;
+	dma_addr_t dma_addr;
+	size_t iova_len = 0;
+	int i;
+
+	/*
+	 * Work out how much IOVA space we need, and align the segments to
+	 * IOVA granules for the IOMMU driver to handle. With some clever
+	 * trickery we can modify the list in-place, but reversibly, by
+	 * hiding the original data in the as-yet-unused DMA fields.
+	 */
+	for_each_sg(sg, s, nents, i) {
+		size_t s_offset = iova_offset(iovad, s->offset);
+		size_t s_length = s->length;
+
+		sg_dma_address(s) = s->offset;
+		sg_dma_len(s) = s_length;
+		s->offset -= s_offset;
+		s_length = iova_align(iovad, s_length + s_offset);
+		s->length = s_length;
+
+		iova_len += s_length;
+	}
+
+	iova = __alloc_iova(iovad, iova_len, dma_get_mask(dev));
+	if (!iova)
+		goto out_restore_sg;
+
+	/*
+	 * We'll leave any physical concatenation to the IOMMU driver's
+	 * implementation - it knows better than we do.
+	 */
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sg, nents, prot) < iova_len)
+		goto out_free_iova;
+
+	return __finalise_sg(dev, sg, nents, dma_addr);
+
+out_free_iova:
+	__free_iova(iovad, iova);
+out_restore_sg:
+	__invalidate_sg(sg, nents);
+	return 0;
+}
+
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	/*
+	 * The scatterlist segments are mapped contiguously
+	 * in IOVA space, so this is incredibly easy.
+	 */
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
+}
+
+int iommu_dma_supported(struct device *dev, u64 mask)
+{
+	/*
+	 * 'Special' IOMMUs which don't have the same addressing capability
+	 * as the CPU will have to wait until we have some way to query that
+	 * before they'll be able to use this framework.
+	 */
+	return 1;
+}
+
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	return dma_addr == DMA_ERROR_CODE;
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
new file mode 100644
index 0000000..227299f
--- /dev/null
+++ b/include/linux/dma-iommu.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_IOMMU_H
+#define __DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/iommu.h>
+
+#ifdef CONFIG_IOMMU_DMA
+
+int iommu_dma_init(void);
+
+/* Domain management interface for IOMMU drivers */
+int iommu_get_dma_cookie(struct iommu_domain *domain);
+void iommu_put_dma_cookie(struct iommu_domain *domain);
+
+/* Setup call for arch DMA mapping code */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size);
+
+/* General helpers for DMA-API <-> IOMMU-API interaction */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+
+/*
+ * These implement the bulk of the relevant DMA mapping callbacks, but require
+ * the arch code to take care of attributes and cache maintenance
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(struct device *, const void *, phys_addr_t));
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle);
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma);
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot);
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot);
+
+/*
+ * Arch code with no special attribute handling may use these
+ * directly as DMA mapping callbacks for simplicity
+ */
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+int iommu_dma_supported(struct device *dev, u64 mask);
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
+#else
+
+static inline int iommu_dma_init(void)
+{
+	return 0;
+}
+
+static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+}
+
+#endif	/* CONFIG_IOMMU_DMA */
+
+#endif	/* __KERNEL__ */
+#endif	/* __DMA_IOMMU_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f9c1b6d..dd176a8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -81,6 +81,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	void *dma_api_cookie;
 };
 
 enum iommu_cap {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-07-31 17:18     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  84 ++++++++
 include/linux/iommu.h     |   1 +
 5 files changed, 627 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8a1bc38..4996dc3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+	bool
+	depends on NEED_SG_DMA_LENGTH
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index dc6f511..45efa2a 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 0000000..f34fd46
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,534 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+	return iova_cache_get();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad;
+
+	if (domain->dma_api_cookie)
+		return -EEXIST;
+
+	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+	domain->dma_api_cookie = iovad;
+
+	return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+
+	if (!iovad)
+		return;
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+	domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises. If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long order, base_pfn, end_pfn;
+
+	if (!iovad)
+		return -ENODEV;
+
+	/* Use the smallest supported page size for IOVA granularity */
+	order = __ffs(domain->ops->pgsize_bitmap);
+	base_pfn = max_t(unsigned long, 1, base >> order);
+	end_pfn = (base + size - 1) >> order;
+
+	/* Check the domain allows at least some access to the device... */
+	if (domain->geometry.force_aperture) {
+		if (base > domain->geometry.aperture_end ||
+		    base + size <= domain->geometry.aperture_start) {
+			pr_warn("specified DMA range outside IOMMU capability\n");
+			return -EFAULT;
+		}
+		/* ...then finally give it a kicking to make sure it fits */
+		base_pfn = max_t(unsigned long, base_pfn,
+				domain->geometry.aperture_start >> order);
+		end_pfn = min_t(unsigned long, end_pfn,
+				domain->geometry.aperture_end >> order);
+	}
+
+	/* All we can safely do with an existing domain is enlarge it */
+	if (iovad->start_pfn) {
+		if (1UL << order != iovad->granule ||
+		    base_pfn != iovad->start_pfn ||
+		    end_pfn < iovad->dma_32bit_pfn) {
+			pr_warn("Incompatible range for DMA domain\n");
+			return -EFAULT;
+		}
+		iovad->dma_32bit_pfn = end_pfn;
+	} else {
+		init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(iommu_dma_init_domain);
+
+/**
+ * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * @dir: Direction of DMA transfer
+ * @coherent: Is the DMA master cache-coherent?
+ *
+ * Return: corresponding IOMMU API page protection flags
+ */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+{
+	int prot = coherent ? IOMMU_CACHE : 0;
+
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		return prot | IOMMU_READ | IOMMU_WRITE;
+	case DMA_TO_DEVICE:
+		return prot | IOMMU_READ;
+	case DMA_FROM_DEVICE:
+		return prot | IOMMU_WRITE;
+	default:
+		return 0;
+	}
+}
+
+static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
+		dma_addr_t dma_limit)
+{
+	unsigned long shift = iova_shift(iovad);
+	unsigned long length = iova_align(iovad, size) >> shift;
+
+	/*
+	 * Enforce size-alignment to be safe - there should probably be
+	 * an attribute to control this per-device, or at least per-domain...
+	 */
+	return alloc_iova(iovad, length, dma_limit >> shift, true);
+}
+
+/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
+static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long shift = iova_shift(iovad);
+	unsigned long pfn = dma_addr >> shift;
+	struct iova *iova = find_iova(iovad, pfn);
+	size_t size = iova_size(iova) << shift;
+
+	/* ...and if we can't, then something is horribly, horribly wrong */
+	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
+	__free_iova(iovad, iova);
+}
+
+static void __iommu_dma_free_pages(struct page **pages, int count)
+{
+	while (count--)
+		__free_page(pages[count]);
+	kvfree(pages);
+}
+
+static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
+{
+	struct page **pages;
+	unsigned int i = 0, array_size = count * sizeof(*pages);
+
+	if (array_size <= PAGE_SIZE)
+		pages = kzalloc(array_size, GFP_KERNEL);
+	else
+		pages = vzalloc(array_size);
+	if (!pages)
+		return NULL;
+
+	/* IOMMU can map any pages, so himem can also be used here */
+	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+
+	while (count) {
+		struct page *page = NULL;
+		int j, order = __fls(count);
+
+		/*
+		 * Higher-order allocations are a convenience rather
+		 * than a necessity, hence using __GFP_NORETRY until
+		 * falling back to single-page allocations.
+		 */
+		for (order = min(order, MAX_ORDER); order > 0; order--) {
+			page = alloc_pages(gfp | __GFP_NORETRY, order);
+			if (!page)
+				continue;
+			if (PageCompound(page)) {
+				if (!split_huge_page(page))
+					break;
+				__free_pages(page, order);
+			} else {
+				split_page(page, order);
+				break;
+			}
+		}
+		if (!page)
+			page = alloc_page(gfp);
+		if (!page) {
+			__iommu_dma_free_pages(pages, i);
+			return NULL;
+		}
+		j = 1 << order;
+		count -= j;
+		while (j--)
+			pages[i++] = page++;
+	}
+	return pages;
+}
+
+/**
+ * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc()
+ * @dev: Device which owns this buffer
+ * @pages: Array of buffer pages as returned by iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @handle: DMA address of buffer
+ *
+ * Frees both the pages associated with the buffer, and the array
+ * describing them
+ */
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle);
+	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+	*handle = DMA_ERROR_CODE;
+}
+
+/**
+ * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * @dev: Device to allocate memory for. Must be a real device
+ *	 attached to an iommu_dma_domain
+ * @size: Size of buffer in bytes
+ * @gfp: Allocation flags
+ * @prot: IOMMU mapping flags
+ * @handle: Out argument for allocated DMA handle
+ * @flush_page: Arch callback which must ensure PAGE_SIZE bytes from the
+ *		given VA/PA are visible to the given non-coherent device.
+ *
+ * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+ * but an IOMMU which supports smaller pages might not map the whole thing.
+ *
+ * Return: Array of struct page pointers describing the buffer,
+ *	   or NULL on failure.
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(struct device *, const void *, phys_addr_t))
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct page **pages;
+	struct sg_table sgt;
+	dma_addr_t dma_addr;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	*handle = DMA_ERROR_CODE;
+
+	pages = __iommu_dma_alloc_pages(count, gfp);
+	if (!pages)
+		return NULL;
+
+	iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
+	if (!iova)
+		goto out_free_pages;
+
+	size = iova_align(iovad, size);
+	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+		goto out_free_iova;
+
+	if (!(prot & IOMMU_CACHE)) {
+		struct sg_mapping_iter miter;
+		/*
+		 * The CPU-centric flushing implied by SG_MITER_TO_SG isn't
+		 * sufficient here, so skip it by using the "wrong" direction.
+		 */
+		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
+		while (sg_miter_next(&miter))
+			flush_page(dev, miter.addr, page_to_phys(miter.page));
+		sg_miter_stop(&miter);
+	}
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot)
+			< size)
+		goto out_free_sg;
+
+	*handle = dma_addr;
+	sg_free_table(&sgt);
+	return pages;
+
+out_free_sg:
+	sg_free_table(&sgt);
+out_free_iova:
+	__free_iova(iovad, iova);
+out_free_pages:
+	__iommu_dma_free_pages(pages, count);
+	return NULL;
+}
+
+/**
+ * iommu_dma_mmap - Map a buffer into provided user VMA
+ * @pages: Array representing buffer from iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @vma: VMA describing requested userspace mapping
+ *
+ * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * for verifying the correct size and protection of @vma beforehand.
+ */
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
+{
+	unsigned long uaddr = vma->vm_start;
+	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	int ret = -ENXIO;
+
+	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
+		ret = vm_insert_page(vma, uaddr, pages[i]);
+		if (ret)
+			break;
+		uaddr += PAGE_SIZE;
+	}
+	return ret;
+}
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot)
+{
+	dma_addr_t dma_addr;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	phys_addr_t phys = page_to_phys(page) + offset;
+	size_t iova_off = iova_offset(iovad, phys);
+	size_t len = iova_align(iovad, size + iova_off);
+	struct iova *iova = __alloc_iova(iovad, len, dma_get_mask(dev));
+
+	if (!iova)
+		return DMA_ERROR_CODE;
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map(domain, dma_addr, phys - iova_off, len, prot)) {
+		__free_iova(iovad, iova);
+		return DMA_ERROR_CODE;
+	}
+	return dma_addr + iova_off;
+}
+
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle);
+}
+
+/*
+ * Go and look at iommu_dma_map_sg first; It's OK, I'll wait...
+ *
+ * ...right, now that the scatterlist pages are all contiguous from the
+ * device's viewpoint, we can collapse any buffer segments which run
+ * together (subject to the device's segment limitations), filling in
+ * the DMA fields at the same time as we run through the list.
+ */
+static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
+		dma_addr_t dma_addr)
+{
+	struct scatterlist *s, *seg = sg;
+	unsigned long seg_mask = dma_get_seg_boundary(dev);
+	unsigned int max_len = dma_get_max_seg_size(dev);
+	unsigned int seg_len = 0, seg_dma = 0;
+	int i, count = 1;
+
+	for_each_sg(sg, s, nents, i) {
+		/* Un-swizzling the fields here, hence the naming mismatch */
+		unsigned int s_offset = sg_dma_address(s);
+		unsigned int s_length = sg_dma_len(s);
+		unsigned int s_dma_len = s->length;
+
+		s->offset = s_offset;
+		s->length = s_length;
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+
+		/*
+		 * This ensures any concatenation we do doesn't exceed the
+		 * dma_parms limits, but it also won't fail if any segments
+		 * were out of spec to begin with - they'll just stay as-is.
+		 */
+		if (seg_len && (seg_dma + seg_len == dma_addr + s_offset) &&
+		    (seg_len + s_dma_len <= max_len) &&
+		    ((seg_dma & seg_mask) <= seg_mask - (seg_len + s_length))
+		   ) {
+			sg_dma_len(seg) += s_dma_len;
+		} else {
+			if (seg_len) {
+				seg = sg_next(seg);
+				count++;
+			}
+			sg_dma_len(seg) = s_dma_len - s_offset;
+			sg_dma_address(seg) = dma_addr + s_offset;
+
+			seg_len = s_offset;
+			seg_dma = dma_addr + s_offset;
+		}
+		seg_len += s_length;
+		dma_addr += s_dma_len;
+	}
+	return count;
+}
+
+/*
+ * If mapping failed, then just restore the original list,
+ * but making sure the DMA fields are invalidated.
+ */
+static void __invalidate_sg(struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		if (sg_dma_address(s) != DMA_ERROR_CODE)
+			s->offset = sg_dma_address(s);
+		if (sg_dma_len(s))
+			s->length = sg_dma_len(s);
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+	}
+}
+
+/*
+ * The DMA API client is passing in a scatterlist which could describe
+ * any old buffer layout, but the IOMMU API requires everything to be
+ * aligned to IOMMU pages. Hence the need for this complicated bit of
+ * impedance-matching, to be able to hand off a suitably-aligned list,
+ * but still preserve the original offsets and sizes for the caller.
+ */
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct scatterlist *s;
+	dma_addr_t dma_addr;
+	size_t iova_len = 0;
+	int i;
+
+	/*
+	 * Work out how much IOVA space we need, and align the segments to
+	 * IOVA granules for the IOMMU driver to handle. With some clever
+	 * trickery we can modify the list in-place, but reversibly, by
+	 * hiding the original data in the as-yet-unused DMA fields.
+	 */
+	for_each_sg(sg, s, nents, i) {
+		size_t s_offset = iova_offset(iovad, s->offset);
+		size_t s_length = s->length;
+
+		sg_dma_address(s) = s->offset;
+		sg_dma_len(s) = s_length;
+		s->offset -= s_offset;
+		s_length = iova_align(iovad, s_length + s_offset);
+		s->length = s_length;
+
+		iova_len += s_length;
+	}
+
+	iova = __alloc_iova(iovad, iova_len, dma_get_mask(dev));
+	if (!iova)
+		goto out_restore_sg;
+
+	/*
+	 * We'll leave any physical concatenation to the IOMMU driver's
+	 * implementation - it knows better than we do.
+	 */
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sg, nents, prot) < iova_len)
+		goto out_free_iova;
+
+	return __finalise_sg(dev, sg, nents, dma_addr);
+
+out_free_iova:
+	__free_iova(iovad, iova);
+out_restore_sg:
+	__invalidate_sg(sg, nents);
+	return 0;
+}
+
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	/*
+	 * The scatterlist segments are mapped contiguously
+	 * in IOVA space, so this is incredibly easy.
+	 */
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
+}
+
+int iommu_dma_supported(struct device *dev, u64 mask)
+{
+	/*
+	 * 'Special' IOMMUs which don't have the same addressing capability
+	 * as the CPU will have to wait until we have some way to query that
+	 * before they'll be able to use this framework.
+	 */
+	return 1;
+}
+
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	return dma_addr == DMA_ERROR_CODE;
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
new file mode 100644
index 0000000..227299f
--- /dev/null
+++ b/include/linux/dma-iommu.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_IOMMU_H
+#define __DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/iommu.h>
+
+#ifdef CONFIG_IOMMU_DMA
+
+int iommu_dma_init(void);
+
+/* Domain management interface for IOMMU drivers */
+int iommu_get_dma_cookie(struct iommu_domain *domain);
+void iommu_put_dma_cookie(struct iommu_domain *domain);
+
+/* Setup call for arch DMA mapping code */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size);
+
+/* General helpers for DMA-API <-> IOMMU-API interaction */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+
+/*
+ * These implement the bulk of the relevant DMA mapping callbacks, but require
+ * the arch code to take care of attributes and cache maintenance
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(struct device *, const void *, phys_addr_t));
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle);
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma);
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot);
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot);
+
+/*
+ * Arch code with no special attribute handling may use these
+ * directly as DMA mapping callbacks for simplicity
+ */
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+int iommu_dma_supported(struct device *dev, u64 mask);
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
+#else
+
+static inline int iommu_dma_init(void)
+{
+	return 0;
+}
+
+static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+}
+
+#endif	/* CONFIG_IOMMU_DMA */
+
+#endif	/* __KERNEL__ */
+#endif	/* __DMA_IOMMU_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f9c1b6d..dd176a8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -81,6 +81,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	void *dma_api_cookie;
 };
 
 enum iommu_cap {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-07-31 17:18 ` Robin Murphy
@ 2015-07-31 17:18     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Taking some inspiration from the arch/arm code, implement the
arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.

Unfortunately the device setup code has to start out as a big ugly mess
in order to work usefully right now, as 'proper' operation depends on
changes to device probe and DMA configuration ordering, IOMMU groups for
platform devices, and default domain support in arm/arm64 IOMMU drivers.
The workarounds here need only exist until that work is finished.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 arch/arm64/mm/dma-mapping.c | 425 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e5d74cd..c735f45 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -534,3 +534,428 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+#ifdef CONFIG_IOMMU_DMA
+#include <linux/dma-iommu.h>
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+/* Thankfully, all cache ops are by VA so we can ignore phys here */
+static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+	/*
+	 * Some drivers rely on this, and we probably don't want the
+	 * possibility of stale kernel data being read by devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfp & __GFP_WAIT) {
+		struct page **pages;
+		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
+
+		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	handle,
+					flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, size, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = __alloc_from_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, size, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				__free_from_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (__in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_from_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)){
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, size, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+					     is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_unmap_area(sg_virt(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_map_area(sg_virt(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device isn't yet fully created, and the
+ * IOMMU driver hasn't seen it yet, so we need this delayed attachment
+ * dance. Once IOMMU probe ordering is sorted to move the
+ * arch_setup_dma_ops() call later, all the notifier bits below become
+ * unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	struct iommu_domain *dma_domain;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+	/*
+	 * We expect the list to only contain the most recent addition
+	 * (which should be the same device as in @data) so just process
+	 * the whole thing blindly. If any previous attachments did happen
+	 * to fail, they get a free retry since the domains are still live.
+	 */
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (iommu_attach_device(master->dma_domain, master->dev)) {
+			pr_warn("Failed to attach device %s to IOMMU mapping; retaining platform DMA ops\n",
+				dev_name(master->dev));
+		} else {
+			master->dev->archdata.dma_ops = &iommu_dma_ops;
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	int ret;
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int queue_iommu_attach(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_dma_notifier_data *iommudata = NULL;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return -ENOMEM;
+
+	iommudata->dev = dev;
+	iommudata->dma_domain = domain;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init arm64_iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+	return ret;
+}
+arch_initcall(arm64_iommu_dma_init);
+
+/* Hijack some domain feature flags for the stop-gap meddling below */
+#define __IOMMU_DOMAIN_ARM64		(1U << 31)
+#define __IOMMU_DOMAIN_ARM64_IOVA	(1U << 30)
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_domain *domain;
+	int err;
+
+	if (!ops)
+		return;
+	/*
+	 * In a perfect world, everything happened in the right order up to
+	 * here, and the IOMMU core has already attached the device to an
+	 * appropriate default domain for us to set up...
+	 */
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain) {
+		/*
+		 * Urgh. Reliable default domains for platform devices can't
+		 * happen anyway without some sensible way of handling
+		 * non-trivial groups. So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_ARM64;
+		if (!domain->dma_api_cookie) {
+			domain->type |= __IOMMU_DOMAIN_ARM64_IOVA;
+			if (iommu_get_dma_cookie(domain))
+				goto out_put_domain;
+		}
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size)) {
+		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
+			size, dev_name(dev));
+		goto out_put_domain;
+	}
+
+	if (dev->iommu_group)
+		err = iommu_attach_device(domain, dev);
+	else
+		err = queue_iommu_attach(domain, dev);
+
+	if (!err) {
+		dev->archdata.dma_ops = &iommu_dma_ops;
+		return;
+	}
+
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+		iommu_put_dma_cookie(domain);
+	if (domain->type & __IOMMU_DOMAIN_ARM64)
+		ops->domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
+}
+
+#else
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  struct iommu_ops *iommu)
+{ }
+
+#endif  /* CONFIG_IOMMU_DMA */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-07-31 17:18     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

Taking some inspiration from the arch/arm code, implement the
arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.

Unfortunately the device setup code has to start out as a big ugly mess
in order to work usefully right now, as 'proper' operation depends on
changes to device probe and DMA configuration ordering, IOMMU groups for
platform devices, and default domain support in arm/arm64 IOMMU drivers.
The workarounds here need only exist until that work is finished.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/mm/dma-mapping.c | 425 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e5d74cd..c735f45 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -534,3 +534,428 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+#ifdef CONFIG_IOMMU_DMA
+#include <linux/dma-iommu.h>
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+/* Thankfully, all cache ops are by VA so we can ignore phys here */
+static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+	/*
+	 * Some drivers rely on this, and we probably don't want the
+	 * possibility of stale kernel data being read by devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfp & __GFP_WAIT) {
+		struct page **pages;
+		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
+
+		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	handle,
+					flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, size, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = __alloc_from_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, size, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				__free_from_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (__in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_from_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)){
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, size, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+					     is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_unmap_area(sg_virt(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_map_area(sg_virt(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device isn't yet fully created, and the
+ * IOMMU driver hasn't seen it yet, so we need this delayed attachment
+ * dance. Once IOMMU probe ordering is sorted to move the
+ * arch_setup_dma_ops() call later, all the notifier bits below become
+ * unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	struct iommu_domain *dma_domain;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+	/*
+	 * We expect the list to only contain the most recent addition
+	 * (which should be the same device as in @data) so just process
+	 * the whole thing blindly. If any previous attachments did happen
+	 * to fail, they get a free retry since the domains are still live.
+	 */
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (iommu_attach_device(master->dma_domain, master->dev)) {
+			pr_warn("Failed to attach device %s to IOMMU mapping; retaining platform DMA ops\n",
+				dev_name(master->dev));
+		} else {
+			master->dev->archdata.dma_ops = &iommu_dma_ops;
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	int ret;
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int queue_iommu_attach(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_dma_notifier_data *iommudata = NULL;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return -ENOMEM;
+
+	iommudata->dev = dev;
+	iommudata->dma_domain = domain;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init arm64_iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+	return ret;
+}
+arch_initcall(arm64_iommu_dma_init);
+
+/* Hijack some domain feature flags for the stop-gap meddling below */
+#define __IOMMU_DOMAIN_ARM64		(1U << 31)
+#define __IOMMU_DOMAIN_ARM64_IOVA	(1U << 30)
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_domain *domain;
+	int err;
+
+	if (!ops)
+		return;
+	/*
+	 * In a perfect world, everything happened in the right order up to
+	 * here, and the IOMMU core has already attached the device to an
+	 * appropriate default domain for us to set up...
+	 */
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain) {
+		/*
+		 * Urgh. Reliable default domains for platform devices can't
+		 * happen anyway without some sensible way of handling
+		 * non-trivial groups. So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_ARM64;
+		if (!domain->dma_api_cookie) {
+			domain->type |= __IOMMU_DOMAIN_ARM64_IOVA;
+			if (iommu_get_dma_cookie(domain))
+				goto out_put_domain;
+		}
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size)) {
+		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
+			size, dev_name(dev));
+		goto out_put_domain;
+	}
+
+	if (dev->iommu_group)
+		err = iommu_attach_device(domain, dev);
+	else
+		err = queue_iommu_attach(domain, dev);
+
+	if (!err) {
+		dev->archdata.dma_ops = &iommu_dma_ops;
+		return;
+	}
+
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+		iommu_put_dma_cookie(domain);
+	if (domain->type & __IOMMU_DOMAIN_ARM64)
+		ops->domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
+}
+
+#else
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  struct iommu_ops *iommu)
+{ }
+
+#endif  /* CONFIG_IOMMU_DMA */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 3/3] arm64: Hook up IOMMU dma_ops
  2015-07-31 17:18 ` Robin Murphy
@ 2015-07-31 17:18     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

With iommu_dma_ops in place, hook them up to the configuration code, so
IOMMU-fronted devices will get them automatically.

Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/dma-mapping.h | 15 +++++++--------
 arch/arm64/mm/dma-mapping.c          | 24 ++++++++++++++++++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c8933dc..81584ef 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
+	select IOMMU_DMA if IOMMU_SUPPORT
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index f0d6d0b..7f9edcb 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -56,16 +56,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				      struct iommu_ops *iommu, bool coherent)
-{
-	if (!acpi_disabled && !dev->archdata.dma_ops)
-		dev->archdata.dma_ops = dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-}
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
+#ifdef CONFIG_IOMMU_DMA
+void arch_teardown_dma_ops(struct device *dev);
+#define arch_teardown_dma_ops	arch_teardown_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c735f45..175dcb2 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -952,6 +952,20 @@ out_no_domain:
 	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
 }
 
+void arch_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+			iommu_put_dma_cookie(domain);
+		if (domain->type & __IOMMU_DOMAIN_ARM64)
+			iommu_domain_free(domain);
+		dev->archdata.dma_ops = NULL;
+	}
+}
+
 #else
 
 static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -959,3 +973,13 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 { }
 
 #endif  /* CONFIG_IOMMU_DMA */
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
+{
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
+
+	dev->archdata.dma_coherent = coherent;
+	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 3/3] arm64: Hook up IOMMU dma_ops
@ 2015-07-31 17:18     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-07-31 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

With iommu_dma_ops in place, hook them up to the configuration code, so
IOMMU-fronted devices will get them automatically.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/dma-mapping.h | 15 +++++++--------
 arch/arm64/mm/dma-mapping.c          | 24 ++++++++++++++++++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c8933dc..81584ef 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
+	select IOMMU_DMA if IOMMU_SUPPORT
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index f0d6d0b..7f9edcb 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -56,16 +56,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				      struct iommu_ops *iommu, bool coherent)
-{
-	if (!acpi_disabled && !dev->archdata.dma_ops)
-		dev->archdata.dma_ops = dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-}
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
+#ifdef CONFIG_IOMMU_DMA
+void arch_teardown_dma_ops(struct device *dev);
+#define arch_teardown_dma_ops	arch_teardown_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c735f45..175dcb2 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -952,6 +952,20 @@ out_no_domain:
 	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
 }
 
+void arch_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+			iommu_put_dma_cookie(domain);
+		if (domain->type & __IOMMU_DOMAIN_ARM64)
+			iommu_domain_free(domain);
+		dev->archdata.dma_ops = NULL;
+	}
+}
+
 #else
 
 static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -959,3 +973,13 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 { }
 
 #endif  /* CONFIG_IOMMU_DMA */
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
+{
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
+
+	dev->archdata.dma_coherent = coherent;
+	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-03 17:33         ` Catalin Marinas
  -1 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2015-08-03 17:33 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
> Taking some inspiration from the arch/arm code, implement the
> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
> 
> Unfortunately the device setup code has to start out as a big ugly mess
> in order to work usefully right now, as 'proper' operation depends on
> changes to device probe and DMA configuration ordering, IOMMU groups for
> platform devices, and default domain support in arm/arm64 IOMMU drivers.
> The workarounds here need only exist until that work is finished.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>

Reviewed-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-08-03 17:33         ` Catalin Marinas
  0 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2015-08-03 17:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
> Taking some inspiration from the arch/arm code, implement the
> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
> 
> Unfortunately the device setup code has to start out as a big ugly mess
> in order to work usefully right now, as 'proper' operation depends on
> changes to device probe and DMA configuration ordering, IOMMU groups for
> platform devices, and default domain support in arm/arm64 IOMMU drivers.
> The workarounds here need only exist until that work is finished.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-03 17:40         ` Catalin Marinas
  -1 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2015-08-03 17:40 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some
> generic functions to interface the DMA-API to the IOMMU-API. This will
> do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>

Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-03 17:40         ` Catalin Marinas
  0 siblings, 0 replies; 40+ messages in thread
From: Catalin Marinas @ 2015-08-03 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some
> generic functions to interface the DMA-API to the IOMMU-API. This will
> do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-06 15:23         ` Will Deacon
  -1 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2015-08-06 15:23 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Joerg,

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some
> generic functions to interface the DMA-API to the IOMMU-API. This will
> do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/Kconfig     |   7 +
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h |  84 ++++++++
>  include/linux/iommu.h     |   1 +
>  5 files changed, 627 insertions(+)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h

We're quite keen to get this in for arm64, since we're without IOMMU DMA
ops and need to get something upstream. Do you think this is likely to
be merged for 4.3/4.4 or would we be better off doing our own
arch-private implementation instead?

Sorry to pester, but we've got people basing their patches and products
on this and I don't want to end up having to support out-of-tree code.

Cheers,

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-06 15:23         ` Will Deacon
  0 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2015-08-06 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

Joerg,

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some
> generic functions to interface the DMA-API to the IOMMU-API. This will
> do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/Kconfig     |   7 +
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h |  84 ++++++++
>  include/linux/iommu.h     |   1 +
>  5 files changed, 627 insertions(+)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h

We're quite keen to get this in for arm64, since we're without IOMMU DMA
ops and need to get something upstream. Do you think this is likely to
be merged for 4.3/4.4 or would we be better off doing our own
arch-private implementation instead?

Sorry to pester, but we've got people basing their patches and products
on this and I don't want to end up having to support out-of-tree code.

Cheers,

Will

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-08-06 15:23         ` Will Deacon
@ 2015-08-06 17:54             ` joro at 8bytes.org
  -1 siblings, 0 replies; 40+ messages in thread
From: joro-zLv9SwRftAIdnm+yROfE0A @ 2015-08-06 17:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Hi Will,

On Thu, Aug 06, 2015 at 04:23:27PM +0100, Will Deacon wrote:
> We're quite keen to get this in for arm64, since we're without IOMMU DMA
> ops and need to get something upstream. Do you think this is likely to
> be merged for 4.3/4.4 or would we be better off doing our own
> arch-private implementation instead?
> 
> Sorry to pester, but we've got people basing their patches and products
> on this and I don't want to end up having to support out-of-tree code.

I definitly plan to merge it soon, but not sure if its getting into
v4.3. There are a few things I have questions about or need rework, but
I am sure we can work this out.


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-06 17:54             ` joro at 8bytes.org
  0 siblings, 0 replies; 40+ messages in thread
From: joro at 8bytes.org @ 2015-08-06 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Thu, Aug 06, 2015 at 04:23:27PM +0100, Will Deacon wrote:
> We're quite keen to get this in for arm64, since we're without IOMMU DMA
> ops and need to get something upstream. Do you think this is likely to
> be merged for 4.3/4.4 or would we be better off doing our own
> arch-private implementation instead?
> 
> Sorry to pester, but we've got people basing their patches and products
> on this and I don't want to end up having to support out-of-tree code.

I definitly plan to merge it soon, but not sure if its getting into
v4.3. There are a few things I have questions about or need rework, but
I am sure we can work this out.


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-07  8:42         ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> +int iommu_get_dma_cookie(struct iommu_domain *domain)
> +{
> +	struct iova_domain *iovad;
> +
> +	if (domain->dma_api_cookie)
> +		return -EEXIST;

Why do you call that dma_api_cookie? It is just a pointer to an iova
allocator, you can just name it as such, like domain->iova.

> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
> +		dma_addr_t dma_limit)

I think you also need a struct device here to take segment boundary and
dma_mask into account.

> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
> +{
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	unsigned long shift = iova_shift(iovad);
> +	unsigned long pfn = dma_addr >> shift;
> +	struct iova *iova = find_iova(iovad, pfn);
> +	size_t size = iova_size(iova) << shift;
> +
> +	/* ...and if we can't, then something is horribly, horribly wrong */
> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);

This is a WARN_ON at most, not a BUG_ON condition, especially since this
type of bug is also catched with the dma-api debugging code.

> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
> +{
> +	struct page **pages;
> +	unsigned int i = 0, array_size = count * sizeof(*pages);
> +
> +	if (array_size <= PAGE_SIZE)
> +		pages = kzalloc(array_size, GFP_KERNEL);
> +	else
> +		pages = vzalloc(array_size);
> +	if (!pages)
> +		return NULL;
> +
> +	/* IOMMU can map any pages, so himem can also be used here */
> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
> +
> +	while (count) {
> +		struct page *page = NULL;
> +		int j, order = __fls(count);
> +
> +		/*
> +		 * Higher-order allocations are a convenience rather
> +		 * than a necessity, hence using __GFP_NORETRY until
> +		 * falling back to single-page allocations.
> +		 */
> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
> +			if (!page)
> +				continue;
> +			if (PageCompound(page)) {
> +				if (!split_huge_page(page))
> +					break;
> +				__free_pages(page, order);
> +			} else {
> +				split_page(page, order);
> +				break;
> +			}
> +		}
> +		if (!page)
> +			page = alloc_page(gfp);
> +		if (!page) {
> +			__iommu_dma_free_pages(pages, i);
> +			return NULL;
> +		}
> +		j = 1 << order;
> +		count -= j;
> +		while (j--)
> +			pages[i++] = page++;
> +	}
> +	return pages;
> +}

Hmm, most dma-api implementation just try to allocate a big enough
region from the page-alloctor. Is it implemented different here to avoid
the use of CMA?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-07  8:42         ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> +int iommu_get_dma_cookie(struct iommu_domain *domain)
> +{
> +	struct iova_domain *iovad;
> +
> +	if (domain->dma_api_cookie)
> +		return -EEXIST;

Why do you call that dma_api_cookie? It is just a pointer to an iova
allocator, you can just name it as such, like domain->iova.

> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
> +		dma_addr_t dma_limit)

I think you also need a struct device here to take segment boundary and
dma_mask into account.

> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
> +{
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	unsigned long shift = iova_shift(iovad);
> +	unsigned long pfn = dma_addr >> shift;
> +	struct iova *iova = find_iova(iovad, pfn);
> +	size_t size = iova_size(iova) << shift;
> +
> +	/* ...and if we can't, then something is horribly, horribly wrong */
> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);

This is a WARN_ON@most, not a BUG_ON condition, especially since this
type of bug is also catched with the dma-api debugging code.

> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
> +{
> +	struct page **pages;
> +	unsigned int i = 0, array_size = count * sizeof(*pages);
> +
> +	if (array_size <= PAGE_SIZE)
> +		pages = kzalloc(array_size, GFP_KERNEL);
> +	else
> +		pages = vzalloc(array_size);
> +	if (!pages)
> +		return NULL;
> +
> +	/* IOMMU can map any pages, so himem can also be used here */
> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
> +
> +	while (count) {
> +		struct page *page = NULL;
> +		int j, order = __fls(count);
> +
> +		/*
> +		 * Higher-order allocations are a convenience rather
> +		 * than a necessity, hence using __GFP_NORETRY until
> +		 * falling back to single-page allocations.
> +		 */
> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
> +			if (!page)
> +				continue;
> +			if (PageCompound(page)) {
> +				if (!split_huge_page(page))
> +					break;
> +				__free_pages(page, order);
> +			} else {
> +				split_page(page, order);
> +				break;
> +			}
> +		}
> +		if (!page)
> +			page = alloc_page(gfp);
> +		if (!page) {
> +			__iommu_dma_free_pages(pages, i);
> +			return NULL;
> +		}
> +		j = 1 << order;
> +		count -= j;
> +		while (j--)
> +			pages[i++] = page++;
> +	}
> +	return pages;
> +}

Hmm, most dma-api implementation just try to allocate a big enough
region from the page-alloctor. Is it implemented different here to avoid
the use of CMA?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-07  8:52         ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
> +/*
> + * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
> + * everything it needs to - the device isn't yet fully created, and the
> + * IOMMU driver hasn't seen it yet, so we need this delayed attachment
> + * dance. Once IOMMU probe ordering is sorted to move the
> + * arch_setup_dma_ops() call later, all the notifier bits below become
> + * unnecessary, and will go away.
> + */
> +struct iommu_dma_notifier_data {
> +	struct list_head list;
> +	struct device *dev;
> +	struct iommu_domain *dma_domain;
> +};
> +static LIST_HEAD(iommu_dma_masters);
> +static DEFINE_MUTEX(iommu_dma_notifier_lock);

Ugh, thats incredibly ugly. Why can't you do the setup work then the
iommu driver sees the device? Just call the dma-api setup functions
there (like the x86 iommu drivers do it too) and be done without any
notifiers.

> +static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +				  const struct iommu_ops *ops)
> +{
> +	struct iommu_domain *domain;
> +	int err;
> +
> +	if (!ops)
> +		return;
> +	/*
> +	 * In a perfect world, everything happened in the right order up to
> +	 * here, and the IOMMU core has already attached the device to an
> +	 * appropriate default domain for us to set up...
> +	 */
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain) {
> +		/*
> +		 * Urgh. Reliable default domains for platform devices can't
> +		 * happen anyway without some sensible way of handling
> +		 * non-trivial groups. So until then, HORRIBLE HACKS!
> +		 */

I don't get this, what is preventing to rely on default domains here?

> +		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);

The IOMMU core should already tried to allocate an IOMMU_DOMAIN_DMA type
domain. No need to try this again here.



	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-08-07  8:52         ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
> +/*
> + * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
> + * everything it needs to - the device isn't yet fully created, and the
> + * IOMMU driver hasn't seen it yet, so we need this delayed attachment
> + * dance. Once IOMMU probe ordering is sorted to move the
> + * arch_setup_dma_ops() call later, all the notifier bits below become
> + * unnecessary, and will go away.
> + */
> +struct iommu_dma_notifier_data {
> +	struct list_head list;
> +	struct device *dev;
> +	struct iommu_domain *dma_domain;
> +};
> +static LIST_HEAD(iommu_dma_masters);
> +static DEFINE_MUTEX(iommu_dma_notifier_lock);

Ugh, thats incredibly ugly. Why can't you do the setup work then the
iommu driver sees the device? Just call the dma-api setup functions
there (like the x86 iommu drivers do it too) and be done without any
notifiers.

> +static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +				  const struct iommu_ops *ops)
> +{
> +	struct iommu_domain *domain;
> +	int err;
> +
> +	if (!ops)
> +		return;
> +	/*
> +	 * In a perfect world, everything happened in the right order up to
> +	 * here, and the IOMMU core has already attached the device to an
> +	 * appropriate default domain for us to set up...
> +	 */
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain) {
> +		/*
> +		 * Urgh. Reliable default domains for platform devices can't
> +		 * happen anyway without some sensible way of handling
> +		 * non-trivial groups. So until then, HORRIBLE HACKS!
> +		 */

I don't get this, what is preventing to rely on default domains here?

> +		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);

The IOMMU core should already tried to allocate an IOMMU_DOMAIN_DMA type
domain. No need to try this again here.



	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/3] arm64: Hook up IOMMU dma_ops
  2015-07-31 17:18     ` Robin Murphy
@ 2015-08-07  8:55         ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On Fri, Jul 31, 2015 at 06:18:29PM +0100, Robin Murphy wrote:
> +void arch_teardown_dma_ops(struct device *dev)
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (domain) {
> +		iommu_detach_device(domain, dev);
> +		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
> +			iommu_put_dma_cookie(domain);
> +		if (domain->type & __IOMMU_DOMAIN_ARM64)
> +			iommu_domain_free(domain);
> +		dev->archdata.dma_ops = NULL;
> +	}
> +}

When is this called? In case a device gets removed the IOMMU core and
driver should take care of destroying the domain, at least when its a
default-domain. This is part of the iommu-group handling.


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 3/3] arm64: Hook up IOMMU dma_ops
@ 2015-08-07  8:55         ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-07  8:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 31, 2015 at 06:18:29PM +0100, Robin Murphy wrote:
> +void arch_teardown_dma_ops(struct device *dev)
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (domain) {
> +		iommu_detach_device(domain, dev);
> +		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
> +			iommu_put_dma_cookie(domain);
> +		if (domain->type & __IOMMU_DOMAIN_ARM64)
> +			iommu_domain_free(domain);
> +		dev->archdata.dma_ops = NULL;
> +	}
> +}

When is this called? In case a device gets removed the IOMMU core and
driver should take care of destroying the domain, at least when its a
default-domain. This is part of the iommu-group handling.


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-08-07  8:42         ` Joerg Roedel
@ 2015-08-07 13:38             ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-07 13:38 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Hi Joerg,

Thanks for taking a look,

On 07/08/15 09:42, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
>> +int iommu_get_dma_cookie(struct iommu_domain *domain)
>> +{
>> +	struct iova_domain *iovad;
>> +
>> +	if (domain->dma_api_cookie)
>> +		return -EEXIST;
>
> Why do you call that dma_api_cookie? It is just a pointer to an iova
> allocator, you can just name it as such, like domain->iova.

Sure, it was more the case that since it had to be in the top-level 
generic domain structure, I didn't want it to be too 
implementation-specific. I figured this was a reasonable compromise that 
wouldn't be a waste of space for implementations with different 
per-domain DMA API data - e.g. the AMD IOMMU driver could then make use 
of protection_domain->domain->dma_api_cookie instead of having 
protection_domain->priv, but that's a patch that wouldn't belong in this 
series anyway.

If you really hate that idea, then yeah, let's just call it iova and 
consider if factoring out redundancy is still applicable later.

>> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
>> +		dma_addr_t dma_limit)
>
> I think you also need a struct device here to take segment boundary and
> dma_mask into account.

To the best of my understanding, those limits are only relevant when 
actually handing off a scatterlist to a client device doing hardware 
scatter-gather operations, so it's not so much the IOVA allocation that 
matters, but where the segments lie within it when handling dma_map_sg.

However, you do raise a good point - in the current "map everything 
consecutively" approach, if there is a non-power-of-2-sized segment in 
the middle of a scatterlist, then subsequent segments could possibly end 
up inadvertently straddling a boundary. That needs handling in 
iommu_dma_map_sg; I'll fix it appropriately.

>> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
>> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
>> +{
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	unsigned long shift = iova_shift(iovad);
>> +	unsigned long pfn = dma_addr >> shift;
>> +	struct iova *iova = find_iova(iovad, pfn);
>> +	size_t size = iova_size(iova) << shift;
>> +
>> +	/* ...and if we can't, then something is horribly, horribly wrong */
>> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
>
> This is a WARN_ON at most, not a BUG_ON condition, especially since this
> type of bug is also catched with the dma-api debugging code.

Indeed, DMA_DEBUG will check that a driver is making DMA API calls to 
the arch code in the right way; this is a different check, to catch 
things like the arch code passing the wrong domain into this layer, or 
someone else having messed directly with the domain via the IOMMU API. 
If the iommu_unmap doesn't match the IOVA region we looked up, that 
means the IOMMU page tables have somehow become inconsistent with the 
IOVA allocator, so we are in an unrecoverable situation where we can no 
longer be sure what devices have access to. That's bad.

>> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
>> +{
>> +	struct page **pages;
>> +	unsigned int i = 0, array_size = count * sizeof(*pages);
>> +
>> +	if (array_size <= PAGE_SIZE)
>> +		pages = kzalloc(array_size, GFP_KERNEL);
>> +	else
>> +		pages = vzalloc(array_size);
>> +	if (!pages)
>> +		return NULL;
>> +
>> +	/* IOMMU can map any pages, so himem can also be used here */
>> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
>> +
>> +	while (count) {
>> +		struct page *page = NULL;
>> +		int j, order = __fls(count);
>> +
>> +		/*
>> +		 * Higher-order allocations are a convenience rather
>> +		 * than a necessity, hence using __GFP_NORETRY until
>> +		 * falling back to single-page allocations.
>> +		 */
>> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
>> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
>> +			if (!page)
>> +				continue;
>> +			if (PageCompound(page)) {
>> +				if (!split_huge_page(page))
>> +					break;
>> +				__free_pages(page, order);
>> +			} else {
>> +				split_page(page, order);
>> +				break;
>> +			}
>> +		}
>> +		if (!page)
>> +			page = alloc_page(gfp);
>> +		if (!page) {
>> +			__iommu_dma_free_pages(pages, i);
>> +			return NULL;
>> +		}
>> +		j = 1 << order;
>> +		count -= j;
>> +		while (j--)
>> +			pages[i++] = page++;
>> +	}
>> +	return pages;
>> +}
>
> Hmm, most dma-api implementation just try to allocate a big enough
> region from the page-alloctor. Is it implemented different here to avoid
> the use of CMA?

AFAIK, yes (this is just a slight tidyup of the existing code that 
32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the 
display guys want increasingly massive contiguous allocations for 
framebuffers, layers, etc., so having IOMMU magic deal with that saves 
CMA for non-IOMMU devices that really need it.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-07 13:38             ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-07 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joerg,

Thanks for taking a look,

On 07/08/15 09:42, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
>> +int iommu_get_dma_cookie(struct iommu_domain *domain)
>> +{
>> +	struct iova_domain *iovad;
>> +
>> +	if (domain->dma_api_cookie)
>> +		return -EEXIST;
>
> Why do you call that dma_api_cookie? It is just a pointer to an iova
> allocator, you can just name it as such, like domain->iova.

Sure, it was more the case that since it had to be in the top-level 
generic domain structure, I didn't want it to be too 
implementation-specific. I figured this was a reasonable compromise that 
wouldn't be a waste of space for implementations with different 
per-domain DMA API data - e.g. the AMD IOMMU driver could then make use 
of protection_domain->domain->dma_api_cookie instead of having 
protection_domain->priv, but that's a patch that wouldn't belong in this 
series anyway.

If you really hate that idea, then yeah, let's just call it iova and 
consider if factoring out redundancy is still applicable later.

>> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
>> +		dma_addr_t dma_limit)
>
> I think you also need a struct device here to take segment boundary and
> dma_mask into account.

To the best of my understanding, those limits are only relevant when 
actually handing off a scatterlist to a client device doing hardware 
scatter-gather operations, so it's not so much the IOVA allocation that 
matters, but where the segments lie within it when handling dma_map_sg.

However, you do raise a good point - in the current "map everything 
consecutively" approach, if there is a non-power-of-2-sized segment in 
the middle of a scatterlist, then subsequent segments could possibly end 
up inadvertently straddling a boundary. That needs handling in 
iommu_dma_map_sg; I'll fix it appropriately.

>> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
>> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
>> +{
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	unsigned long shift = iova_shift(iovad);
>> +	unsigned long pfn = dma_addr >> shift;
>> +	struct iova *iova = find_iova(iovad, pfn);
>> +	size_t size = iova_size(iova) << shift;
>> +
>> +	/* ...and if we can't, then something is horribly, horribly wrong */
>> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
>
> This is a WARN_ON at most, not a BUG_ON condition, especially since this
> type of bug is also catched with the dma-api debugging code.

Indeed, DMA_DEBUG will check that a driver is making DMA API calls to 
the arch code in the right way; this is a different check, to catch 
things like the arch code passing the wrong domain into this layer, or 
someone else having messed directly with the domain via the IOMMU API. 
If the iommu_unmap doesn't match the IOVA region we looked up, that 
means the IOMMU page tables have somehow become inconsistent with the 
IOVA allocator, so we are in an unrecoverable situation where we can no 
longer be sure what devices have access to. That's bad.

>> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
>> +{
>> +	struct page **pages;
>> +	unsigned int i = 0, array_size = count * sizeof(*pages);
>> +
>> +	if (array_size <= PAGE_SIZE)
>> +		pages = kzalloc(array_size, GFP_KERNEL);
>> +	else
>> +		pages = vzalloc(array_size);
>> +	if (!pages)
>> +		return NULL;
>> +
>> +	/* IOMMU can map any pages, so himem can also be used here */
>> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
>> +
>> +	while (count) {
>> +		struct page *page = NULL;
>> +		int j, order = __fls(count);
>> +
>> +		/*
>> +		 * Higher-order allocations are a convenience rather
>> +		 * than a necessity, hence using __GFP_NORETRY until
>> +		 * falling back to single-page allocations.
>> +		 */
>> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
>> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
>> +			if (!page)
>> +				continue;
>> +			if (PageCompound(page)) {
>> +				if (!split_huge_page(page))
>> +					break;
>> +				__free_pages(page, order);
>> +			} else {
>> +				split_page(page, order);
>> +				break;
>> +			}
>> +		}
>> +		if (!page)
>> +			page = alloc_page(gfp);
>> +		if (!page) {
>> +			__iommu_dma_free_pages(pages, i);
>> +			return NULL;
>> +		}
>> +		j = 1 << order;
>> +		count -= j;
>> +		while (j--)
>> +			pages[i++] = page++;
>> +	}
>> +	return pages;
>> +}
>
> Hmm, most dma-api implementation just try to allocate a big enough
> region from the page-alloctor. Is it implemented different here to avoid
> the use of CMA?

AFAIK, yes (this is just a slight tidyup of the existing code that 
32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the 
display guys want increasingly massive contiguous allocations for 
framebuffers, layers, etc., so having IOMMU magic deal with that saves 
CMA for non-IOMMU devices that really need it.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-08-07  8:52         ` Joerg Roedel
@ 2015-08-07 15:27             ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-07 15:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On 07/08/15 09:52, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
>> +/*
>> + * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>> + * everything it needs to - the device isn't yet fully created, and the
>> + * IOMMU driver hasn't seen it yet, so we need this delayed attachment
>> + * dance. Once IOMMU probe ordering is sorted to move the
>> + * arch_setup_dma_ops() call later, all the notifier bits below become
>> + * unnecessary, and will go away.
>> + */
>> +struct iommu_dma_notifier_data {
>> +	struct list_head list;
>> +	struct device *dev;
>> +	struct iommu_domain *dma_domain;
>> +};
>> +static LIST_HEAD(iommu_dma_masters);
>> +static DEFINE_MUTEX(iommu_dma_notifier_lock);
>
> Ugh, thats incredibly ugly. Why can't you do the setup work then the
> iommu driver sees the device? Just call the dma-api setup functions
> there (like the x86 iommu drivers do it too) and be done without any
> notifiers.

As per the comments, the issue here lies in the order in which the 
OF/driver core code currently calls things for platform devices: as it 
stands we can't attach the device to a domain in arch_setup_dma_ops() 
because it doesn't have a group, and we can't even add it to a group 
ourselves because it isn't fully created and doesn't exist in sysfs yet. 
The only reason arch/arm is currently getting away without this 
workaround is that the few IOMMU drivers there hooked up to the generic 
infrastructure don't actually mind that they get an attach_device from 
arch_setup_dma_ops() before they've even seen an add_device (largely 
because they don't care about groups).

Laurent's probe-deferral series largely solves these problems in the 
right place - adding identical boilerplate code to every IOMMU driver to 
do something they shouldn't have to know about (and don't necessarily 
have all the right information for) is exactly what we don't want to do. 
As discussed over on another thread, I'm planning to pick that series up 
and polish it off after this, but my top priority is getting the basic 
dma_ops functionality into arm64 that people need right now. I will be 
only too happy when I can submit the patch removing this notifier 
workaround ;)

>> +static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>> +				  const struct iommu_ops *ops)
>> +{
>> +	struct iommu_domain *domain;
>> +	int err;
>> +
>> +	if (!ops)
>> +		return;
>> +	/*
>> +	 * In a perfect world, everything happened in the right order up to
>> +	 * here, and the IOMMU core has already attached the device to an
>> +	 * appropriate default domain for us to set up...
>> +	 */
>> +	domain = iommu_get_domain_for_dev(dev);
>> +	if (!domain) {
>> +		/*
>> +		 * Urgh. Reliable default domains for platform devices can't
>> +		 * happen anyway without some sensible way of handling
>> +		 * non-trivial groups. So until then, HORRIBLE HACKS!
>> +		 */
>
> I don't get this, what is preventing to rely on default domains here?

No driver other than the AMD IOMMU has any support yet. Support for 
IOMMU_DOMAIN_DMA can easily be added to existing drivers based on patch 
1 of this series, but more generally it's not entirely clear how default 
domains are going to work beyond x86. On systems like Juno or Seattle 
with different sets of masters behind different IOMMU instances (with 
limited domain contexts each), the most sensible approach would be to 
have a single default domain per IOMMU (spanning domains across 
instances would need some hideous synchronisation logic for some 
implementations), but the current domain_alloc interface gives us no way 
to do that. On something like RK3288 with two different types of IOMMU 
on the platform "bus", it breaks down even further as there's no way to 
guarantee that iommu_domain_alloc() even gives you a domain from the 
right *driver* (hence bypassing it by going through ops directly here).

>
>> +		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
>
> The IOMMU core should already tried to allocate an IOMMU_DOMAIN_DMA type
> domain. No need to try this again here.

Only for PCI devices, via iommu_group_get_for_pci_dev(). The code here, 
however, only runs for platform devices - ops will be always null for a 
PCI device since of_iommu_configure() will have bailed out (see the 
silly warning removed by my patch you picked up the other day). Once 
iommu_group_get_for_dev() supports platform devices, this too can go 
away. In the meantime if someone adds PCI support to 
of_iommu_configure() and IOMMU_DOMAIN_DMA support to their IOMMU driver, 
then we'll get a domain back from iommu_get_domain_for_dev() and just 
use that.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-08-07 15:27             ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-07 15:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/08/15 09:52, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
>> +/*
>> + * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
>> + * everything it needs to - the device isn't yet fully created, and the
>> + * IOMMU driver hasn't seen it yet, so we need this delayed attachment
>> + * dance. Once IOMMU probe ordering is sorted to move the
>> + * arch_setup_dma_ops() call later, all the notifier bits below become
>> + * unnecessary, and will go away.
>> + */
>> +struct iommu_dma_notifier_data {
>> +	struct list_head list;
>> +	struct device *dev;
>> +	struct iommu_domain *dma_domain;
>> +};
>> +static LIST_HEAD(iommu_dma_masters);
>> +static DEFINE_MUTEX(iommu_dma_notifier_lock);
>
> Ugh, thats incredibly ugly. Why can't you do the setup work then the
> iommu driver sees the device? Just call the dma-api setup functions
> there (like the x86 iommu drivers do it too) and be done without any
> notifiers.

As per the comments, the issue here lies in the order in which the 
OF/driver core code currently calls things for platform devices: as it 
stands we can't attach the device to a domain in arch_setup_dma_ops() 
because it doesn't have a group, and we can't even add it to a group 
ourselves because it isn't fully created and doesn't exist in sysfs yet. 
The only reason arch/arm is currently getting away without this 
workaround is that the few IOMMU drivers there hooked up to the generic 
infrastructure don't actually mind that they get an attach_device from 
arch_setup_dma_ops() before they've even seen an add_device (largely 
because they don't care about groups).

Laurent's probe-deferral series largely solves these problems in the 
right place - adding identical boilerplate code to every IOMMU driver to 
do something they shouldn't have to know about (and don't necessarily 
have all the right information for) is exactly what we don't want to do. 
As discussed over on another thread, I'm planning to pick that series up 
and polish it off after this, but my top priority is getting the basic 
dma_ops functionality into arm64 that people need right now. I will be 
only too happy when I can submit the patch removing this notifier 
workaround ;)

>> +static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>> +				  const struct iommu_ops *ops)
>> +{
>> +	struct iommu_domain *domain;
>> +	int err;
>> +
>> +	if (!ops)
>> +		return;
>> +	/*
>> +	 * In a perfect world, everything happened in the right order up to
>> +	 * here, and the IOMMU core has already attached the device to an
>> +	 * appropriate default domain for us to set up...
>> +	 */
>> +	domain = iommu_get_domain_for_dev(dev);
>> +	if (!domain) {
>> +		/*
>> +		 * Urgh. Reliable default domains for platform devices can't
>> +		 * happen anyway without some sensible way of handling
>> +		 * non-trivial groups. So until then, HORRIBLE HACKS!
>> +		 */
>
> I don't get this, what is preventing to rely on default domains here?

No driver other than the AMD IOMMU has any support yet. Support for 
IOMMU_DOMAIN_DMA can easily be added to existing drivers based on patch 
1 of this series, but more generally it's not entirely clear how default 
domains are going to work beyond x86. On systems like Juno or Seattle 
with different sets of masters behind different IOMMU instances (with 
limited domain contexts each), the most sensible approach would be to 
have a single default domain per IOMMU (spanning domains across 
instances would need some hideous synchronisation logic for some 
implementations), but the current domain_alloc interface gives us no way 
to do that. On something like RK3288 with two different types of IOMMU 
on the platform "bus", it breaks down even further as there's no way to 
guarantee that iommu_domain_alloc() even gives you a domain from the 
right *driver* (hence bypassing it by going through ops directly here).

>
>> +		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
>
> The IOMMU core should already tried to allocate an IOMMU_DOMAIN_DMA type
> domain. No need to try this again here.

Only for PCI devices, via iommu_group_get_for_pci_dev(). The code here, 
however, only runs for platform devices - ops will be always null for a 
PCI device since of_iommu_configure() will have bailed out (see the 
silly warning removed by my patch you picked up the other day). Once 
iommu_group_get_for_dev() supports platform devices, this too can go 
away. In the meantime if someone adds PCI support to 
of_iommu_configure() and IOMMU_DOMAIN_DMA support to their IOMMU driver, 
then we'll get a domain back from iommu_get_domain_for_dev() and just 
use that.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-08-07 13:38             ` Robin Murphy
@ 2015-08-11  9:37                 ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-11  9:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
> Indeed, DMA_DEBUG will check that a driver is making DMA API calls
> to the arch code in the right way; this is a different check, to
> catch things like the arch code passing the wrong domain into this
> layer, or someone else having messed directly with the domain via
> the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
> looked up, that means the IOMMU page tables have somehow become
> inconsistent with the IOVA allocator, so we are in an unrecoverable
> situation where we can no longer be sure what devices have access
> to. That's bad.

Sure, but the BUG_ON would also trigger on things like a double-free,
which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

> AFAIK, yes (this is just a slight tidyup of the existing code that
> 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
> display guys want increasingly massive contiguous allocations for
> framebuffers, layers, etc., so having IOMMU magic deal with that
> saves CMA for non-IOMMU devices that really need it.

Makes sense, I thougt about something similar for x86 too to avoid the
high-order allocations we currently do. I guess the buffer will later be
mapped into the vmalloc space for the CPU?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-11  9:37                 ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-11  9:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
> Indeed, DMA_DEBUG will check that a driver is making DMA API calls
> to the arch code in the right way; this is a different check, to
> catch things like the arch code passing the wrong domain into this
> layer, or someone else having messed directly with the domain via
> the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
> looked up, that means the IOMMU page tables have somehow become
> inconsistent with the IOVA allocator, so we are in an unrecoverable
> situation where we can no longer be sure what devices have access
> to. That's bad.

Sure, but the BUG_ON would also trigger on things like a double-free,
which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

> AFAIK, yes (this is just a slight tidyup of the existing code that
> 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
> display guys want increasingly massive contiguous allocations for
> framebuffers, layers, etc., so having IOMMU magic deal with that
> saves CMA for non-IOMMU devices that really need it.

Makes sense, I thougt about something similar for x86 too to avoid the
high-order allocations we currently do. I guess the buffer will later be
mapped into the vmalloc space for the CPU?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-08-07 15:27             ` Robin Murphy
@ 2015-08-11  9:49                 ` Joerg Roedel
  -1 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-11  9:49 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

On Fri, Aug 07, 2015 at 04:27:56PM +0100, Robin Murphy wrote:
> As per the comments, the issue here lies in the order in which the
> OF/driver core code currently calls things for platform devices: as
> it stands we can't attach the device to a domain in
> arch_setup_dma_ops() because it doesn't have a group, and we can't
> even add it to a group ourselves because it isn't fully created and
> doesn't exist in sysfs yet. The only reason arch/arm is currently
> getting away without this workaround is that the few IOMMU drivers
> there hooked up to the generic infrastructure don't actually mind
> that they get an attach_device from arch_setup_dma_ops() before
> they've even seen an add_device (largely because they don't care
> about groups).

Hmm, what about just registering the iommu-driver in arch_setup_dma_ops
with bus_set_iommu and not care about devices? The code will iterate
over the devices already on the bus and tries to attach them to the
iommu driver. But as you said, the devices are not yet on the bus, so
when a device is later added by the OF/driver core code you can do
everything needed for the device in the add_device call-back.

This might include initializing the hardware iommu needed for the
device and setting its per-device dma_ops.

> Laurent's probe-deferral series largely solves these problems in the
> right place - adding identical boilerplate code to every IOMMU
> driver to do something they shouldn't have to know about (and don't
> necessarily have all the right information for) is exactly what we
> don't want to do. As discussed over on another thread, I'm planning
> to pick that series up and polish it off after this, but my top
> priority is getting the basic dma_ops functionality into arm64 that
> people need right now. I will be only too happy when I can submit
> the patch removing this notifier workaround ;)

I've experienced it often that someone promises me to fix things later,
but that it doesn't happen then, so please understand that I am pretty
cautious about adding such hacks ;)

> No driver other than the AMD IOMMU has any support yet. Support for
> IOMMU_DOMAIN_DMA can easily be added to existing drivers based on
> patch 1 of this series, but more generally it's not entirely clear
> how default domains are going to work beyond x86. On systems like
> Juno or Seattle with different sets of masters behind different
> IOMMU instances (with limited domain contexts each), the most
> sensible approach would be to have a single default domain per IOMMU
> (spanning domains across instances would need some hideous
> synchronisation logic for some implementations), but the current
> domain_alloc interface gives us no way to do that. On something like
> RK3288 with two different types of IOMMU on the platform "bus", it
> breaks down even further as there's no way to guarantee that
> iommu_domain_alloc() even gives you a domain from the right *driver*
> (hence bypassing it by going through ops directly here).

Default domain allocation comes down to how the bus organizes its
iommu-groups. For every group (at least in its current design) a default
domain is allocated. An a group is typically only behind a single iommu
instance.

> Only for PCI devices, via iommu_group_get_for_pci_dev(). The code
> here, however, only runs for platform devices - ops will be always
> null for a PCI device since of_iommu_configure() will have bailed
> out (see the silly warning removed by my patch you picked up the
> other day). Once iommu_group_get_for_dev() supports platform
> devices, this too can go away. In the meantime if someone adds PCI
> support to of_iommu_configure() and IOMMU_DOMAIN_DMA support to
> their IOMMU driver, then we'll get a domain back from
> iommu_get_domain_for_dev() and just use that.

What is the plan for getting an iommu-groups implementation for the
platform bus?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-08-11  9:49                 ` Joerg Roedel
  0 siblings, 0 replies; 40+ messages in thread
From: Joerg Roedel @ 2015-08-11  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Aug 07, 2015 at 04:27:56PM +0100, Robin Murphy wrote:
> As per the comments, the issue here lies in the order in which the
> OF/driver core code currently calls things for platform devices: as
> it stands we can't attach the device to a domain in
> arch_setup_dma_ops() because it doesn't have a group, and we can't
> even add it to a group ourselves because it isn't fully created and
> doesn't exist in sysfs yet. The only reason arch/arm is currently
> getting away without this workaround is that the few IOMMU drivers
> there hooked up to the generic infrastructure don't actually mind
> that they get an attach_device from arch_setup_dma_ops() before
> they've even seen an add_device (largely because they don't care
> about groups).

Hmm, what about just registering the iommu-driver in arch_setup_dma_ops
with bus_set_iommu and not care about devices? The code will iterate
over the devices already on the bus and tries to attach them to the
iommu driver. But as you said, the devices are not yet on the bus, so
when a device is later added by the OF/driver core code you can do
everything needed for the device in the add_device call-back.

This might include initializing the hardware iommu needed for the
device and setting its per-device dma_ops.

> Laurent's probe-deferral series largely solves these problems in the
> right place - adding identical boilerplate code to every IOMMU
> driver to do something they shouldn't have to know about (and don't
> necessarily have all the right information for) is exactly what we
> don't want to do. As discussed over on another thread, I'm planning
> to pick that series up and polish it off after this, but my top
> priority is getting the basic dma_ops functionality into arm64 that
> people need right now. I will be only too happy when I can submit
> the patch removing this notifier workaround ;)

I've experienced it often that someone promises me to fix things later,
but that it doesn't happen then, so please understand that I am pretty
cautious about adding such hacks ;)

> No driver other than the AMD IOMMU has any support yet. Support for
> IOMMU_DOMAIN_DMA can easily be added to existing drivers based on
> patch 1 of this series, but more generally it's not entirely clear
> how default domains are going to work beyond x86. On systems like
> Juno or Seattle with different sets of masters behind different
> IOMMU instances (with limited domain contexts each), the most
> sensible approach would be to have a single default domain per IOMMU
> (spanning domains across instances would need some hideous
> synchronisation logic for some implementations), but the current
> domain_alloc interface gives us no way to do that. On something like
> RK3288 with two different types of IOMMU on the platform "bus", it
> breaks down even further as there's no way to guarantee that
> iommu_domain_alloc() even gives you a domain from the right *driver*
> (hence bypassing it by going through ops directly here).

Default domain allocation comes down to how the bus organizes its
iommu-groups. For every group (at least in its current design) a default
domain is allocated. An a group is typically only behind a single iommu
instance.

> Only for PCI devices, via iommu_group_get_for_pci_dev(). The code
> here, however, only runs for platform devices - ops will be always
> null for a PCI device since of_iommu_configure() will have bailed
> out (see the silly warning removed by my patch you picked up the
> other day). Once iommu_group_get_for_dev() supports platform
> devices, this too can go away. In the meantime if someone adds PCI
> support to of_iommu_configure() and IOMMU_DOMAIN_DMA support to
> their IOMMU driver, then we'll get a domain back from
> iommu_get_domain_for_dev() and just use that.

What is the plan for getting an iommu-groups implementation for the
platform bus?


	Joerg

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
  2015-08-11  9:37                 ` Joerg Roedel
@ 2015-08-11 13:31                     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-11 13:31 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Hi Joerg,

On 11/08/15 10:37, Joerg Roedel wrote:
> On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
>> Indeed, DMA_DEBUG will check that a driver is making DMA API calls
>> to the arch code in the right way; this is a different check, to
>> catch things like the arch code passing the wrong domain into this
>> layer, or someone else having messed directly with the domain via
>> the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
>> looked up, that means the IOMMU page tables have somehow become
>> inconsistent with the IOVA allocator, so we are in an unrecoverable
>> situation where we can no longer be sure what devices have access
>> to. That's bad.
>
> Sure, but the BUG_ON would also trigger on things like a double-free,
> which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

Oh dear, it gets even better than that; in the case of a simple 
double-unmap where the IOVA is already free, we wouldn't even get as far 
as that check because we'd die calling iova_size(NULL). How on Earth did 
I get to v5 without spotting that? :(

Anyway, on reflection I think you're probably right; I've clearly been 
working on this for long enough to start falling into the "my thing is 
obviously more important than all the other things" trap.

>> AFAIK, yes (this is just a slight tidyup of the existing code that
>> 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
>> display guys want increasingly massive contiguous allocations for
>> framebuffers, layers, etc., so having IOMMU magic deal with that
>> saves CMA for non-IOMMU devices that really need it.
>
> Makes sense, I thougt about something similar for x86 too to avoid the
> high-order allocations we currently do. I guess the buffer will later be
> mapped into the vmalloc space for the CPU?

Indeed - for non-coherent devices we have to remap all allocations 
(IOMMU or not) anyway in order to get a non-cacheable CPU mapping of the 
buffer, so having non-contiguous pages is no bother; for coherent 
devices we can just do the same thing but keep the vmalloc mapping 
cacheable. There's also the DMA_ATTR_NO_KERNEL_MAPPING case (e.g. GPU 
just wants a big buffer to render into and read back out again) where we 
wouldn't need a CPU address at all, although on arm64 vmalloc space is 
cheap enough that we've no plans to implement that at the moment.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
@ 2015-08-11 13:31                     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-11 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joerg,

On 11/08/15 10:37, Joerg Roedel wrote:
> On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
>> Indeed, DMA_DEBUG will check that a driver is making DMA API calls
>> to the arch code in the right way; this is a different check, to
>> catch things like the arch code passing the wrong domain into this
>> layer, or someone else having messed directly with the domain via
>> the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
>> looked up, that means the IOMMU page tables have somehow become
>> inconsistent with the IOVA allocator, so we are in an unrecoverable
>> situation where we can no longer be sure what devices have access
>> to. That's bad.
>
> Sure, but the BUG_ON would also trigger on things like a double-free,
> which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

Oh dear, it gets even better than that; in the case of a simple 
double-unmap where the IOVA is already free, we wouldn't even get as far 
as that check because we'd die calling iova_size(NULL). How on Earth did 
I get to v5 without spotting that? :(

Anyway, on reflection I think you're probably right; I've clearly been 
working on this for long enough to start falling into the "my thing is 
obviously more important than all the other things" trap.

>> AFAIK, yes (this is just a slight tidyup of the existing code that
>> 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
>> display guys want increasingly massive contiguous allocations for
>> framebuffers, layers, etc., so having IOMMU magic deal with that
>> saves CMA for non-IOMMU devices that really need it.
>
> Makes sense, I thougt about something similar for x86 too to avoid the
> high-order allocations we currently do. I guess the buffer will later be
> mapped into the vmalloc space for the CPU?

Indeed - for non-coherent devices we have to remap all allocations 
(IOMMU or not) anyway in order to get a non-cacheable CPU mapping of the 
buffer, so having non-contiguous pages is no bother; for coherent 
devices we can just do the same thing but keep the vmalloc mapping 
cacheable. There's also the DMA_ATTR_NO_KERNEL_MAPPING case (e.g. GPU 
just wants a big buffer to render into and read back out again) where we 
wouldn't need a CPU address at all, although on arm64 vmalloc space is 
cheap enough that we've no plans to implement that at the moment.

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-08-11  9:49                 ` Joerg Roedel
@ 2015-08-11 20:15                     ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-11 20:15 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA

Hi Joerg,

On 11/08/15 10:49, Joerg Roedel wrote:
> On Fri, Aug 07, 2015 at 04:27:56PM +0100, Robin Murphy wrote:
>> As per the comments, the issue here lies in the order in which the
>> OF/driver core code currently calls things for platform devices: as
>> it stands we can't attach the device to a domain in
>> arch_setup_dma_ops() because it doesn't have a group, and we can't
>> even add it to a group ourselves because it isn't fully created and
>> doesn't exist in sysfs yet. The only reason arch/arm is currently
>> getting away without this workaround is that the few IOMMU drivers
>> there hooked up to the generic infrastructure don't actually mind
>> that they get an attach_device from arch_setup_dma_ops() before
>> they've even seen an add_device (largely because they don't care
>> about groups).
>
> Hmm, what about just registering the iommu-driver in arch_setup_dma_ops
> with bus_set_iommu and not care about devices? The code will iterate
> over the devices already on the bus and tries to attach them to the
> iommu driver. But as you said, the devices are not yet on the bus, so
> when a device is later added by the OF/driver core code you can do
> everything needed for the device in the add_device call-back.

The main problem here is that "the bus" turns out to be a nebulous and 
poorly-defined thing. "The PCI bus" breaks down as soon as you have 
multiple host controllers - you could quite easily build a SoC/chipset 
where not all the host controllers/root complexes are behind IOMMUs, or 
some are behind different types of IOMMU, and then what? The "platform 
bus" is just a glorified holding pen for device structures and utterly 
useless as any kind of abstraction.

Since I'm not concerning myself with PCI at the moment; considering the 
perspective of "the" IOMMU driver on the platform bus (assuming said 
driver is up and running first, either via OF_DECLARE or deferred 
probing of masters), the issues with relying on the add_device callback 
from the bus notifier alone are:

1) We get these callbacks for everything - devices behind one of our 
IOMMUs, devices behind different IOMMUs, devices with no IOMMU in their 
path at all - with nothing but a struct device pointer to try to 
distinguish one from the other.

2) Even for the devices actually relevant to us, we don't easily have 
the details we need to be able to set it up. In the PCI case, it's 
simple because the device has one bus ID which can trivially be looked 
up by common code; if you have driver-specific ACPI tables that identify 
platform devices and give them a single ID that you can kind of handle 
like a PCI device, fair enough; in the more general DT case, though, 
devices are going to have have any number of arbitrary IDs plus who 
knows what other associated data.

Now, both of those could probably be handled by having a big mess of 
open-coded DT parsing in every IOMMU driver's add_device, but that's 
exactly the kind of thing that having a generic DT binding should avoid. 
The generic binding is good enough to express most such arbitrary data 
pretty well, and even crazy stuff like a single device mastering through 
multiple IOMMUs with different IDs, so there's every argument for 
keeping the DT parsing generic and in core code. That's what the 
of_xlate infrastructure does: the appropriate IOMMU driver gets one or 
more of_xlate calls for a master device, from which it can collate all 
the "bus" information it needs, then later it gets the add_device 
callback once the device exists, at which point it can retrieve that 
data and use it to set up the device, allocate a group, etc.

If allocating a group for a platform device automatically set up a DMA 
mapping domain and attached it, then we wouldn't need to do anything 
with domains in arch_setup_dma_ops and the ordering problem vanishes. 
However, the drivers can't support DMA domains without a dma_ops 
implementation to back them up, which needs the DMA mapping code, which 
needs DMA domains in order to work... Hence the initial need to 
bootstrap the process via fake DMA domains in the arch code, enabling 
steady incremental development instead of one massive wide-reaching 
patch dump that's virtually impossible for me to test well and for 
others to develop against.

> This might include initializing the hardware iommu needed for the
> device and setting its per-device dma_ops.

I'm not sure that feels appropriate for our situation - If your IOMMU is 
tightly integrated into the I/O hub which forms your CPU's only 
connection to the outside world, then having it do architectural things 
seems reasonable. The kind of IOMMUs I'm dealing with here, however, are 
either just some IP block that you stitch into your SoC's mess of 
interconnects somewhere like any other peripheral (to keep your legacy 
32-bit devices behind), or even just a simple TLB with a page table 
walker and some control registers which you integrate directly into your 
own bigger devices. The latter are not really even general-purpose 
enough for arbitrary IOMMU API use, but treating them as separate IOMMUs 
makes sense from a common-driver perspective.

Having drivers for those kinds of IOMMU do things like setting dma_ops 
directly when they aren't even necessarily tied to a particular CPU 
architecture, and may have to coexist with drivers for incompatible 
IOMMUs in the same system, sounds like a recipe for tons of duplicated 
code and horrible hacks. I think on DT-based systems we *have* to keep 
system topology and device infrastructure handling in the core OF code 
and not have IOMMU drivers reinventing funnily-shaped and conflicting 
wheels all over the place.

>> Laurent's probe-deferral series largely solves these problems in the
>> right place - adding identical boilerplate code to every IOMMU
>> driver to do something they shouldn't have to know about (and don't
>> necessarily have all the right information for) is exactly what we
>> don't want to do. As discussed over on another thread, I'm planning
>> to pick that series up and polish it off after this, but my top
>> priority is getting the basic dma_ops functionality into arm64 that
>> people need right now. I will be only too happy when I can submit
>> the patch removing this notifier workaround ;)
>
> I've experienced it often that someone promises me to fix things later,
> but that it doesn't happen then, so please understand that I am pretty
> cautious about adding such hacks ;)

Oh, I understand completely - that's why I've tried as much as possible 
to keep all the workarounds out of the core code and local to arm64 
where it's easier for us to remove them cleanly. If it's any 
consolation, said patch is already written:

http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=af0110ed070c13417023ca7a560dc43e9d1c46f7

(the aforementioned rough WIP branch, from which I'm hoping to pull 
together an RFC for -rc1)

>> No driver other than the AMD IOMMU has any support yet. Support for
>> IOMMU_DOMAIN_DMA can easily be added to existing drivers based on
>> patch 1 of this series, but more generally it's not entirely clear
>> how default domains are going to work beyond x86. On systems like
>> Juno or Seattle with different sets of masters behind different
>> IOMMU instances (with limited domain contexts each), the most
>> sensible approach would be to have a single default domain per IOMMU
>> (spanning domains across instances would need some hideous
>> synchronisation logic for some implementations), but the current
>> domain_alloc interface gives us no way to do that. On something like
>> RK3288 with two different types of IOMMU on the platform "bus", it
>> breaks down even further as there's no way to guarantee that
>> iommu_domain_alloc() even gives you a domain from the right *driver*
>> (hence bypassing it by going through ops directly here).
>
> Default domain allocation comes down to how the bus organizes its
> iommu-groups. For every group (at least in its current design) a default
> domain is allocated. An a group is typically only behind a single iommu
> instance.

Default-domain-per-group is great for device isolation, but the main 
thing people seem to want on ARM is the opposite of that - assembling 
big bits of virtually-contiguous memory that are visible to all the 
devices that need to share them. Per-IOMMU (or even per-system if 
appropriate) default domains provide a really neat solution for that 
use-case and mean we could mostly completely remove the IOMMU-specific 
code from GPU drivers, which is one of the things standing in the way of 
having arch/arm share the common implementation.

I'll note that limited contexts aren't a made-up consideration either. 
The PCIe SMMU in my Juno can support a maximum of 4 domains, yet the bus 
has onboard LAN and SATA plus 4 slots...

>> Only for PCI devices, via iommu_group_get_for_pci_dev(). The code
>> here, however, only runs for platform devices - ops will be always
>> null for a PCI device since of_iommu_configure() will have bailed
>> out (see the silly warning removed by my patch you picked up the
>> other day). Once iommu_group_get_for_dev() supports platform
>> devices, this too can go away. In the meantime if someone adds PCI
>> support to of_iommu_configure() and IOMMU_DOMAIN_DMA support to
>> their IOMMU driver, then we'll get a domain back from
>> iommu_get_domain_for_dev() and just use that.
>
> What is the plan for getting an iommu-groups implementation for the
> platform bus?

The realistic options are trying to handle it with logic in the drivers 
(engenders code duplication, has potentially unsolvable ordering issues) 
or defining groups statically in DT via a new binding (needs to avoid 
being too much of a Linux-specific implementation detail). I tried 
prototyping the former ages ago and it got overcomplicated very quickly. 
Having considered the static DT method more recently, I've sketched out 
some ideas for a binding and a rough design for the code, but nothing's 
actually written yet. It is another of the "make stuff work on arm64" 
balls I have in the air, though, if only because the USB SMMU on Juno is 
otherwise unusable (well, unless you hack out one or other of the 
EHCI/OHCI pair).

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-08-11 20:15                     ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-08-11 20:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joerg,

On 11/08/15 10:49, Joerg Roedel wrote:
> On Fri, Aug 07, 2015 at 04:27:56PM +0100, Robin Murphy wrote:
>> As per the comments, the issue here lies in the order in which the
>> OF/driver core code currently calls things for platform devices: as
>> it stands we can't attach the device to a domain in
>> arch_setup_dma_ops() because it doesn't have a group, and we can't
>> even add it to a group ourselves because it isn't fully created and
>> doesn't exist in sysfs yet. The only reason arch/arm is currently
>> getting away without this workaround is that the few IOMMU drivers
>> there hooked up to the generic infrastructure don't actually mind
>> that they get an attach_device from arch_setup_dma_ops() before
>> they've even seen an add_device (largely because they don't care
>> about groups).
>
> Hmm, what about just registering the iommu-driver in arch_setup_dma_ops
> with bus_set_iommu and not care about devices? The code will iterate
> over the devices already on the bus and tries to attach them to the
> iommu driver. But as you said, the devices are not yet on the bus, so
> when a device is later added by the OF/driver core code you can do
> everything needed for the device in the add_device call-back.

The main problem here is that "the bus" turns out to be a nebulous and 
poorly-defined thing. "The PCI bus" breaks down as soon as you have 
multiple host controllers - you could quite easily build a SoC/chipset 
where not all the host controllers/root complexes are behind IOMMUs, or 
some are behind different types of IOMMU, and then what? The "platform 
bus" is just a glorified holding pen for device structures and utterly 
useless as any kind of abstraction.

Since I'm not concerning myself with PCI at the moment; considering the 
perspective of "the" IOMMU driver on the platform bus (assuming said 
driver is up and running first, either via OF_DECLARE or deferred 
probing of masters), the issues with relying on the add_device callback 
from the bus notifier alone are:

1) We get these callbacks for everything - devices behind one of our 
IOMMUs, devices behind different IOMMUs, devices with no IOMMU in their 
path at all - with nothing but a struct device pointer to try to 
distinguish one from the other.

2) Even for the devices actually relevant to us, we don't easily have 
the details we need to be able to set it up. In the PCI case, it's 
simple because the device has one bus ID which can trivially be looked 
up by common code; if you have driver-specific ACPI tables that identify 
platform devices and give them a single ID that you can kind of handle 
like a PCI device, fair enough; in the more general DT case, though, 
devices are going to have have any number of arbitrary IDs plus who 
knows what other associated data.

Now, both of those could probably be handled by having a big mess of 
open-coded DT parsing in every IOMMU driver's add_device, but that's 
exactly the kind of thing that having a generic DT binding should avoid. 
The generic binding is good enough to express most such arbitrary data 
pretty well, and even crazy stuff like a single device mastering through 
multiple IOMMUs with different IDs, so there's every argument for 
keeping the DT parsing generic and in core code. That's what the 
of_xlate infrastructure does: the appropriate IOMMU driver gets one or 
more of_xlate calls for a master device, from which it can collate all 
the "bus" information it needs, then later it gets the add_device 
callback once the device exists, at which point it can retrieve that 
data and use it to set up the device, allocate a group, etc.

If allocating a group for a platform device automatically set up a DMA 
mapping domain and attached it, then we wouldn't need to do anything 
with domains in arch_setup_dma_ops and the ordering problem vanishes. 
However, the drivers can't support DMA domains without a dma_ops 
implementation to back them up, which needs the DMA mapping code, which 
needs DMA domains in order to work... Hence the initial need to 
bootstrap the process via fake DMA domains in the arch code, enabling 
steady incremental development instead of one massive wide-reaching 
patch dump that's virtually impossible for me to test well and for 
others to develop against.

> This might include initializing the hardware iommu needed for the
> device and setting its per-device dma_ops.

I'm not sure that feels appropriate for our situation - If your IOMMU is 
tightly integrated into the I/O hub which forms your CPU's only 
connection to the outside world, then having it do architectural things 
seems reasonable. The kind of IOMMUs I'm dealing with here, however, are 
either just some IP block that you stitch into your SoC's mess of 
interconnects somewhere like any other peripheral (to keep your legacy 
32-bit devices behind), or even just a simple TLB with a page table 
walker and some control registers which you integrate directly into your 
own bigger devices. The latter are not really even general-purpose 
enough for arbitrary IOMMU API use, but treating them as separate IOMMUs 
makes sense from a common-driver perspective.

Having drivers for those kinds of IOMMU do things like setting dma_ops 
directly when they aren't even necessarily tied to a particular CPU 
architecture, and may have to coexist with drivers for incompatible 
IOMMUs in the same system, sounds like a recipe for tons of duplicated 
code and horrible hacks. I think on DT-based systems we *have* to keep 
system topology and device infrastructure handling in the core OF code 
and not have IOMMU drivers reinventing funnily-shaped and conflicting 
wheels all over the place.

>> Laurent's probe-deferral series largely solves these problems in the
>> right place - adding identical boilerplate code to every IOMMU
>> driver to do something they shouldn't have to know about (and don't
>> necessarily have all the right information for) is exactly what we
>> don't want to do. As discussed over on another thread, I'm planning
>> to pick that series up and polish it off after this, but my top
>> priority is getting the basic dma_ops functionality into arm64 that
>> people need right now. I will be only too happy when I can submit
>> the patch removing this notifier workaround ;)
>
> I've experienced it often that someone promises me to fix things later,
> but that it doesn't happen then, so please understand that I am pretty
> cautious about adding such hacks ;)

Oh, I understand completely - that's why I've tried as much as possible 
to keep all the workarounds out of the core code and local to arm64 
where it's easier for us to remove them cleanly. If it's any 
consolation, said patch is already written:

http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=af0110ed070c13417023ca7a560dc43e9d1c46f7

(the aforementioned rough WIP branch, from which I'm hoping to pull 
together an RFC for -rc1)

>> No driver other than the AMD IOMMU has any support yet. Support for
>> IOMMU_DOMAIN_DMA can easily be added to existing drivers based on
>> patch 1 of this series, but more generally it's not entirely clear
>> how default domains are going to work beyond x86. On systems like
>> Juno or Seattle with different sets of masters behind different
>> IOMMU instances (with limited domain contexts each), the most
>> sensible approach would be to have a single default domain per IOMMU
>> (spanning domains across instances would need some hideous
>> synchronisation logic for some implementations), but the current
>> domain_alloc interface gives us no way to do that. On something like
>> RK3288 with two different types of IOMMU on the platform "bus", it
>> breaks down even further as there's no way to guarantee that
>> iommu_domain_alloc() even gives you a domain from the right *driver*
>> (hence bypassing it by going through ops directly here).
>
> Default domain allocation comes down to how the bus organizes its
> iommu-groups. For every group (at least in its current design) a default
> domain is allocated. An a group is typically only behind a single iommu
> instance.

Default-domain-per-group is great for device isolation, but the main 
thing people seem to want on ARM is the opposite of that - assembling 
big bits of virtually-contiguous memory that are visible to all the 
devices that need to share them. Per-IOMMU (or even per-system if 
appropriate) default domains provide a really neat solution for that 
use-case and mean we could mostly completely remove the IOMMU-specific 
code from GPU drivers, which is one of the things standing in the way of 
having arch/arm share the common implementation.

I'll note that limited contexts aren't a made-up consideration either. 
The PCIe SMMU in my Juno can support a maximum of 4 domains, yet the bus 
has onboard LAN and SATA plus 4 slots...

>> Only for PCI devices, via iommu_group_get_for_pci_dev(). The code
>> here, however, only runs for platform devices - ops will be always
>> null for a PCI device since of_iommu_configure() will have bailed
>> out (see the silly warning removed by my patch you picked up the
>> other day). Once iommu_group_get_for_dev() supports platform
>> devices, this too can go away. In the meantime if someone adds PCI
>> support to of_iommu_configure() and IOMMU_DOMAIN_DMA support to
>> their IOMMU driver, then we'll get a domain back from
>> iommu_get_domain_for_dev() and just use that.
>
> What is the plan for getting an iommu-groups implementation for the
> platform bus?

The realistic options are trying to handle it with logic in the drivers 
(engenders code duplication, has potentially unsolvable ordering issues) 
or defining groups statically in DT via a new binding (needs to avoid 
being too much of a Linux-specific implementation detail). I tried 
prototyping the former ages ago and it got overcomplicated very quickly. 
Having considered the static DT method more recently, I've sketched out 
some ideas for a binding and a rough design for the code, but nothing's 
actually written yet. It is another of the "make stuff work on arm64" 
balls I have in the air, though, if only because the USB SMMU on Juno is 
otherwise unusable (well, unless you hack out one or other of the 
EHCI/OHCI pair).

Robin.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping
  2015-07-31 17:18 ` Robin Murphy
@ 2015-08-26  6:19     ` Yong Wu
  -1 siblings, 0 replies; 40+ messages in thread
From: Yong Wu @ 2015-08-26  6:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw,
	arnd-r2nGTMty4D4, catalin.marinas-5wv7dgnIgG8,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, 2015-07-31 at 18:18 +0100, Robin Murphy wrote:
> Hi all,
> 
> Here's an update following Catalin's feedback on v4[1].
> 
> Changes this round:
> - Rebased onto linux-next
>   - IOVA alignment fix applied already
>   - iommu_iova_cache_init() is now iova_cache_get()
> - Tidied up iommu_dma_alloc()
>   - Simplified pgprot handling
>   - Removed redundant memset
>   - Skip coherent page-flushing in a simpler way
> - Spotted a bug in iommu_dma_init_domain() where the checks for
>   reinitialising an existing domain were backwards.
> 
> If it is going to be down to me to tackle all the driver fixes and
> conversion of arch/arm dma_ops, I'd still much rather have this
> code merged first as a stable base to work with (and un-block arm64
> in the meantime). Have we decided yet whether this should go via the
> IOMMU tree or the arm64 tree?

Hi Robin,
   Sorry to disturb you. Is there any plan for the next version of arm64
DMA. If it's yes, when could we get it?
   Thanks very much.

> 
> Thanks,
> Robin.
> 
> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/10181
> 
> Robin Murphy (3):
>   iommu: Implement common IOMMU ops for DMA mapping
>   arm64: Add IOMMU dma_ops
>   arm64: Hook up IOMMU dma_ops
> 
>  arch/arm64/Kconfig                   |   1 +
>  arch/arm64/include/asm/dma-mapping.h |  15 +-
>  arch/arm64/mm/dma-mapping.c          | 449 +++++++++++++++++++++++++++++
>  drivers/iommu/Kconfig                |   7 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/dma-iommu.c            | 534 +++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h            |  84 ++++++
>  include/linux/iommu.h                |   1 +
>  8 files changed, 1084 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping
@ 2015-08-26  6:19     ` Yong Wu
  0 siblings, 0 replies; 40+ messages in thread
From: Yong Wu @ 2015-08-26  6:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2015-07-31 at 18:18 +0100, Robin Murphy wrote:
> Hi all,
> 
> Here's an update following Catalin's feedback on v4[1].
> 
> Changes this round:
> - Rebased onto linux-next
>   - IOVA alignment fix applied already
>   - iommu_iova_cache_init() is now iova_cache_get()
> - Tidied up iommu_dma_alloc()
>   - Simplified pgprot handling
>   - Removed redundant memset
>   - Skip coherent page-flushing in a simpler way
> - Spotted a bug in iommu_dma_init_domain() where the checks for
>   reinitialising an existing domain were backwards.
> 
> If it is going to be down to me to tackle all the driver fixes and
> conversion of arch/arm dma_ops, I'd still much rather have this
> code merged first as a stable base to work with (and un-block arm64
> in the meantime). Have we decided yet whether this should go via the
> IOMMU tree or the arm64 tree?

Hi Robin,
   Sorry to disturb you. Is there any plan for the next version of arm64
DMA. If it's yes, when could we get it?
   Thanks very much.

> 
> Thanks,
> Robin.
> 
> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/10181
> 
> Robin Murphy (3):
>   iommu: Implement common IOMMU ops for DMA mapping
>   arm64: Add IOMMU dma_ops
>   arm64: Hook up IOMMU dma_ops
> 
>  arch/arm64/Kconfig                   |   1 +
>  arch/arm64/include/asm/dma-mapping.h |  15 +-
>  arch/arm64/mm/dma-mapping.c          | 449 +++++++++++++++++++++++++++++
>  drivers/iommu/Kconfig                |   7 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/dma-iommu.c            | 534 +++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h            |  84 ++++++
>  include/linux/iommu.h                |   1 +
>  8 files changed, 1084 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-07-31 17:18     ` Robin Murphy
@ 2015-09-22 17:12         ` Daniel Kurtz
  -1 siblings, 0 replies; 40+ messages in thread
From: Daniel Kurtz via iommu @ 2015-09-22 17:12 UTC (permalink / raw)
  To: Robin Murphy, Marek Szyprowski
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, Arnd Bergmann,
	Catalin Marinas, Will Deacon, Tiffany Lin,
	open list:IOMMU DRIVERS, wuchengli-F7+t8E8rja9g9hUCZPvPmw,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Pawel Osciak,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA, Yingjoe Chen,
	Thierry Reding

Hi Robin,

On Sat, Aug 1, 2015 at 1:18 AM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
> Taking some inspiration from the arch/arm code, implement the
> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
>
> Unfortunately the device setup code has to start out as a big ugly mess
> in order to work usefully right now, as 'proper' operation depends on
> changes to device probe and DMA configuration ordering, IOMMU groups for
> platform devices, and default domain support in arm/arm64 IOMMU drivers.
> The workarounds here need only exist until that work is finished.
>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---

[snip]

> +static void __iommu_sync_sg_for_cpu(struct device *dev,
> +                                   struct scatterlist *sgl, int nelems,
> +                                   enum dma_data_direction dir)
> +{
> +       struct scatterlist *sg;
> +       int i;
> +
> +       if (is_device_dma_coherent(dev))
> +               return;
> +
> +       for_each_sg(sgl, sg, nelems, i)
> +               __dma_unmap_area(sg_virt(sg), sg->length, dir);
> +}

In an earlier review [0], Marek asked you to change the loop in
__iommu_sync_sg_for_cpu loop() to loop over the virtual areas when
invalidating/cleaning memory ranges.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/328232.html

However, this changed the meaning of the 'nelems' argument from what
was for arm_iommu_sync_sg_for_cpu() in arch/arm/mm/dma-mapping.c:
 "number of buffers to sync (returned from dma_map_sg)"
to:
 "number of virtual areas to sync (same as was passed to dma_map_sg)"

This has caused some confusion by callers of dma_sync_sg_for_device()
that must work for both arm & arm64 as illustrated by [1].
[1] https://lkml.org/lkml/2015/9/21/250

Based on the implementation of debug_dma_sync_sg_for_cpu() in
lib/dma-debug.c, I think the arm interpretation of nelems (returned
from dma_map_sg) is correct.

Therefore, I think we need an outer iteration over dma chunks, and an
inner iteration that calls __dma_map_area() over the set virtual areas
that correspond to that dma chunk, both here and for
__iommu_sync_sg_for_device().  This will be complicated by the fact
that iommu pages could actually be smaller than PAGE_SIZE, and offset
within a single physical page.  Also, as an optimization, we would
want to combine contiguous virtual areas into a single call to
__dma_unmap_area().

-Dan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-09-22 17:12         ` Daniel Kurtz
  0 siblings, 0 replies; 40+ messages in thread
From: Daniel Kurtz @ 2015-09-22 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On Sat, Aug 1, 2015 at 1:18 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> Taking some inspiration from the arch/arm code, implement the
> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
>
> Unfortunately the device setup code has to start out as a big ugly mess
> in order to work usefully right now, as 'proper' operation depends on
> changes to device probe and DMA configuration ordering, IOMMU groups for
> platform devices, and default domain support in arm/arm64 IOMMU drivers.
> The workarounds here need only exist until that work is finished.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---

[snip]

> +static void __iommu_sync_sg_for_cpu(struct device *dev,
> +                                   struct scatterlist *sgl, int nelems,
> +                                   enum dma_data_direction dir)
> +{
> +       struct scatterlist *sg;
> +       int i;
> +
> +       if (is_device_dma_coherent(dev))
> +               return;
> +
> +       for_each_sg(sgl, sg, nelems, i)
> +               __dma_unmap_area(sg_virt(sg), sg->length, dir);
> +}

In an earlier review [0], Marek asked you to change the loop in
__iommu_sync_sg_for_cpu loop() to loop over the virtual areas when
invalidating/cleaning memory ranges.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/328232.html

However, this changed the meaning of the 'nelems' argument from what
was for arm_iommu_sync_sg_for_cpu() in arch/arm/mm/dma-mapping.c:
 "number of buffers to sync (returned from dma_map_sg)"
to:
 "number of virtual areas to sync (same as was passed to dma_map_sg)"

This has caused some confusion by callers of dma_sync_sg_for_device()
that must work for both arm & arm64 as illustrated by [1].
[1] https://lkml.org/lkml/2015/9/21/250

Based on the implementation of debug_dma_sync_sg_for_cpu() in
lib/dma-debug.c, I think the arm interpretation of nelems (returned
from dma_map_sg) is correct.

Therefore, I think we need an outer iteration over dma chunks, and an
inner iteration that calls __dma_map_area() over the set virtual areas
that correspond to that dma chunk, both here and for
__iommu_sync_sg_for_device().  This will be complicated by the fact
that iommu pages could actually be smaller than PAGE_SIZE, and offset
within a single physical page.  Also, as an optimization, we would
want to combine contiguous virtual areas into a single call to
__dma_unmap_area().

-Dan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops
  2015-09-22 17:12         ` Daniel Kurtz
@ 2015-09-22 18:11             ` Robin Murphy
  -1 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-09-22 18:11 UTC (permalink / raw)
  To: Daniel Kurtz, Marek Szyprowski
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, Arnd Bergmann,
	Catalin Marinas, Will Deacon, Tiffany Lin,
	open list:IOMMU DRIVERS, wuchengli-F7+t8E8rja9g9hUCZPvPmw,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Pawel Osciak,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA, Yingjoe Chen,
	Thierry Reding

Hi Dan,

On 22/09/15 18:12, Daniel Kurtz wrote:
> Hi Robin,
>
> On Sat, Aug 1, 2015 at 1:18 AM, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:
>> Taking some inspiration from the arch/arm code, implement the
>> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
>>
>> Unfortunately the device setup code has to start out as a big ugly mess
>> in order to work usefully right now, as 'proper' operation depends on
>> changes to device probe and DMA configuration ordering, IOMMU groups for
>> platform devices, and default domain support in arm/arm64 IOMMU drivers.
>> The workarounds here need only exist until that work is finished.
>>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>
> [snip]
>
>> +static void __iommu_sync_sg_for_cpu(struct device *dev,
>> +                                   struct scatterlist *sgl, int nelems,
>> +                                   enum dma_data_direction dir)
>> +{
>> +       struct scatterlist *sg;
>> +       int i;
>> +
>> +       if (is_device_dma_coherent(dev))
>> +               return;
>> +
>> +       for_each_sg(sgl, sg, nelems, i)
>> +               __dma_unmap_area(sg_virt(sg), sg->length, dir);
>> +}
>
> In an earlier review [0], Marek asked you to change the loop in
> __iommu_sync_sg_for_cpu loop() to loop over the virtual areas when
> invalidating/cleaning memory ranges.
>
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/328232.html
>
> However, this changed the meaning of the 'nelems' argument from what
> was for arm_iommu_sync_sg_for_cpu() in arch/arm/mm/dma-mapping.c:
>   "number of buffers to sync (returned from dma_map_sg)"
> to:
>   "number of virtual areas to sync (same as was passed to dma_map_sg)"
>
> This has caused some confusion by callers of dma_sync_sg_for_device()
> that must work for both arm & arm64 as illustrated by [1].
> [1] https://lkml.org/lkml/2015/9/21/250

Funnily enough, I happened to stumble across that earlier of my own 
volition, and felt obliged to respond ;)

> Based on the implementation of debug_dma_sync_sg_for_cpu() in
> lib/dma-debug.c, I think the arm interpretation of nelems (returned
> from dma_map_sg) is correct.

As I explained over on the other thread, you can only do cache 
maintenance on CPU addresses, and those haven't changed regardless of 
what mapping you set up in the IOMMU for the device to see, therefore 
iterating over the mapped DMA chunks makes no sense if you have no way 
to infer a CPU address from a DMA address alone (indeed, I struggled a 
bit to get this initially, hence Marek's feedback). Note that the 
arm_iommu_sync_sg_* code is iterating over entries using the original 
CPU address, offset and length fields in exactly that way, not using the 
DMA address/length fields at all, therefore if you pass in less than the 
original number of entries you'll simply miss out part of the buffer; 
what that code _does_ is indeed correct, but it's not the same thing as 
the comments imply, and the comments are wrong.

AFAICS, debug_dma_sync_sg_* still expects to be called with the original 
nents as well, it just bails out early after mapped_ents entries since 
any further entries won't have DMA addresses to check anyway.

I suspect the offending comments were simply copied from the 
arm_dma_sync_sg_* implementations, which rather counterintuitively _do_ 
operate on the mapped DMA addresses, because they might be flushing a 
bounced copy of the buffer instead of the original pages (and can depend 
on the necessary 1:1 DMA:CPU relationship either way).

Robin.

[0]:http://article.gmane.org/gmane.linux.kernel/2044263

>
> Therefore, I think we need an outer iteration over dma chunks, and an
> inner iteration that calls __dma_map_area() over the set virtual areas
> that correspond to that dma chunk, both here and for
> __iommu_sync_sg_for_device().  This will be complicated by the fact
> that iommu pages could actually be smaller than PAGE_SIZE, and offset
> within a single physical page.  Also, as an optimization, we would
> want to combine contiguous virtual areas into a single call to
> __dma_unmap_area().
>
> -Dan
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 2/3] arm64: Add IOMMU dma_ops
@ 2015-09-22 18:11             ` Robin Murphy
  0 siblings, 0 replies; 40+ messages in thread
From: Robin Murphy @ 2015-09-22 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dan,

On 22/09/15 18:12, Daniel Kurtz wrote:
> Hi Robin,
>
> On Sat, Aug 1, 2015 at 1:18 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>> Taking some inspiration from the arch/arm code, implement the
>> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
>>
>> Unfortunately the device setup code has to start out as a big ugly mess
>> in order to work usefully right now, as 'proper' operation depends on
>> changes to device probe and DMA configuration ordering, IOMMU groups for
>> platform devices, and default domain support in arm/arm64 IOMMU drivers.
>> The workarounds here need only exist until that work is finished.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>
> [snip]
>
>> +static void __iommu_sync_sg_for_cpu(struct device *dev,
>> +                                   struct scatterlist *sgl, int nelems,
>> +                                   enum dma_data_direction dir)
>> +{
>> +       struct scatterlist *sg;
>> +       int i;
>> +
>> +       if (is_device_dma_coherent(dev))
>> +               return;
>> +
>> +       for_each_sg(sgl, sg, nelems, i)
>> +               __dma_unmap_area(sg_virt(sg), sg->length, dir);
>> +}
>
> In an earlier review [0], Marek asked you to change the loop in
> __iommu_sync_sg_for_cpu loop() to loop over the virtual areas when
> invalidating/cleaning memory ranges.
>
> [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/328232.html
>
> However, this changed the meaning of the 'nelems' argument from what
> was for arm_iommu_sync_sg_for_cpu() in arch/arm/mm/dma-mapping.c:
>   "number of buffers to sync (returned from dma_map_sg)"
> to:
>   "number of virtual areas to sync (same as was passed to dma_map_sg)"
>
> This has caused some confusion by callers of dma_sync_sg_for_device()
> that must work for both arm & arm64 as illustrated by [1].
> [1] https://lkml.org/lkml/2015/9/21/250

Funnily enough, I happened to stumble across that earlier of my own 
volition, and felt obliged to respond ;)

> Based on the implementation of debug_dma_sync_sg_for_cpu() in
> lib/dma-debug.c, I think the arm interpretation of nelems (returned
> from dma_map_sg) is correct.

As I explained over on the other thread, you can only do cache 
maintenance on CPU addresses, and those haven't changed regardless of 
what mapping you set up in the IOMMU for the device to see, therefore 
iterating over the mapped DMA chunks makes no sense if you have no way 
to infer a CPU address from a DMA address alone (indeed, I struggled a 
bit to get this initially, hence Marek's feedback). Note that the 
arm_iommu_sync_sg_* code is iterating over entries using the original 
CPU address, offset and length fields in exactly that way, not using the 
DMA address/length fields at all, therefore if you pass in less than the 
original number of entries you'll simply miss out part of the buffer; 
what that code _does_ is indeed correct, but it's not the same thing as 
the comments imply, and the comments are wrong.

AFAICS, debug_dma_sync_sg_* still expects to be called with the original 
nents as well, it just bails out early after mapped_ents entries since 
any further entries won't have DMA addresses to check anyway.

I suspect the offending comments were simply copied from the 
arm_dma_sync_sg_* implementations, which rather counterintuitively _do_ 
operate on the mapped DMA addresses, because they might be flushing a 
bounced copy of the buffer instead of the original pages (and can depend 
on the necessary 1:1 DMA:CPU relationship either way).

Robin.

[0]:http://article.gmane.org/gmane.linux.kernel/2044263

>
> Therefore, I think we need an outer iteration over dma chunks, and an
> inner iteration that calls __dma_map_area() over the set virtual areas
> that correspond to that dma chunk, both here and for
> __iommu_sync_sg_for_device().  This will be complicated by the fact
> that iommu pages could actually be smaller than PAGE_SIZE, and offset
> within a single physical page.  Also, as an optimization, we would
> want to combine contiguous virtual areas into a single call to
> __dma_unmap_area().
>
> -Dan
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-09-22 18:11 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-31 17:18 [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping Robin Murphy
2015-07-31 17:18 ` Robin Murphy
     [not found] ` <cover.1438362603.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-31 17:18   ` [PATCH v5 1/3] iommu: Implement common IOMMU ops for " Robin Murphy
2015-07-31 17:18     ` Robin Murphy
     [not found]     ` <6ce6b501501f611297ae0eae31e07b0d2060eaae.1438362603.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-08-03 17:40       ` Catalin Marinas
2015-08-03 17:40         ` Catalin Marinas
2015-08-06 15:23       ` Will Deacon
2015-08-06 15:23         ` Will Deacon
     [not found]         ` <20150806152327.GH25483-5wv7dgnIgG8@public.gmane.org>
2015-08-06 17:54           ` joro-zLv9SwRftAIdnm+yROfE0A
2015-08-06 17:54             ` joro at 8bytes.org
2015-08-07  8:42       ` Joerg Roedel
2015-08-07  8:42         ` Joerg Roedel
     [not found]         ` <20150807084228.GU14980-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-08-07 13:38           ` Robin Murphy
2015-08-07 13:38             ` Robin Murphy
     [not found]             ` <55C4B4DF.4040608-5wv7dgnIgG8@public.gmane.org>
2015-08-11  9:37               ` Joerg Roedel
2015-08-11  9:37                 ` Joerg Roedel
     [not found]                 ` <20150811093742.GC14980-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-08-11 13:31                   ` Robin Murphy
2015-08-11 13:31                     ` Robin Murphy
2015-07-31 17:18   ` [PATCH v5 2/3] arm64: Add IOMMU dma_ops Robin Murphy
2015-07-31 17:18     ` Robin Murphy
     [not found]     ` <8a5abd0a9929aae160ccb74d7a8d9c3698f61910.1438362603.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-08-03 17:33       ` Catalin Marinas
2015-08-03 17:33         ` Catalin Marinas
2015-08-07  8:52       ` Joerg Roedel
2015-08-07  8:52         ` Joerg Roedel
     [not found]         ` <20150807085233.GV14980-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-08-07 15:27           ` Robin Murphy
2015-08-07 15:27             ` Robin Murphy
     [not found]             ` <55C4CE7C.7050205-5wv7dgnIgG8@public.gmane.org>
2015-08-11  9:49               ` Joerg Roedel
2015-08-11  9:49                 ` Joerg Roedel
     [not found]                 ` <20150811094951.GD14980-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-08-11 20:15                   ` Robin Murphy
2015-08-11 20:15                     ` Robin Murphy
2015-09-22 17:12       ` Daniel Kurtz via iommu
2015-09-22 17:12         ` Daniel Kurtz
     [not found]         ` <CAGS+omCDYrjpr--+sUzaKCxo12Eff6TC04RgroDgKvxHwK3t2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-22 18:11           ` Robin Murphy
2015-09-22 18:11             ` Robin Murphy
2015-07-31 17:18   ` [PATCH v5 3/3] arm64: Hook up " Robin Murphy
2015-07-31 17:18     ` Robin Murphy
     [not found]     ` <caecbce93dd4870995a000bebc8f58d1ca7e551e.1438362603.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-08-07  8:55       ` Joerg Roedel
2015-08-07  8:55         ` Joerg Roedel
2015-08-26  6:19   ` [PATCH v5 0/3] arm64: IOMMU-backed DMA mapping Yong Wu
2015-08-26  6:19     ` Yong Wu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.