iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* add a new dma_alloc_noncontiguous API v2
@ 2021-02-02  9:51 Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 1/7] dma-mapping: remove the {alloc,free}_noncoherent methods Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Hi all,

this series adds the new noncontiguous DMA allocation API requested by
various media driver maintainers.

Changes since v1:
 - document that flush_kernel_vmap_range and invalidate_kernel_vmap_range
   must be called once an allocation is mapped into KVA
 - add dma-debug support
 - remove the separate dma_handle argument, and instead create fully formed
   DMA mapped scatterlists
 - use a directional allocation in uvcvideo
 - call invalidate_kernel_vmap_range from uvcvideo
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/7] dma-mapping: remove the {alloc,free}_noncoherent methods
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 2/7] dma-mapping: add a dma_mmap_pages helper Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

It turns out allowing non-contigous allocations here was a rather bad
idea, as we'll now need to define ways to get the pages for mmaping
or dma_buf sharing.  Revert this change and stick to the original
concept.  A different API for the use case of non-contigous allocations
will be added back later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/core-api/dma-api.rst | 64 ++++++++++--------------------
 drivers/iommu/dma-iommu.c          | 30 --------------
 include/linux/dma-map-ops.h        |  5 ---
 include/linux/dma-mapping.h        | 17 ++++++--
 kernel/dma/mapping.c               | 40 -------------------
 5 files changed, 35 insertions(+), 121 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 75cb757bbff00a..e6d23f117308df 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -528,16 +528,14 @@ an I/O device, you should not be using this part of the API.
 
 ::
 
-	void *
-	dma_alloc_noncoherent(struct device *dev, size_t size,
-			dma_addr_t *dma_handle, enum dma_data_direction dir,
-			gfp_t gfp)
+	struct page *
+	dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+			enum dma_data_direction dir, gfp_t gfp)
 
-This routine allocates a region of <size> bytes of consistent memory.  It
-returns a pointer to the allocated region (in the processor's virtual address
-space) or NULL if the allocation failed.  The returned memory may or may not
-be in the kernel direct mapping.  Drivers must not call virt_to_page on
-the returned memory region.
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to first struct page for the region, or NULL if the
+allocation failed. The resulting struct page can be used for everything a
+struct page is suitable for.
 
 It also returns a <dma_handle> which may be cast to an unsigned integer the
 same width as the bus and given to the device as the DMA address base of
@@ -558,51 +556,33 @@ reused.
 ::
 
 	void
-	dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
+	dma_free_pages(struct device *dev, size_t size, struct page *page,
 			dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free a region of memory previously allocated using dma_alloc_noncoherent().
-dev, size and dma_handle and dir must all be the same as those passed into
-dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
-dma_alloc_noncoherent().
+Free a region of memory previously allocated using dma_alloc_pages().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_pages().  page must be the pointer returned by dma_alloc_pages().
 
 ::
 
-	struct page *
-	dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
-			enum dma_data_direction dir, gfp_t gfp)
-
-This routine allocates a region of <size> bytes of non-coherent memory.  It
-returns a pointer to first struct page for the region, or NULL if the
-allocation failed. The resulting struct page can be used for everything a
-struct page is suitable for.
-
-It also returns a <dma_handle> which may be cast to an unsigned integer the
-same width as the bus and given to the device as the DMA address base of
-the region.
-
-The dir parameter specified if data is read and/or written by the device,
-see dma_map_single() for details.
-
-The gfp parameter allows the caller to specify the ``GFP_`` flags (see
-kmalloc()) for the allocation, but rejects flags used to specify a memory
-zone such as GFP_DMA or GFP_HIGHMEM.
+	void *
+	dma_alloc_noncoherent(struct device *dev, size_t size,
+			dma_addr_t *dma_handle, enum dma_data_direction dir,
+			gfp_t gfp)
 
-Before giving the memory to the device, dma_sync_single_for_device() needs
-to be called, and before reading memory written by the device,
-dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
-reused.
+This routine is a convenient wrapper around dma_alloc_pages that returns the
+kernel virtual address for the allocated memory instead of the page structure.
 
 ::
 
 	void
-	dma_free_pages(struct device *dev, size_t size, struct page *page,
+	dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
 			dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free a region of memory previously allocated using dma_alloc_pages().
-dev, size and dma_handle and dir must all be the same as those passed into
-dma_alloc_noncoherent().  page must be the pointer returned by
-dma_alloc_pages().
+Free a region of memory previously allocated using dma_alloc_noncoherent().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
+dma_alloc_noncoherent().
 
 ::
 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4078358ed66ea8..255533faf90599 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1197,34 +1197,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 	return cpu_addr;
 }
 
-#ifdef CONFIG_DMA_REMAP
-static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
-		dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
-{
-	if (!gfpflags_allow_blocking(gfp)) {
-		struct page *page;
-
-		page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
-		if (!page)
-			return NULL;
-		return page_address(page);
-	}
-
-	return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
-				     PAGE_KERNEL, 0);
-}
-
-static void iommu_dma_free_noncoherent(struct device *dev, size_t size,
-		void *cpu_addr, dma_addr_t handle, enum dma_data_direction dir)
-{
-	__iommu_dma_unmap(dev, handle, size);
-	__iommu_dma_free(dev, size, cpu_addr);
-}
-#else
-#define iommu_dma_alloc_noncoherent		NULL
-#define iommu_dma_free_noncoherent		NULL
-#endif /* CONFIG_DMA_REMAP */
-
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs)
@@ -1295,8 +1267,6 @@ static const struct dma_map_ops iommu_dma_ops = {
 	.free			= iommu_dma_free,
 	.alloc_pages		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
-	.alloc_noncoherent	= iommu_dma_alloc_noncoherent,
-	.free_noncoherent	= iommu_dma_free_noncoherent,
 	.mmap			= iommu_dma_mmap,
 	.get_sgtable		= iommu_dma_get_sgtable,
 	.map_page		= iommu_dma_map_page,
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 70fcd0f610ea48..11e02537b9e01b 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -22,11 +22,6 @@ struct dma_map_ops {
 			gfp_t gfp);
 	void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
 			dma_addr_t dma_handle, enum dma_data_direction dir);
-	void *(*alloc_noncoherent)(struct device *dev, size_t size,
-			dma_addr_t *dma_handle, enum dma_data_direction dir,
-			gfp_t gfp);
-	void (*free_noncoherent)(struct device *dev, size_t size, void *vaddr,
-			dma_addr_t dma_handle, enum dma_data_direction dir);
 	int (*mmap)(struct device *, struct vm_area_struct *,
 			void *, dma_addr_t, size_t, unsigned long attrs);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f391a..fbfa3f5abd9498 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -263,10 +263,19 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir);
-void *dma_alloc_noncoherent(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
-void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
-		dma_addr_t dma_handle, enum dma_data_direction dir);
+
+static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	struct page *page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
+	return page ? page_address(page) : NULL;
+}
+
+static inline void dma_free_noncoherent(struct device *dev, size_t size,
+		void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+}
 
 static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f87a89d086544b..68992e35c8c3a7 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -515,46 +515,6 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
-void *dma_alloc_noncoherent(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	void *vaddr;
-
-	if (!ops || !ops->alloc_noncoherent) {
-		struct page *page;
-
-		page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
-		if (!page)
-			return NULL;
-		return page_address(page);
-	}
-
-	size = PAGE_ALIGN(size);
-	vaddr = ops->alloc_noncoherent(dev, size, dma_handle, dir, gfp);
-	if (vaddr)
-		debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir,
-				   *dma_handle);
-	return vaddr;
-}
-EXPORT_SYMBOL_GPL(dma_alloc_noncoherent);
-
-void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
-		dma_addr_t dma_handle, enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	if (!ops || !ops->free_noncoherent) {
-		dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
-		return;
-	}
-
-	size = PAGE_ALIGN(size);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
-	ops->free_noncoherent(dev, size, vaddr, dma_handle, dir);
-}
-EXPORT_SYMBOL_GPL(dma_free_noncoherent);
-
 int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/7] dma-mapping: add a dma_mmap_pages helper
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 1/7] dma-mapping: remove the {alloc,free}_noncoherent methods Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 3/7] dma-mapping: refactor dma_{alloc,free}_pages Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Add a helper to map memory allocated using dma_alloc_pages into
a user address space, similar to the dma_alloc_attrs function for
coherent allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/core-api/dma-api.rst | 10 ++++++++++
 include/linux/dma-mapping.h        |  2 ++
 kernel/dma/mapping.c               | 13 +++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index e6d23f117308df..157a474ae54416 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -563,6 +563,16 @@ Free a region of memory previously allocated using dma_alloc_pages().
 dev, size, dma_handle and dir must all be the same as those passed into
 dma_alloc_pages().  page must be the pointer returned by dma_alloc_pages().
 
+::
+
+	int
+	dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+		       size_t size, struct page *page)
+
+Map an allocation returned from dma_alloc_pages() into a user address space.
+dev and size must be the same as those passed into dma_alloc_pages().
+page must be the pointer returned by dma_alloc_pages().
+
 ::
 
 	void *
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index fbfa3f5abd9498..4977a748cb9483 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -263,6 +263,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir);
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+		size_t size, struct page *page);
 
 static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 68992e35c8c3a7..c1e515496c067b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -515,6 +515,19 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+		size_t size, struct page *page)
+{
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	if (vma->vm_pgoff >= count || vma_pages(vma) > count - vma->vm_pgoff)
+		return -ENXIO;
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(page) + vma->vm_pgoff,
+			       vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
+}
+EXPORT_SYMBOL_GPL(dma_mmap_pages);
+
 int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/7] dma-mapping: refactor dma_{alloc,free}_pages
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 1/7] dma-mapping: remove the {alloc,free}_noncoherent methods Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 2/7] dma-mapping: add a dma_mmap_pages helper Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Factour out internal versions without the dma_debug calls in preparation
for callers that will need different dma_debug calls.

Note that this changes the dma_debug calls to get the not page aligned
size values, but as long as alloc and free agree on one variant we are
fine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/mapping.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index c1e515496c067b..5e87dac6cc6d9a 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -475,11 +475,10 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 }
 EXPORT_SYMBOL(dma_free_attrs);
 
-struct page *dma_alloc_pages(struct device *dev, size_t size,
+static struct page *__dma_alloc_pages(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	struct page *page;
 
 	if (WARN_ON_ONCE(!dev->coherent_dma_mask))
 		return NULL;
@@ -488,31 +487,41 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
 
 	size = PAGE_ALIGN(size);
 	if (dma_alloc_direct(dev, ops))
-		page = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
-	else if (ops->alloc_pages)
-		page = ops->alloc_pages(dev, size, dma_handle, dir, gfp);
-	else
+		return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
+	if (!ops->alloc_pages)
 		return NULL;
+	return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+}
 
-	debug_dma_map_page(dev, page, 0, size, dir, *dma_handle);
+struct page *dma_alloc_pages(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+	struct page *page = __dma_alloc_pages(dev, size, dma_handle, dir, gfp);
 
+	if (page)
+		debug_dma_map_page(dev, page, 0, size, dir, *dma_handle);
 	return page;
 }
 EXPORT_SYMBOL_GPL(dma_alloc_pages);
 
-void dma_free_pages(struct device *dev, size_t size, struct page *page,
+static void __dma_free_pages(struct device *dev, size_t size, struct page *page,
 		dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	size = PAGE_ALIGN(size);
-	debug_dma_unmap_page(dev, dma_handle, size, dir);
-
 	if (dma_alloc_direct(dev, ops))
 		dma_direct_free_pages(dev, size, page, dma_handle, dir);
 	else if (ops->free_pages)
 		ops->free_pages(dev, size, page, dma_handle, dir);
 }
+
+void dma_free_pages(struct device *dev, size_t size, struct page *page,
+		dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+	debug_dma_unmap_page(dev, dma_handle, size, dir);
+	__dma_free_pages(dev, size, page, dma_handle, dir);
+}
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
 int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-02-02  9:51 ` [PATCH 3/7] dma-mapping: refactor dma_{alloc,free}_pages Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-16 18:55   ` Robin Murphy
  2021-02-02  9:51 ` [PATCH 5/7] dma-iommu: refactor iommu_dma_alloc_remap Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Add a new API that returns a potentiall virtually non-contigous sg_table
and a DMA address.  This API is only properly implemented for dma-iommu
and will simply return a contigious chunk as a fallback.

The intent is that media drivers can use this API if either:

 - no kernel mapping or only temporary kernel mappings are required.
   That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
 - a kernel mapping is required for cached and DMA mapped pages, but
   the driver also needs the pages to e.g. map them to userspace.
   In that sense it is a replacement for some aspects of the recently
   removed and never fully implemented DMA_ATTR_NON_CONSISTENT

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/core-api/dma-api.rst |  74 +++++++++++++++++++++
 include/linux/dma-map-ops.h        |  18 +++++
 include/linux/dma-mapping.h        |  31 +++++++++
 kernel/dma/mapping.c               | 103 +++++++++++++++++++++++++++++
 4 files changed, 226 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 157a474ae54416..e24b2447f4bfe6 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -594,6 +594,80 @@ dev, size, dma_handle and dir must all be the same as those passed into
 dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
 dma_alloc_noncoherent().
 
+::
+
+	struct sg_table *
+	dma_alloc_noncontiguous(struct device *dev, size_t size,
+				enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates  <size> bytes of non-coherent and possibly non-contiguous
+memory.  It returns a pointer to struct sg_table that describes the allocated
+and DMA mapped memory, or NULL if the allocation failed. The resulting memory
+can be used for struct page mapped into a scatterlist are suitable for.
+
+The return sg_table is guaranteed to have 1 single DMA mapped segment as
+indicated by sgt->nents, but it might have multiple CPU side segments as
+indicated by sgt->orig_nents.
+
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_sgtable_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+	void
+	dma_free_noncontiguous(struct device *dev, size_t size,
+			       struct sg_table *sgt,
+			       enum dma_data_direction dir)
+
+Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
+and dir must all be the same as those passed into dma_alloc_noncontiguous().
+sgt must be the pointer returned by dma_alloc_noncontiguous().
+
+::
+
+	void *
+	dma_vmap_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt)
+
+Return a contiguous kernel mapping for an allocation returned from
+dma_alloc_noncontiguous().  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
+Once a non-contiguous allocation is mapped using this function, the
+flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
+to manage the coherency of the kernel mapping.
+
+::
+
+	void
+	dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+
+Unmap a kernel mapping returned by dma_vmap_noncontiguous().  dev must be the
+same the one passed into dma_alloc_noncontiguous().  vaddr must be the pointer
+returned by dma_vmap_noncontiguous().
+
+
+::
+
+	int
+	dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+			       size_t size, struct sg_table *sgt)
+
+Map an allocation returned from dma_alloc_noncontiguous() into a user address
+space.  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
 ::
 
 	int
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 11e02537b9e01b..fe46a41130e662 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -22,6 +22,10 @@ struct dma_map_ops {
 			gfp_t gfp);
 	void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
 			dma_addr_t dma_handle, enum dma_data_direction dir);
+	struct sg_table *(*alloc_noncontiguous)(struct device *dev, size_t size,
+			enum dma_data_direction dir, gfp_t gfp);
+	void (*free_noncontiguous)(struct device *dev, size_t size,
+			struct sg_table *sgt, enum dma_data_direction dir);
 	int (*mmap)(struct device *, struct vm_area_struct *,
 			void *, dma_addr_t, size_t, unsigned long attrs);
 
@@ -198,6 +202,20 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
+/*
+ * This is the actual return value from the ->alloc_noncontiguous method.
+ * The users of the DMA API should only care about the sg_table, but to make
+ * the DMA-API internal vmaping and freeing easier we stash away the page
+ * array as well (except for the fallback case).  This can go away any time,
+ * e.g. when a vmap-variant that takes a scatterlist comes along.
+ */
+struct dma_sgt_handle {
+	struct sg_table sgt;
+	struct page **pages;
+};
+#define sgt_handle(sgt) \
+	container_of((sgt), struct dma_sgt_handle, sgt)
+
 int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4977a748cb9483..6f4d34739f5cc6 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,15 @@ u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
 unsigned long dma_get_merge_boundary(struct device *dev);
+struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
+		enum dma_data_direction dir, gfp_t gfp);
+void dma_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir);
+void *dma_vmap_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt);
+void dma_vunmap_noncontiguous(struct device *dev, void *vaddr);
+int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+		size_t size, struct sg_table *sgt);
 #else /* CONFIG_HAS_DMA */
 static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 		struct page *page, size_t offset, size_t size,
@@ -257,6 +266,28 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev)
 {
 	return 0;
 }
+static inline struct sg_table *dma_alloc_noncontiguous(struct device *dev,
+		size_t size, enum dma_data_direction dir, gfp_t gfp)
+{
+	return NULL;
+}
+static inline void dma_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+}
+static inline void *dma_vmap_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt)
+{
+	return NULL;
+}
+static inline void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+{
+}
+static inline int dma_mmap_noncontiguous(struct device *dev,
+		struct vm_area_struct *vma, size_t size, struct sg_table *sgt)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_HAS_DMA */
 
 struct page *dma_alloc_pages(struct device *dev, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 5e87dac6cc6d9a..5a62439ed483af 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -537,6 +537,109 @@ int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(dma_mmap_pages);
 
+static struct sg_table *alloc_single_sgt(struct device *dev, size_t size,
+		enum dma_data_direction dir, gfp_t gfp)
+{
+	struct sg_table *sgt;
+	struct page *page;
+
+	sgt = kmalloc(sizeof(*sgt), gfp);
+	if (!sgt)
+		return NULL;
+	if (sg_alloc_table(sgt, 1, gfp))
+		goto out_free_sgt;
+	page = __dma_alloc_pages(dev, size, &sgt->sgl->dma_address, dir, gfp);
+	if (!page)
+		goto out_free_table;
+	sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+	sg_dma_len(sgt->sgl) = sgt->sgl->length;
+	return sgt;
+out_free_table:
+	sg_free_table(sgt);
+out_free_sgt:
+	kfree(sgt);
+	return NULL;
+}
+
+struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
+		enum dma_data_direction dir, gfp_t gfp)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	struct sg_table *sgt;
+
+	if (ops && ops->alloc_noncontiguous)
+		sgt = ops->alloc_noncontiguous(dev, size, dir, gfp);
+	else
+		sgt = alloc_single_sgt(dev, size, dir, gfp);
+
+	if (sgt) {
+		sgt->nents = 1;
+		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir);
+	}
+	return sgt;
+}
+EXPORT_SYMBOL_GPL(dma_alloc_noncontiguous);
+
+static void free_single_sgt(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	__dma_free_pages(dev, size, sg_page(sgt->sgl), sgt->sgl->dma_address,
+			 dir);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+void dma_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	debug_dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir);
+	if (ops && ops->free_noncontiguous)
+		ops->free_noncontiguous(dev, size, sgt, dir);
+	else
+		free_single_sgt(dev, size, sgt, dir);
+}
+EXPORT_SYMBOL_GPL(dma_free_noncontiguous);
+
+void *dma_vmap_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	if (ops && ops->alloc_noncontiguous)
+		return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL);
+	return page_address(sg_page(sgt->sgl));
+}
+EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous);
+
+void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops && ops->alloc_noncontiguous)
+		vunmap(vaddr);
+}
+EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous);
+
+int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+		size_t size, struct sg_table *sgt)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops && ops->alloc_noncontiguous) {
+		unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+		if (vma->vm_pgoff >= count ||
+		    vma_pages(vma) > count - vma->vm_pgoff)
+			return -ENXIO;
+		return vm_map_pages(vma, sgt_handle(sgt)->pages, count);
+	}
+	return dma_mmap_pages(dev, vma, size, sg_page(sgt->sgl));
+}
+EXPORT_SYMBOL_GPL(dma_mmap_noncontiguous);
+
 int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/7] dma-iommu: refactor iommu_dma_alloc_remap
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-02-02  9:51 ` [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-02  9:51 ` [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Split out a new helper that only allocates a sg_table worth of
memory without mapping it into contiguous kernel address space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 67 ++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 255533faf90599..85cb004d7a44c6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -661,23 +661,12 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 	return pages;
 }
 
-/**
- * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
- * @dev: Device to allocate memory for. Must be a real device
- *	 attached to an iommu_dma_domain
- * @size: Size of buffer in bytes
- * @dma_handle: Out argument for allocated DMA handle
- * @gfp: Allocation flags
- * @prot: pgprot_t to use for the remapped mapping
- * @attrs: DMA attributes for this allocation
- *
- * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+/*
+ * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
- *
- * Return: Mapped virtual address, or NULL on failure.
  */
-static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
+		size_t size, struct sg_table *sgt, gfp_t gfp, pgprot_t prot,
 		unsigned long attrs)
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -687,11 +676,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
 	struct page **pages;
-	struct sg_table sgt;
 	dma_addr_t iova;
-	void *vaddr;
-
-	*dma_handle = DMA_MAPPING_ERROR;
 
 	if (unlikely(iommu_dma_deferred_attach(dev, domain)))
 		return NULL;
@@ -717,38 +702,56 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 	if (!iova)
 		goto out_free_pages;
 
-	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
 		goto out_free_iova;
 
 	if (!(ioprot & IOMMU_CACHE)) {
 		struct scatterlist *sg;
 		int i;
 
-		for_each_sg(sgt.sgl, sg, sgt.orig_nents, i)
+		for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
 			arch_dma_prep_coherent(sg_page(sg), sg->length);
 	}
 
-	if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
+	if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
 			< size)
 		goto out_free_sg;
 
+	sgt->sgl->dma_address = iova;
+	return pages;
+
+out_free_sg:
+	sg_free_table(sgt);
+out_free_iova:
+	iommu_dma_free_iova(cookie, iova, size, NULL);
+out_free_pages:
+	__iommu_dma_free_pages(pages, count);
+	return NULL;
+}
+
+static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+		unsigned long attrs)
+{
+	struct page **pages;
+	struct sg_table sgt;
+	void *vaddr;
+
+	pages = __iommu_dma_alloc_noncontiguous(dev, size, &sgt, gfp, prot,
+						attrs);
+	if (!pages)
+		return NULL;
+	*dma_handle = sgt.sgl->dma_address;
+	sg_free_table(&sgt);
 	vaddr = dma_common_pages_remap(pages, size, prot,
 			__builtin_return_address(0));
 	if (!vaddr)
 		goto out_unmap;
-
-	*dma_handle = iova;
-	sg_free_table(&sgt);
 	return vaddr;
 
 out_unmap:
-	__iommu_dma_unmap(dev, iova, size);
-out_free_sg:
-	sg_free_table(&sgt);
-out_free_iova:
-	iommu_dma_free_iova(cookie, iova, size, NULL);
-out_free_pages:
-	__iommu_dma_free_pages(pages, count);
+	__iommu_dma_unmap(dev, *dma_handle, size);
+	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 	return NULL;
 }
 
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2021-02-02  9:51 ` [PATCH 5/7] dma-iommu: refactor iommu_dma_alloc_remap Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-16  8:14   ` Tomasz Figa
  2021-02-02  9:51 ` [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API Christoph Hellwig
  2021-02-07 18:48 ` add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

Implement support for allocating a non-contiguous DMA region.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/iommu/dma-iommu.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 85cb004d7a44c6..4e0b170d38d57a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -718,6 +718,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 		goto out_free_sg;
 
 	sgt->sgl->dma_address = iova;
+	sgt->sgl->dma_length = size;
 	return pages;
 
 out_free_sg:
@@ -755,6 +756,36 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 	return NULL;
 }
 
+#ifdef CONFIG_DMA_REMAP
+static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
+		size_t size, enum dma_data_direction dir, gfp_t gfp)
+{
+	struct dma_sgt_handle *sh;
+
+	sh = kmalloc(sizeof(*sh), gfp);
+	if (!sh)
+		return NULL;
+
+	sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, &sh->sgt, gfp,
+						    PAGE_KERNEL, 0);
+	if (!sh->pages) {
+		kfree(sh);
+		return NULL;
+	}
+	return &sh->sgt;
+}
+
+static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	struct dma_sgt_handle *sh = sgt_handle(sgt);
+
+	__iommu_dma_unmap(dev, sgt->sgl->dma_address, size);
+	__iommu_dma_free_pages(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+	sg_free_table(&sh->sgt);
+}
+#endif /* CONFIG_DMA_REMAP */
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1270,6 +1301,10 @@ static const struct dma_map_ops iommu_dma_ops = {
 	.free			= iommu_dma_free,
 	.alloc_pages		= dma_common_alloc_pages,
 	.free_pages		= dma_common_free_pages,
+#ifdef CONFIG_DMA_REMAP
+	.alloc_noncontiguous	= iommu_dma_alloc_noncontiguous,
+	.free_noncontiguous	= iommu_dma_free_noncontiguous,
+#endif
 	.mmap			= iommu_dma_mmap,
 	.get_sgtable		= iommu_dma_get_sgtable,
 	.map_page		= iommu_dma_map_page,
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2021-02-02  9:51 ` [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous Christoph Hellwig
@ 2021-02-02  9:51 ` Christoph Hellwig
  2021-02-04  7:39   ` Hillf Danton
  2021-02-07 18:48 ` add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-02  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, Robin Murphy, linux-kernel, linux-doc

From: Ricardo Ribalda <ribalda@chromium.org>

On architectures where the is no coherent caching such as ARM use the
dma_alloc_noncontiguos API and handle manually the cache flushing using
dma_sync_sgtable().

With this patch on the affected architectures we can measure up to 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/media/usb/uvc/uvc_video.c | 79 ++++++++++++++++++++++---------
 drivers/media/usb/uvc/uvcvideo.h  |  4 +-
 2 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index a6a441d92b9488..0a7d287dc41528 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,13 @@
  *          Laurent Pinchart (laurent.pinchart@ideasonboard.com)
  */
 
+#include <linux/highmem.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/usb.h>
+#include <linux/usb/hcd.h>
 #include <linux/videodev2.h>
 #include <linux/vmalloc.h>
 #include <linux/wait.h>
@@ -1097,6 +1099,26 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
 	return data[0];
 }
 
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+	return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
+{
+	struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
+
+	if (for_device) {
+		dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
+					    DMA_FROM_DEVICE);
+	} else {
+		dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
+					 DMA_FROM_DEVICE);
+		invalidate_kernel_vmap_range(uvc_urb->buffer,
+					     uvc_urb->stream->urb_size);
+	}
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1118,6 +1140,8 @@ static void uvc_video_copy_data_work(struct work_struct *work)
 		uvc_queue_buffer_release(op->buf);
 	}
 
+	uvc_urb_dma_sync(uvc_urb, true);
+
 	ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
 	if (ret < 0)
 		uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
@@ -1539,10 +1563,12 @@ static void uvc_video_complete(struct urb *urb)
 	 * Process the URB headers, and optionally queue expensive memcpy tasks
 	 * to be deferred to a work queue.
 	 */
+	uvc_urb_dma_sync(uvc_urb, false);
 	stream->decode(uvc_urb, buf, buf_meta);
 
 	/* If no async work is needed, resubmit the URB immediately. */
 	if (!uvc_urb->async_operations) {
+		uvc_urb_dma_sync(uvc_urb, true);
 		ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
 		if (ret < 0)
 			uvc_printk(KERN_ERR,
@@ -1559,24 +1585,46 @@ static void uvc_video_complete(struct urb *urb)
  */
 static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 {
+	struct device *dma_dev = dma_dev = stream_to_dmadev(stream);
 	struct uvc_urb *uvc_urb;
 
 	for_each_uvc_urb(uvc_urb, stream) {
 		if (!uvc_urb->buffer)
 			continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-		usb_free_coherent(stream->dev->udev, stream->urb_size,
-				  uvc_urb->buffer, uvc_urb->dma);
-#else
-		kfree(uvc_urb->buffer);
-#endif
+		dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
+		dma_free_noncontiguous(dma_dev, stream->urb_size, uvc_urb->sgt,
+				       DMA_FROM_DEVICE);
+
 		uvc_urb->buffer = NULL;
 	}
 
 	stream->urb_size = 0;
 }
 
+static bool uvc_alloc_urb_buffer(struct uvc_streaming *stream,
+				 struct uvc_urb *uvc_urb, gfp_t gfp_flags)
+{
+	struct device *dma_dev = stream_to_dmadev(stream);
+
+
+	uvc_urb->sgt = dma_alloc_noncontiguous(dma_dev, stream->urb_size,
+					       DMA_FROM_DEVICE, gfp_flags);
+	if (!uvc_urb->sgt)
+		return false;
+	uvc_urb->dma = uvc_urb->sgt->sgl->dma_address;
+
+	uvc_urb->buffer = dma_vmap_noncontiguous(dma_dev, stream->urb_size,
+						 uvc_urb->sgt);
+	if (!uvc_urb->buffer) {
+		dma_free_noncontiguous(dma_dev, stream->urb_size,
+				       uvc_urb->sgt, DMA_FROM_DEVICE);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * Allocate transfer buffers. This function can be called with buffers
  * already allocated when resuming from suspend, in which case it will
@@ -1607,19 +1655,11 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
 
 	/* Retry allocations until one succeed. */
 	for (; npackets > 1; npackets /= 2) {
+		stream->urb_size = psize * npackets;
 		for (i = 0; i < UVC_URBS; ++i) {
 			struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
 
-			stream->urb_size = psize * npackets;
-#ifndef CONFIG_DMA_NONCOHERENT
-			uvc_urb->buffer = usb_alloc_coherent(
-				stream->dev->udev, stream->urb_size,
-				gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
-#else
-			uvc_urb->buffer =
-			    kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
-#endif
-			if (!uvc_urb->buffer) {
+			if (!uvc_alloc_urb_buffer(stream, uvc_urb, gfp_flags)) {
 				uvc_free_urb_buffers(stream);
 				break;
 			}
@@ -1728,12 +1768,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
 		urb->context = uvc_urb;
 		urb->pipe = usb_rcvisocpipe(stream->dev->udev,
 				ep->desc.bEndpointAddress);
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
 		urb->transfer_dma = uvc_urb->dma;
-#else
-		urb->transfer_flags = URB_ISO_ASAP;
-#endif
 		urb->interval = ep->desc.bInterval;
 		urb->transfer_buffer = uvc_urb->buffer;
 		urb->complete = uvc_video_complete;
@@ -1793,10 +1829,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
 
 		usb_fill_bulk_urb(urb, stream->dev->udev, pipe,	uvc_urb->buffer,
 				  size, uvc_video_complete, uvc_urb);
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
 		urb->transfer_dma = uvc_urb->dma;
-#endif
 
 		uvc_urb->urb = urb;
 	}
@@ -1891,6 +1925,7 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
 
 	/* Submit the URBs. */
 	for_each_uvc_urb(uvc_urb, stream) {
+		uvc_urb_dma_sync(uvc_urb, true);
 		ret = usb_submit_urb(uvc_urb->urb, gfp_flags);
 		if (ret < 0) {
 			uvc_printk(KERN_ERR, "Failed to submit URB %u (%d).\n",
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index a3dfacf069c44d..a386114bd22999 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -521,7 +521,8 @@ struct uvc_copy_op {
  * @urb: the URB described by this context structure
  * @stream: UVC streaming context
  * @buffer: memory storage for the URB
- * @dma: DMA coherent addressing for the urb_buffer
+ * @dma: Allocated DMA handle
+ * @sgt: sgt_table with the urb locations in memory
  * @async_operations: counter to indicate the number of copy operations
  * @copy_operations: work descriptors for asynchronous copy operations
  * @work: work queue entry for asynchronous decode
@@ -532,6 +533,7 @@ struct uvc_urb {
 
 	char *buffer;
 	dma_addr_t dma;
+	struct sg_table *sgt;
 
 	unsigned int async_operations;
 	struct uvc_copy_op copy_operations[UVC_MAX_PACKETS];
-- 
2.29.2

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API
  2021-02-02  9:51 ` [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API Christoph Hellwig
@ 2021-02-04  7:39   ` Hillf Danton
  0 siblings, 0 replies; 25+ messages in thread
From: Hillf Danton @ 2021-02-04  7:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sven Van Asbroeck, Mauro Carvalho Chehab, linux-media, linux-doc,
	linux-kernel, iommu, Ricardo Ribalda, Sergey Senozhatsky,
	Robin Murphy

Tue,  2 Feb 2021 10:51:10 +0100
> From: Ricardo Ribalda <ribalda@chromium.org>
> 
> On architectures where the is no coherent caching such as ARM use the
> dma_alloc_noncontiguos API and handle manually the cache flushing using
> dma_sync_sgtable().
> 
> With this patch on the affected architectures we can measure up to 20x
> performance improvement in uvc_video_copy_data_work().
> 
> Eg: aarch64 with an external usb camera
> 
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 67034480 : duration 33303
> FPS: 29.99
> URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
> decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> raw decode speed: 9.931 Gbits/s
> raw URB handling speed: 1.025 Gbits/s
> throughput: 16.102 Mbits/s
> URB decode CPU usage 0.162600 %
> 
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 54683536 : duration 33302
> FPS: 29.99
> URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
> header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
> decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
> raw decode speed: 365.470 Mbits/s
> raw URB handling speed: 295.986 Mbits/s
> throughput: 13.136 Mbits/s
> URB decode CPU usage 3.594500 %
> 
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/media/usb/uvc/uvc_video.c | 79 ++++++++++++++++++++++---------
>  drivers/media/usb/uvc/uvcvideo.h  |  4 +-
>  2 files changed, 60 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index a6a441d92b9488..0a7d287dc41528 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -6,11 +6,13 @@
>   *          Laurent Pinchart (laurent.pinchart@ideasonboard.com)
>   */
>  
> +#include <linux/highmem.h>
>  #include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
>  #include <linux/usb.h>
> +#include <linux/usb/hcd.h>
>  #include <linux/videodev2.h>
>  #include <linux/vmalloc.h>
>  #include <linux/wait.h>
> @@ -1097,6 +1099,26 @@ static int uvc_video_decode_start(struct uvc_streaming *stream,
>  	return data[0];
>  }
>  
> +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> +{
> +	return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> +}
> +
> +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> +{
> +	struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
> +
> +	if (for_device) {
> +		dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> +					    DMA_FROM_DEVICE);
> +	} else {
> +		dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> +					 DMA_FROM_DEVICE);
> +		invalidate_kernel_vmap_range(uvc_urb->buffer,
> +					     uvc_urb->stream->urb_size);
> +	}
> +}
> +
>  /*
>   * uvc_video_decode_data_work: Asynchronous memcpy processing
>   *
> @@ -1118,6 +1140,8 @@ static void uvc_video_copy_data_work(struct work_struct *work)
>  		uvc_queue_buffer_release(op->buf);
>  	}
>  
> +	uvc_urb_dma_sync(uvc_urb, true);
> +
>  	ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
>  	if (ret < 0)
>  		uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
> @@ -1539,10 +1563,12 @@ static void uvc_video_complete(struct urb *urb)
>  	 * Process the URB headers, and optionally queue expensive memcpy tasks
>  	 * to be deferred to a work queue.
>  	 */
> +	uvc_urb_dma_sync(uvc_urb, false);
>  	stream->decode(uvc_urb, buf, buf_meta);
>  
>  	/* If no async work is needed, resubmit the URB immediately. */
>  	if (!uvc_urb->async_operations) {
> +		uvc_urb_dma_sync(uvc_urb, true);
>  		ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
>  		if (ret < 0)
>  			uvc_printk(KERN_ERR,
> @@ -1559,24 +1585,46 @@ static void uvc_video_complete(struct urb *urb)
>   */
>  static void uvc_free_urb_buffers(struct uvc_streaming *stream)
>  {
> +	struct device *dma_dev = dma_dev = stream_to_dmadev(stream);
>  	struct uvc_urb *uvc_urb;
>  
>  	for_each_uvc_urb(uvc_urb, stream) {
>  		if (!uvc_urb->buffer)
>  			continue;
>  
> -#ifndef CONFIG_DMA_NONCOHERENT
> -		usb_free_coherent(stream->dev->udev, stream->urb_size,
> -				  uvc_urb->buffer, uvc_urb->dma);
> -#else
> -		kfree(uvc_urb->buffer);
> -#endif
> +		dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
> +		dma_free_noncontiguous(dma_dev, stream->urb_size, uvc_urb->sgt,
> +				       DMA_FROM_DEVICE);
> +
>  		uvc_urb->buffer = NULL;
>  	}
>  
>  	stream->urb_size = 0;
>  }
>  
> +static bool uvc_alloc_urb_buffer(struct uvc_streaming *stream,
> +				 struct uvc_urb *uvc_urb, gfp_t gfp_flags)
> +{
> +	struct device *dma_dev = stream_to_dmadev(stream);
> +
> +
> +	uvc_urb->sgt = dma_alloc_noncontiguous(dma_dev, stream->urb_size,
> +					       DMA_FROM_DEVICE, gfp_flags);
> +	if (!uvc_urb->sgt)
> +		return false;
> +	uvc_urb->dma = uvc_urb->sgt->sgl->dma_address;
> +
> +	uvc_urb->buffer = dma_vmap_noncontiguous(dma_dev, stream->urb_size,
> +						 uvc_urb->sgt);
> +	if (!uvc_urb->buffer) {
> +		dma_free_noncontiguous(dma_dev, stream->urb_size,
> +				       uvc_urb->sgt, DMA_FROM_DEVICE);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * Allocate transfer buffers. This function can be called with buffers
>   * already allocated when resuming from suspend, in which case it will
> @@ -1607,19 +1655,11 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>  
>  	/* Retry allocations until one succeed. */
>  	for (; npackets > 1; npackets /= 2) {
> +		stream->urb_size = psize * npackets;
>  		for (i = 0; i < UVC_URBS; ++i) {
>  			struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>  
> -			stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> -			uvc_urb->buffer = usb_alloc_coherent(
> -				stream->dev->udev, stream->urb_size,
> -				gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> -			uvc_urb->buffer =
> -			    kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
> -			if (!uvc_urb->buffer) {
> +			if (!uvc_alloc_urb_buffer(stream, uvc_urb, gfp_flags)) {
>  				uvc_free_urb_buffers(stream);
>  				break;
>  			}
> @@ -1728,12 +1768,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
>  		urb->context = uvc_urb;
>  		urb->pipe = usb_rcvisocpipe(stream->dev->udev,
>  				ep->desc.bEndpointAddress);
> -#ifndef CONFIG_DMA_NONCOHERENT
>  		urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
>  		urb->transfer_dma = uvc_urb->dma;
> -#else
> -		urb->transfer_flags = URB_ISO_ASAP;
> -#endif
>  		urb->interval = ep->desc.bInterval;
>  		urb->transfer_buffer = uvc_urb->buffer;
>  		urb->complete = uvc_video_complete;
> @@ -1793,10 +1829,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
>  
>  		usb_fill_bulk_urb(urb, stream->dev->udev, pipe,	uvc_urb->buffer,
>  				  size, uvc_video_complete, uvc_urb);
> -#ifndef CONFIG_DMA_NONCOHERENT
>  		urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
>  		urb->transfer_dma = uvc_urb->dma;
> -#endif
>  
>  		uvc_urb->urb = urb;
>  	}
> @@ -1891,6 +1925,7 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
>  
>  	/* Submit the URBs. */
>  	for_each_uvc_urb(uvc_urb, stream) {
> +		uvc_urb_dma_sync(uvc_urb, true);
>  		ret = usb_submit_urb(uvc_urb->urb, gfp_flags);
>  		if (ret < 0) {
>  			uvc_printk(KERN_ERR, "Failed to submit URB %u (%d).\n",
> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index a3dfacf069c44d..a386114bd22999 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -521,7 +521,8 @@ struct uvc_copy_op {
>   * @urb: the URB described by this context structure
>   * @stream: UVC streaming context
>   * @buffer: memory storage for the URB
> - * @dma: DMA coherent addressing for the urb_buffer
> + * @dma: Allocated DMA handle
> + * @sgt: sgt_table with the urb locations in memory
>   * @async_operations: counter to indicate the number of copy operations
>   * @copy_operations: work descriptors for asynchronous copy operations
>   * @work: work queue entry for asynchronous decode
> @@ -532,6 +533,7 @@ struct uvc_urb {
>  
>  	char *buffer;
>  	dma_addr_t dma;
> +	struct sg_table *sgt;
>  
>  	unsigned int async_operations;
>  	struct uvc_copy_op copy_operations[UVC_MAX_PACKETS];
> -- 
> 2.29.2
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2021-02-02  9:51 ` [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API Christoph Hellwig
@ 2021-02-07 18:48 ` Christoph Hellwig
  2021-02-08 11:33   ` Tomasz Figa
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-07 18:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Marek Szyprowski, Tomasz Figa,
	Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-doc, Robin Murphy, linux-kernel, linux-media

Any comments?

On Tue, Feb 02, 2021 at 10:51:03AM +0100, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds the new noncontiguous DMA allocation API requested by
> various media driver maintainers.
> 
> Changes since v1:
>  - document that flush_kernel_vmap_range and invalidate_kernel_vmap_range
>    must be called once an allocation is mapped into KVA
>  - add dma-debug support
>  - remove the separate dma_handle argument, and instead create fully formed
>    DMA mapped scatterlists
>  - use a directional allocation in uvcvideo
>  - call invalidate_kernel_vmap_range from uvcvideo
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-07 18:48 ` add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
@ 2021-02-08 11:33   ` Tomasz Figa
       [not found]     ` <20210209082213.GA31902@lst.de>
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Figa @ 2021-02-08 11:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mauro Carvalho Chehab, Linux Doc Mailing List,
	Linux Kernel Mailing List, open list:IOMMU DRIVERS,
	Ricardo Ribalda, Sergey Senozhatsky, Robin Murphy,
	Linux Media Mailing List

Hi Christoph,

On Mon, Feb 8, 2021 at 3:49 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Any comments?
>

Sorry for the delay. The whole series looks very good to me. Thanks a lot.

Reviewed-by: Tomasz Figa <tfiga@chromium.org>

Best regards,
Tomasz

> On Tue, Feb 02, 2021 at 10:51:03AM +0100, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds the new noncontiguous DMA allocation API requested by
> > various media driver maintainers.
> >
> > Changes since v1:
> >  - document that flush_kernel_vmap_range and invalidate_kernel_vmap_range
> >    must be called once an allocation is mapped into KVA
> >  - add dma-debug support
> >  - remove the separate dma_handle argument, and instead create fully formed
> >    DMA mapped scatterlists
> >  - use a directional allocation in uvcvideo
> >  - call invalidate_kernel_vmap_range from uvcvideo
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
> ---end quoted text---
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
       [not found]     ` <20210209082213.GA31902@lst.de>
@ 2021-02-09  8:29       ` Ricardo Ribalda
  2021-02-09 14:46         ` Ricardo Ribalda
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Ribalda @ 2021-02-09  8:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Mauro Carvalho Chehab, Robin Murphy

Hi Christoph

On Tue, Feb 9, 2021 at 9:22 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Feb 08, 2021 at 08:33:50PM +0900, Tomasz Figa wrote:
> > Sorry for the delay. The whole series looks very good to me. Thanks a lot.
> >
> > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
>
> Thanks.
>
> Ricardo, do the uvcvideo changes look good to you?  I'd like to queue
> the series up for this merge window.

Let me test them in real hardware today.

Thanks!


-- 
Ricardo Ribalda
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-09  8:29       ` Ricardo Ribalda
@ 2021-02-09 14:46         ` Ricardo Ribalda
       [not found]           ` <20210209170217.GA10199@lst.de>
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Ribalda @ 2021-02-09 14:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Mauro Carvalho Chehab, Robin Murphy

Hi Christoph

I have tested it in both arm and x86, since there are not significant
changes with the previous version I did not do a performance test.

Thanks!


On Tue, Feb 9, 2021 at 9:29 AM Ricardo Ribalda <ribalda@chromium.org> wrote:
>
> Hi Christoph
>
> On Tue, Feb 9, 2021 at 9:22 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Mon, Feb 08, 2021 at 08:33:50PM +0900, Tomasz Figa wrote:
> > > Sorry for the delay. The whole series looks very good to me. Thanks a lot.
> > >
> > > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> >
> > Thanks.
> >
> > Ricardo, do the uvcvideo changes look good to you?  I'd like to queue
> > the series up for this merge window.

Tested-by: Ricardo Ribalda <ribalda@chromium.org>

>
> Let me test them in real hardware today.
>
> Thanks!
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
       [not found]           ` <20210209170217.GA10199@lst.de>
@ 2021-02-11  9:08             ` Ricardo Ribalda
  2021-02-11 13:06               ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Ribalda @ 2021-02-11  9:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Laurent Pinchart, Mauro Carvalho Chehab,
	Robin Murphy

Hi Christoph

What are your merge plans for the uvc change?
http://git.infradead.org/users/hch/dma-mapping.git/commit/3dc47131f8aacc2093f68a9971d24c754e435520

Are you going to remove the patch on your Merge request and then send
it for review to Laurent? or merge it through your tree with a S-o-B
him?

Thanks

On Tue, Feb 9, 2021 at 6:02 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Feb 09, 2021 at 03:46:13PM +0100, Ricardo Ribalda wrote:
> > Hi Christoph
> >
> > I have tested it in both arm and x86, since there are not significant
> > changes with the previous version I did not do a performance test.
>
> I'll take this as a Tested-by.



-- 
Ricardo Ribalda
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-11  9:08             ` Ricardo Ribalda
@ 2021-02-11 13:06               ` Christoph Hellwig
  2021-02-11 13:20                 ` Ricardo Ribalda
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-11 13:06 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Laurent Pinchart, Mauro Carvalho Chehab,
	Robin Murphy, Christoph Hellwig

On Thu, Feb 11, 2021 at 10:08:18AM +0100, Ricardo Ribalda wrote:
> Hi Christoph
> 
> What are your merge plans for the uvc change?
> http://git.infradead.org/users/hch/dma-mapping.git/commit/3dc47131f8aacc2093f68a9971d24c754e435520
> 
> Are you going to remove the patch on your Merge request and then send
> it for review to Laurent? or merge it through your tree with a S-o-B
> him?

I though I had all the ACKs to queue it up.  Is that not what was
intended?  Queueing up the API without a user is generally a bad idea.

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-11 13:06               ` Christoph Hellwig
@ 2021-02-11 13:20                 ` Ricardo Ribalda
  2021-02-11 13:55                   ` Laurent Pinchart
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Ribalda @ 2021-02-11 13:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Laurent Pinchart, Mauro Carvalho Chehab,
	Robin Murphy

HI Christoph

On Thu, Feb 11, 2021 at 2:06 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Feb 11, 2021 at 10:08:18AM +0100, Ricardo Ribalda wrote:
> > Hi Christoph
> >
> > What are your merge plans for the uvc change?
> > http://git.infradead.org/users/hch/dma-mapping.git/commit/3dc47131f8aacc2093f68a9971d24c754e435520
> >
> > Are you going to remove the patch on your Merge request and then send
> > it for review to Laurent? or merge it through your tree with a S-o-B
> > him?
>
> I though I had all the ACKs to queue it up.  Is that not what was
> intended?  Queueing up the API without a user is generally a bad idea.
>

I am pretty sure we need the ack from Laurent. He maintains uvc

@Laurent Pinchart what are your thoughts?

Thanks!


-- 
Ricardo Ribalda
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: add a new dma_alloc_noncontiguous API v2
  2021-02-11 13:20                 ` Ricardo Ribalda
@ 2021-02-11 13:55                   ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2021-02-11 13:55 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, Linux Media Mailing List,
	open list:IOMMU DRIVERS, Mauro Carvalho Chehab, Robin Murphy,
	Christoph Hellwig

Hi Ricardo,

On Thu, Feb 11, 2021 at 02:20:30PM +0100, Ricardo Ribalda wrote:
> On Thu, Feb 11, 2021 at 2:06 PM Christoph Hellwig <hch@lst.de> wrote:
> > On Thu, Feb 11, 2021 at 10:08:18AM +0100, Ricardo Ribalda wrote:
> > > Hi Christoph
> > >
> > > What are your merge plans for the uvc change?
> > > http://git.infradead.org/users/hch/dma-mapping.git/commit/3dc47131f8aacc2093f68a9971d24c754e435520
> > >
> > > Are you going to remove the patch on your Merge request and then send
> > > it for review to Laurent? or merge it through your tree with a S-o-B
> > > him?
> >
> > I though I had all the ACKs to queue it up.  Is that not what was
> > intended?  Queueing up the API without a user is generally a bad idea.
> >
> 
> I am pretty sure we need the ack from Laurent. He maintains uvc
> 
> @Laurent Pinchart what are your thoughts?

I think it would have been nice to CC me on the patch in the first place
:-) I won't have time to review the series before next week at the
earliest.

-- 
Regards,

Laurent Pinchart
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-02-02  9:51 ` [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous Christoph Hellwig
@ 2021-02-16  8:14   ` Tomasz Figa
  2021-02-16  8:49     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Figa @ 2021-02-16  8:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mauro Carvalho Chehab, Linux Media Mailing List,
	Linux Doc Mailing List, Linux Kernel Mailing List,
	open list:IOMMU DRIVERS, Ricardo Ribalda, Sergey Senozhatsky,
	Robin Murphy

Hi Christoph


On Tue, Feb 2, 2021 at 6:51 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Implement support for allocating a non-contiguous DMA region.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/iommu/dma-iommu.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 85cb004d7a44c6..4e0b170d38d57a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -718,6 +718,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
>                 goto out_free_sg;
>
>         sgt->sgl->dma_address = iova;
> +       sgt->sgl->dma_length = size;
>         return pages;
>
>  out_free_sg:
> @@ -755,6 +756,36 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
>         return NULL;
>  }
>
> +#ifdef CONFIG_DMA_REMAP
> +static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
> +               size_t size, enum dma_data_direction dir, gfp_t gfp)
> +{
> +       struct dma_sgt_handle *sh;
> +
> +       sh = kmalloc(sizeof(*sh), gfp);
> +       if (!sh)
> +               return NULL;
> +
> +       sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, &sh->sgt, gfp,
> +                                                   PAGE_KERNEL, 0);

When working on the videobuf2 integration with Sergey I noticed that
we always pass 0 as DMA attrs here, which removes the ability for
drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.

It's quite important from a system stability point of view, because by
default the iommu_dma allocator would prefer big order allocations for
TLB locality reasons. For many devices, though, it doesn't really
affect the performance, because of random access patterns, so single
pages are good enough and reduce the risk of allocation failures or
latency due to fragmentation.

Do you think we could add the attrs parameter to the
dma_alloc_noncontiguous() API?

Best regards,
Tomasz
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-02-16  8:14   ` Tomasz Figa
@ 2021-02-16  8:49     ` Christoph Hellwig
  2021-03-01  7:17       ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-02-16  8:49 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Mauro Carvalho Chehab, Linux Media Mailing List,
	Linux Doc Mailing List, Linux Kernel Mailing List,
	open list:IOMMU DRIVERS, Ricardo Ribalda, Sergey Senozhatsky,
	Robin Murphy, Christoph Hellwig

On Tue, Feb 16, 2021 at 05:14:55PM +0900, Tomasz Figa wrote:
> When working on the videobuf2 integration with Sergey I noticed that
> we always pass 0 as DMA attrs here, which removes the ability for
> drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.
> 
> It's quite important from a system stability point of view, because by
> default the iommu_dma allocator would prefer big order allocations for
> TLB locality reasons. For many devices, though, it doesn't really
> affect the performance, because of random access patterns, so single
> pages are good enough and reduce the risk of allocation failures or
> latency due to fragmentation.
> 
> Do you think we could add the attrs parameter to the
> dma_alloc_noncontiguous() API?

Yes, we could probably do that.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API
  2021-02-02  9:51 ` [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API Christoph Hellwig
@ 2021-02-16 18:55   ` Robin Murphy
  2021-03-01  8:09     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2021-02-16 18:55 UTC (permalink / raw)
  To: Christoph Hellwig, Mauro Carvalho Chehab, Marek Szyprowski,
	Tomasz Figa, Ricardo Ribalda, Sergey Senozhatsky, iommu
  Cc: linux-media, linux-kernel, linux-doc

On 2021-02-02 09:51, Christoph Hellwig wrote:
> Add a new API that returns a potentiall virtually non-contigous sg_table
> and a DMA address.  This API is only properly implemented for dma-iommu
> and will simply return a contigious chunk as a fallback.
> 
> The intent is that media drivers can use this API if either:
> 
>   - no kernel mapping or only temporary kernel mappings are required.
>     That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>   - a kernel mapping is required for cached and DMA mapped pages, but
>     the driver also needs the pages to e.g. map them to userspace.
>     In that sense it is a replacement for some aspects of the recently
>     removed and never fully implemented DMA_ATTR_NON_CONSISTENT
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   Documentation/core-api/dma-api.rst |  74 +++++++++++++++++++++
>   include/linux/dma-map-ops.h        |  18 +++++
>   include/linux/dma-mapping.h        |  31 +++++++++
>   kernel/dma/mapping.c               | 103 +++++++++++++++++++++++++++++
>   4 files changed, 226 insertions(+)
> 
> diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
> index 157a474ae54416..e24b2447f4bfe6 100644
> --- a/Documentation/core-api/dma-api.rst
> +++ b/Documentation/core-api/dma-api.rst
> @@ -594,6 +594,80 @@ dev, size, dma_handle and dir must all be the same as those passed into
>   dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
>   dma_alloc_noncoherent().
>   
> +::
> +
> +	struct sg_table *
> +	dma_alloc_noncontiguous(struct device *dev, size_t size,
> +				enum dma_data_direction dir, gfp_t gfp)
> +
> +This routine allocates  <size> bytes of non-coherent and possibly non-contiguous
> +memory.  It returns a pointer to struct sg_table that describes the allocated
> +and DMA mapped memory, or NULL if the allocation failed. The resulting memory
> +can be used for struct page mapped into a scatterlist are suitable for.
> +
> +The return sg_table is guaranteed to have 1 single DMA mapped segment as
> +indicated by sgt->nents, but it might have multiple CPU side segments as
> +indicated by sgt->orig_nents.
> +
> +The dir parameter specified if data is read and/or written by the device,
> +see dma_map_single() for details.
> +
> +The gfp parameter allows the caller to specify the ``GFP_`` flags (see
> +kmalloc()) for the allocation, but rejects flags used to specify a memory
> +zone such as GFP_DMA or GFP_HIGHMEM.
> +
> +Before giving the memory to the device, dma_sync_sgtable_for_device() needs
> +to be called, and before reading memory written by the device,
> +dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
> +reused.
> +
> +::
> +
> +	void
> +	dma_free_noncontiguous(struct device *dev, size_t size,
> +			       struct sg_table *sgt,
> +			       enum dma_data_direction dir)
> +
> +Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
> +and dir must all be the same as those passed into dma_alloc_noncontiguous().
> +sgt must be the pointer returned by dma_alloc_noncontiguous().
> +
> +::
> +
> +	void *
> +	dma_vmap_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt)
> +
> +Return a contiguous kernel mapping for an allocation returned from
> +dma_alloc_noncontiguous().  dev and size must be the same as those passed into
> +dma_alloc_noncontiguous().  sgt must be the pointer returned by
> +dma_alloc_noncontiguous().
> +
> +Once a non-contiguous allocation is mapped using this function, the
> +flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
> +to manage the coherency of the kernel mapping.

Maybe say something like "coherency between the kernel mapping and any 
userspace mappings"? Otherwise people like me may be easily confused and 
think it's referring to coherency between the kernel mapping and the 
device, where in most cases those APIs won't help at all :)

> +
> +::
> +
> +	void
> +	dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
> +
> +Unmap a kernel mapping returned by dma_vmap_noncontiguous().  dev must be the
> +same the one passed into dma_alloc_noncontiguous().  vaddr must be the pointer
> +returned by dma_vmap_noncontiguous().
> +
> +
> +::
> +
> +	int
> +	dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
> +			       size_t size, struct sg_table *sgt)
> +
> +Map an allocation returned from dma_alloc_noncontiguous() into a user address
> +space.  dev and size must be the same as those passed into
> +dma_alloc_noncontiguous().  sgt must be the pointer returned by
> +dma_alloc_noncontiguous().
> +
>   ::
>   
>   	int
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 11e02537b9e01b..fe46a41130e662 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -22,6 +22,10 @@ struct dma_map_ops {
>   			gfp_t gfp);
>   	void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
>   			dma_addr_t dma_handle, enum dma_data_direction dir);
> +	struct sg_table *(*alloc_noncontiguous)(struct device *dev, size_t size,
> +			enum dma_data_direction dir, gfp_t gfp);
> +	void (*free_noncontiguous)(struct device *dev, size_t size,
> +			struct sg_table *sgt, enum dma_data_direction dir);
>   	int (*mmap)(struct device *, struct vm_area_struct *,
>   			void *, dma_addr_t, size_t, unsigned long attrs);
>   
> @@ -198,6 +202,20 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
>   }
>   #endif /* CONFIG_DMA_DECLARE_COHERENT */
>   
> +/*
> + * This is the actual return value from the ->alloc_noncontiguous method.
> + * The users of the DMA API should only care about the sg_table, but to make
> + * the DMA-API internal vmaping and freeing easier we stash away the page
> + * array as well (except for the fallback case).  This can go away any time,
> + * e.g. when a vmap-variant that takes a scatterlist comes along.
> + */
> +struct dma_sgt_handle {
> +	struct sg_table sgt;
> +	struct page **pages;
> +};
> +#define sgt_handle(sgt) \
> +	container_of((sgt), struct dma_sgt_handle, sgt)
> +
>   int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
>   		void *cpu_addr, dma_addr_t dma_addr, size_t size,
>   		unsigned long attrs);
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 4977a748cb9483..6f4d34739f5cc6 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -144,6 +144,15 @@ u64 dma_get_required_mask(struct device *dev);
>   size_t dma_max_mapping_size(struct device *dev);
>   bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
>   unsigned long dma_get_merge_boundary(struct device *dev);
> +struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
> +		enum dma_data_direction dir, gfp_t gfp);
> +void dma_free_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt, enum dma_data_direction dir);
> +void *dma_vmap_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt);
> +void dma_vunmap_noncontiguous(struct device *dev, void *vaddr);
> +int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
> +		size_t size, struct sg_table *sgt);
>   #else /* CONFIG_HAS_DMA */
>   static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>   		struct page *page, size_t offset, size_t size,
> @@ -257,6 +266,28 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev)
>   {
>   	return 0;
>   }
> +static inline struct sg_table *dma_alloc_noncontiguous(struct device *dev,
> +		size_t size, enum dma_data_direction dir, gfp_t gfp)
> +{
> +	return NULL;
> +}
> +static inline void dma_free_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt, enum dma_data_direction dir)
> +{
> +}
> +static inline void *dma_vmap_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt)
> +{
> +	return NULL;
> +}
> +static inline void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
> +{
> +}
> +static inline int dma_mmap_noncontiguous(struct device *dev,
> +		struct vm_area_struct *vma, size_t size, struct sg_table *sgt)
> +{
> +	return -EINVAL;
> +}
>   #endif /* CONFIG_HAS_DMA */
>   
>   struct page *dma_alloc_pages(struct device *dev, size_t size,
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 5e87dac6cc6d9a..5a62439ed483af 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -537,6 +537,109 @@ int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
>   }
>   EXPORT_SYMBOL_GPL(dma_mmap_pages);
>   
> +static struct sg_table *alloc_single_sgt(struct device *dev, size_t size,
> +		enum dma_data_direction dir, gfp_t gfp)
> +{
> +	struct sg_table *sgt;
> +	struct page *page;
> +
> +	sgt = kmalloc(sizeof(*sgt), gfp);

It might be nice to allocate a dma_sgt_handle here for consistency.

> +	if (!sgt)
> +		return NULL;
> +	if (sg_alloc_table(sgt, 1, gfp))
> +		goto out_free_sgt;
> +	page = __dma_alloc_pages(dev, size, &sgt->sgl->dma_address, dir, gfp);
> +	if (!page)
> +		goto out_free_table;
> +	sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
> +	sg_dma_len(sgt->sgl) = sgt->sgl->length;
> +	return sgt;
> +out_free_table:
> +	sg_free_table(sgt);
> +out_free_sgt:
> +	kfree(sgt);
> +	return NULL;
> +}
> +
> +struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
> +		enum dma_data_direction dir, gfp_t gfp)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +	struct sg_table *sgt;
> +
> +	if (ops && ops->alloc_noncontiguous)
> +		sgt = ops->alloc_noncontiguous(dev, size, dir, gfp);
> +	else
> +		sgt = alloc_single_sgt(dev, size, dir, gfp);
> +
> +	if (sgt) {
> +		sgt->nents = 1;
> +		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir);
> +	}
> +	return sgt;
> +}
> +EXPORT_SYMBOL_GPL(dma_alloc_noncontiguous);
> +
> +static void free_single_sgt(struct device *dev, size_t size,
> +		struct sg_table *sgt, enum dma_data_direction dir)
> +{
> +	__dma_free_pages(dev, size, sg_page(sgt->sgl), sgt->sgl->dma_address,
> +			 dir);
> +	sg_free_table(sgt);
> +	kfree(sgt);
> +}
> +
> +void dma_free_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt, enum dma_data_direction dir)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> +	debug_dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir);
> +	if (ops && ops->free_noncontiguous)
> +		ops->free_noncontiguous(dev, size, sgt, dir);
> +	else
> +		free_single_sgt(dev, size, sgt, dir);
> +}
> +EXPORT_SYMBOL_GPL(dma_free_noncontiguous);
> +
> +void *dma_vmap_noncontiguous(struct device *dev, size_t size,
> +		struct sg_table *sgt)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> +	if (ops && ops->alloc_noncontiguous)
> +		return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL);
> +	return page_address(sg_page(sgt->sgl));

If the fallback case was consistent, you could simply look at whether 
sgt_handle(sgt)->pages was set and avoid having to poke at the ops at 
all. Or even better, just look at sgt->orig_nents, where if it's 1 you 
don't need to vmap either way, and if it's larger then you can assume 
pages is valid because it couldn't have come from the fallback case.

FWIW I still think this deserves to be a common sg_vmap_table() helper 
rather than pretending the operation is somehow exclusive to the DMA 
API. Even if it has to effectively perform sg_alloc_table_from_pages() 
in reverse, it's not like we don't have precedent for that sort of 
thing, e.g. dma_common_contiguous_remap(). That particular point is just 
a grumble though, and I'm not going to let it stand in the way of this 
series which I know is addressing actual issues :)

> +}
> +EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous);
> +
> +void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> +	if (ops && ops->alloc_noncontiguous)

is_vmalloc_addr(vaddr) seems like an even more logical condition, and at 
that point it really becomes a very generic helper that could already be 
used in a few other places.

> +		vunmap(vaddr);
> +}
> +EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous);
> +
> +int dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
> +		size_t size, struct sg_table *sgt)
> +{
> +	const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> +	if (ops && ops->alloc_noncontiguous) {
> +		unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> +		if (vma->vm_pgoff >= count ||
> +		    vma_pages(vma) > count - vma->vm_pgoff)
> +			return -ENXIO;

If you're calling vm_map_pages() you shouldn't need to duplicate these 
checks.

> +		return vm_map_pages(vma, sgt_handle(sgt)->pages, count);
> +	}
> +	return dma_mmap_pages(dev, vma, size, sg_page(sgt->sgl));

Same comment about pages/orig_nents as above. Although does this 
function even need to have a split personality at all - how hard would 
it really be to call remap_pfn_range() in a for_each_sgtable_sg() loop? 
(Or perhaps for_each_sgtable_page()/vm_insert_page() would be even 
closer to what you already have here...)

Robin.

> +}
> +EXPORT_SYMBOL_GPL(dma_mmap_noncontiguous);
> +
>   int dma_supported(struct device *dev, u64 mask)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-02-16  8:49     ` Christoph Hellwig
@ 2021-03-01  7:17       ` Sergey Senozhatsky
  2021-03-01  7:21         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-03-01  7:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Linux Media Mailing List,
	Linux Doc Mailing List, Linux Kernel Mailing List,
	open list:IOMMU DRIVERS, Ricardo Ribalda, Mauro Carvalho Chehab,
	Robin Murphy

On (21/02/16 09:49), Christoph Hellwig wrote:
> > When working on the videobuf2 integration with Sergey I noticed that
> > we always pass 0 as DMA attrs here, which removes the ability for
> > drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.
> > 
> > It's quite important from a system stability point of view, because by
> > default the iommu_dma allocator would prefer big order allocations for
> > TLB locality reasons. For many devices, though, it doesn't really
> > affect the performance, because of random access patterns, so single
> > pages are good enough and reduce the risk of allocation failures or
> > latency due to fragmentation.
> > 
> > Do you think we could add the attrs parameter to the
> > dma_alloc_noncontiguous() API?
> 
> Yes, we could probably do that.

I can cook a patch, unless somebody is already looking into it.

	-ss
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-03-01  7:17       ` Sergey Senozhatsky
@ 2021-03-01  7:21         ` Christoph Hellwig
  2021-03-01  8:02           ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-03-01  7:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Linux Media Mailing List,
	Linux Doc Mailing List, Linux Kernel Mailing List,
	open list:IOMMU DRIVERS, Ricardo Ribalda, Mauro Carvalho Chehab,
	Robin Murphy, Christoph Hellwig

On Mon, Mar 01, 2021 at 04:17:42PM +0900, Sergey Senozhatsky wrote:
> > > Do you think we could add the attrs parameter to the
> > > dma_alloc_noncontiguous() API?
> > 
> > Yes, we could probably do that.
> 
> I can cook a patch, unless somebody is already looking into it.

I plan to resend the whole series with the comments very soon.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-03-01  7:21         ` Christoph Hellwig
@ 2021-03-01  8:02           ` Sergey Senozhatsky
  2021-03-01  8:11             ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-03-01  8:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Linux Doc Mailing List,
	Linux Kernel Mailing List, open list:IOMMU DRIVERS,
	Linux Media Mailing List, Ricardo Ribalda, Mauro Carvalho Chehab,
	Robin Murphy

On (21/03/01 08:21), Christoph Hellwig wrote:
> On Mon, Mar 01, 2021 at 04:17:42PM +0900, Sergey Senozhatsky wrote:
> > > > Do you think we could add the attrs parameter to the
> > > > dma_alloc_noncontiguous() API?
> > > 
> > > Yes, we could probably do that.
> > 
> > I can cook a patch, unless somebody is already looking into it.
> 
> I plan to resend the whole series with the comments very soon.

Oh, OK.

I thought the series was in linux-next already so a single patch
would do.

	-ss
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API
  2021-02-16 18:55   ` Robin Murphy
@ 2021-03-01  8:09     ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-03-01  8:09 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Sergey Senozhatsky, linux-media, linux-doc, linux-kernel, iommu,
	Ricardo Ribalda, Mauro Carvalho Chehab, Christoph Hellwig

On Tue, Feb 16, 2021 at 06:55:39PM +0000, Robin Murphy wrote:
> On 2021-02-02 09:51, Christoph Hellwig wrote:
>> Add a new API that returns a potentiall virtually non-contigous sg_table
>> and a DMA address.  This API is only properly implemented for dma-iommu
>> and will simply return a contigious chunk as a fallback.
>>
>> The intent is that media drivers can use this API if either:
>>
>>   - no kernel mapping or only temporary kernel mappings are required.
>>     That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>>   - a kernel mapping is required for cached and DMA mapped pages, but
>>     the driver also needs the pages to e.g. map them to userspace.
>>     In that sense it is a replacement for some aspects of the recently
>>     removed and never fully implemented DMA_ATTR_NON_CONSISTENT
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   Documentation/core-api/dma-api.rst |  74 +++++++++++++++++++++
>>   include/linux/dma-map-ops.h        |  18 +++++
>>   include/linux/dma-mapping.h        |  31 +++++++++
>>   kernel/dma/mapping.c               | 103 +++++++++++++++++++++++++++++
>>   4 files changed, 226 insertions(+)
>>
>> diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
>> index 157a474ae54416..e24b2447f4bfe6 100644
>> --- a/Documentation/core-api/dma-api.rst
>> +++ b/Documentation/core-api/dma-api.rst
>> @@ -594,6 +594,80 @@ dev, size, dma_handle and dir must all be the same as those passed into
>>   dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
>>   dma_alloc_noncoherent().
>>   +::
>> +
>> +	struct sg_table *
>> +	dma_alloc_noncontiguous(struct device *dev, size_t size,
>> +				enum dma_data_direction dir, gfp_t gfp)
>> +
>> +This routine allocates  <size> bytes of non-coherent and possibly non-contiguous
>> +memory.  It returns a pointer to struct sg_table that describes the allocated
>> +and DMA mapped memory, or NULL if the allocation failed. The resulting memory
>> +can be used for struct page mapped into a scatterlist are suitable for.
>> +
>> +The return sg_table is guaranteed to have 1 single DMA mapped segment as
>> +indicated by sgt->nents, but it might have multiple CPU side segments as
>> +indicated by sgt->orig_nents.
>> +
>> +The dir parameter specified if data is read and/or written by the device,
>> +see dma_map_single() for details.
>> +
>> +The gfp parameter allows the caller to specify the ``GFP_`` flags (see
>> +kmalloc()) for the allocation, but rejects flags used to specify a memory
>> +zone such as GFP_DMA or GFP_HIGHMEM.
>> +
>> +Before giving the memory to the device, dma_sync_sgtable_for_device() needs
>> +to be called, and before reading memory written by the device,
>> +dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
>> +reused.
>> +
>> +::
>> +
>> +	void
>> +	dma_free_noncontiguous(struct device *dev, size_t size,
>> +			       struct sg_table *sgt,
>> +			       enum dma_data_direction dir)
>> +
>> +Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
>> +and dir must all be the same as those passed into dma_alloc_noncontiguous().
>> +sgt must be the pointer returned by dma_alloc_noncontiguous().
>> +
>> +::
>> +
>> +	void *
>> +	dma_vmap_noncontiguous(struct device *dev, size_t size,
>> +		struct sg_table *sgt)
>> +
>> +Return a contiguous kernel mapping for an allocation returned from
>> +dma_alloc_noncontiguous().  dev and size must be the same as those passed into
>> +dma_alloc_noncontiguous().  sgt must be the pointer returned by
>> +dma_alloc_noncontiguous().
>> +
>> +Once a non-contiguous allocation is mapped using this function, the
>> +flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
>> +to manage the coherency of the kernel mapping.
>
> Maybe say something like "coherency between the kernel mapping and any 
> userspace mappings"? Otherwise people like me may be easily confused and 
> think it's referring to coherency between the kernel mapping and the 
> device, where in most cases those APIs won't help at all :)

Well, it is all of the above for a VIVT cache setup.  I've ammended
it to:

Once a non-contiguous allocation is mapped using this function, the
flush_kernel_vmap_range() and invalidate_kernel_vmap_range() APIs must be used
to manage the coherency between the kernel mapping, the device and user space
mappings (if any).

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous
  2021-03-01  8:02           ` Sergey Senozhatsky
@ 2021-03-01  8:11             ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-03-01  8:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Linux Media Mailing List,
	Linux Doc Mailing List, Linux Kernel Mailing List,
	open list:IOMMU DRIVERS, Ricardo Ribalda, Mauro Carvalho Chehab,
	Robin Murphy, Christoph Hellwig

On Mon, Mar 01, 2021 at 05:02:43PM +0900, Sergey Senozhatsky wrote:
> > I plan to resend the whole series with the comments very soon.
> 
> Oh, OK.
> 
> I thought the series was in linux-next already so a single patch
> would do.

It was, with an emphasys on was.  I hadn't realized I need an ack
from Laurent for uvcvideo, and he didn't have time to review it by the
time we noticed.  So I'll repost it with him in the receipients list and
the small fixups accumulated now that -rc1 is out.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-03-01  8:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-02  9:51 add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
2021-02-02  9:51 ` [PATCH 1/7] dma-mapping: remove the {alloc,free}_noncoherent methods Christoph Hellwig
2021-02-02  9:51 ` [PATCH 2/7] dma-mapping: add a dma_mmap_pages helper Christoph Hellwig
2021-02-02  9:51 ` [PATCH 3/7] dma-mapping: refactor dma_{alloc,free}_pages Christoph Hellwig
2021-02-02  9:51 ` [PATCH 4/7] dma-mapping: add a dma_alloc_noncontiguous API Christoph Hellwig
2021-02-16 18:55   ` Robin Murphy
2021-03-01  8:09     ` Christoph Hellwig
2021-02-02  9:51 ` [PATCH 5/7] dma-iommu: refactor iommu_dma_alloc_remap Christoph Hellwig
2021-02-02  9:51 ` [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous Christoph Hellwig
2021-02-16  8:14   ` Tomasz Figa
2021-02-16  8:49     ` Christoph Hellwig
2021-03-01  7:17       ` Sergey Senozhatsky
2021-03-01  7:21         ` Christoph Hellwig
2021-03-01  8:02           ` Sergey Senozhatsky
2021-03-01  8:11             ` Christoph Hellwig
2021-02-02  9:51 ` [PATCH 7/7] media: uvcvideo: Use dma_alloc_noncontiguos API Christoph Hellwig
2021-02-04  7:39   ` Hillf Danton
2021-02-07 18:48 ` add a new dma_alloc_noncontiguous API v2 Christoph Hellwig
2021-02-08 11:33   ` Tomasz Figa
     [not found]     ` <20210209082213.GA31902@lst.de>
2021-02-09  8:29       ` Ricardo Ribalda
2021-02-09 14:46         ` Ricardo Ribalda
     [not found]           ` <20210209170217.GA10199@lst.de>
2021-02-11  9:08             ` Ricardo Ribalda
2021-02-11 13:06               ` Christoph Hellwig
2021-02-11 13:20                 ` Ricardo Ribalda
2021-02-11 13:55                   ` Laurent Pinchart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).