* move swiotlb noncoherent dma support from arm64 to generic code
@ 2018-09-17 15:38 Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 1/9] swiotlb: remove a pointless comment Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Hi all,

this series starts with various swiotlb cleanups, then adds support for
non-cache-coherent devices to the generic swiotlb code, and finally
switches arm64 over to the generic code.

Given that this series depends on patches that are either in the dma-mapping
tree or still pending for it, I've also published a git tree here:

    git://git.infradead.org/users/hch/misc.git swiotlb-noncoherent

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-noncoherent

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/9] swiotlb: remove a pointless comment
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 2/9] swiotlb: mark is_swiotlb_buffer static Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

This comment describes an aspect of the map_sg interface that isn't
even exploited by swiotlb.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 4f8a6dbf0b60..9062b14bc7f4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -925,12 +925,6 @@ swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
  * appropriate dma address and length.  They are obtained via
  * sg_dma_{address,length}(SG).
  *
- * NOTE: An implementation may be able to use a smaller number of
- *       DMA address/length pairs than there are SG table elements.
- *       (for example via virtual mapping capabilities)
- *       The routine returns the number of addr/length pairs actually
- *       used, at most nents.
- *
  * Device ownership issues as mentioned above for swiotlb_map_page are the
  * same here.
  */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/9] swiotlb: mark is_swiotlb_buffer static
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 1/9] swiotlb: remove a pointless comment Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 3/9] swiotlb: do not panic on mapping failures Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/swiotlb.h | 1 -
 kernel/dma/swiotlb.c    | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 965be92c33b5..7ef541ce8f34 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -121,7 +121,6 @@ static inline unsigned int swiotlb_max_segment(void) { return 0; }
 #endif
 
 extern void swiotlb_print_info(void);
-extern int is_swiotlb_buffer(phys_addr_t paddr);
 extern void swiotlb_set_max_segment(unsigned int);
 
 extern const struct dma_map_ops swiotlb_dma_ops;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9062b14bc7f4..26d3af52956f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -429,7 +429,7 @@ void __init swiotlb_exit(void)
 	max_segment = 0;
 }
 
-int is_swiotlb_buffer(phys_addr_t paddr)
+static int is_swiotlb_buffer(phys_addr_t paddr)
 {
 	return paddr >= io_tlb_start && paddr < io_tlb_end;
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/9] swiotlb: do not panic on mapping failures
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 1/9] swiotlb: remove a pointless comment Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 2/9] swiotlb: mark is_swiotlb_buffer static Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 4/9] swiotlb: remove the overflow buffer Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

All properly written drivers now have error handling in the
dma_map_single / dma_map_page callers.  As swiotlb_tbl_map_single already
prints a useful warning when running out of swiotlb pool space, we can
also remove swiotlb_full entirely, as it serves no purpose now.
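
For context, this is the usual dma_mapping_error() pattern that callers are
expected to implement; a minimal, hypothetical driver-side sketch (not code
from this series):

    dma_addr_t addr;

    addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
    if (dma_mapping_error(dev, addr)) {
            /* e.g. the swiotlb pool was exhausted; fail the request */
            return -ENOMEM;
    }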

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 26d3af52956f..69bf305ee5f8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -761,34 +761,6 @@ static bool swiotlb_free_buffer(struct device *dev, size_t size,
 	return true;
 }
 
-static void
-swiotlb_full(struct device *dev, size_t size, enum dma_data_direction dir,
-	     int do_panic)
-{
-	if (swiotlb_force == SWIOTLB_NO_FORCE)
-		return;
-
-	/*
-	 * Ran out of IOMMU space for this operation. This is very bad.
-	 * Unfortunately the drivers cannot handle this operation properly.
-	 * unless they check for dma_mapping_error (most don't)
-	 * When the mapping is small enough return a static buffer to limit
-	 * the damage, or panic when the transfer is too big.
-	 */
-	dev_err_ratelimited(dev, "DMA: Out of SW-IOMMU space for %zu bytes\n",
-			    size);
-
-	if (size <= io_tlb_overflow || !do_panic)
-		return;
-
-	if (dir == DMA_BIDIRECTIONAL)
-		panic("DMA: Random memory could be DMA accessed\n");
-	if (dir == DMA_FROM_DEVICE)
-		panic("DMA: Random memory could be DMA written\n");
-	if (dir == DMA_TO_DEVICE)
-		panic("DMA: Random memory could be DMA read\n");
-}
-
 /*
  * Map a single buffer of the indicated size for DMA in streaming mode.  The
  * physical address to use is returned.
@@ -817,10 +789,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 
 	/* Oh well, have to allocate and map a bounce buffer. */
 	map = map_single(dev, phys, size, dir, attrs);
-	if (map == SWIOTLB_MAP_ERROR) {
-		swiotlb_full(dev, size, dir, 1);
+	if (map == SWIOTLB_MAP_ERROR)
 		return __phys_to_dma(dev, io_tlb_overflow_buffer);
-	}
 
 	dev_addr = __phys_to_dma(dev, map);
 
@@ -948,7 +918,6 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
 			if (map == SWIOTLB_MAP_ERROR) {
 				/* Don't panic here, we expect map_sg users
 				   to do proper error handling. */
-				swiotlb_full(hwdev, sg->length, dir, 0);
 				attrs |= DMA_ATTR_SKIP_CPU_SYNC;
 				swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
 						       attrs);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/9] swiotlb: remove the overflow buffer
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 3/9] swiotlb: do not panic on mapping failures Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 5/9] swiotlb: merge swiotlb_unmap_page and unmap_single Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Like all other dma mapping drivers, just return an error code instead
of an actual memory buffer.  The reason for the overflow buffer was
that, at the time swiotlb was invented, there was no way to check for
dma mapping errors, but this has long been fixed.
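
With the overflow buffer gone, the map methods return DIRECT_MAPPING_ERROR
on failure and the mapping_error op (now dma_direct_mapping_error) catches
it; conceptually the check boils down to something like this sketch:

    /* sketch only -- the real check sits behind the mapping_error op */
    if (dev_addr == DIRECT_MAPPING_ERROR)
            return -ENOMEM;         /* no bounce buffer could be set up */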

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/mm/dma-mapping.c       |  2 +-
 arch/powerpc/kernel/dma-swiotlb.c |  4 +--
 include/linux/dma-direct.h        |  2 ++
 include/linux/swiotlb.h           |  3 --
 kernel/dma/direct.c               |  2 --
 kernel/dma/swiotlb.c              | 59 ++-----------------------------
 6 files changed, 8 insertions(+), 64 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index cdcb73db9ea2..abcae73eea50 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -324,7 +324,7 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 static int __swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t addr)
 {
 	if (swiotlb)
-		return swiotlb_dma_mapping_error(hwdev, addr);
+		return dma_direct_mapping_error(hwdev, addr);
 	return 0;
 }
 
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 88f3963ca30f..5fc335f4d9cd 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -11,7 +11,7 @@
  *
  */
 
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
 #include <linux/memblock.h>
 #include <linux/pfn.h>
 #include <linux/of_platform.h>
@@ -59,7 +59,7 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
 	.sync_single_for_device = swiotlb_sync_single_for_device,
 	.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
 	.sync_sg_for_device = swiotlb_sync_sg_for_device,
-	.mapping_error = swiotlb_dma_mapping_error,
+	.mapping_error = dma_direct_mapping_error,
 	.get_required_mask = swiotlb_powerpc_get_required,
 };
 
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 86a59ba5a7f3..b7984fc78974 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,6 +5,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/mem_encrypt.h>
 
+#define DIRECT_MAPPING_ERROR		0
+
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include <asm/dma-direct.h>
 #else
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ef541ce8f34..f847c1b265c4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -106,9 +106,6 @@ extern void
 swiotlb_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
 			   int nelems, enum dma_data_direction dir);
 
-extern int
-swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
-
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index c954f0a6dc62..c22fa0f7c20d 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,8 +13,6 @@
 #include <linux/pfn.h>
 #include <linux/set_memory.h>
 
-#define DIRECT_MAPPING_ERROR		0
-
 /*
  * Most architectures use ZONE_DMA for the first 16 Megabytes, but
  * some use it for entirely different regions:
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 69bf305ee5f8..11dbcd80b4a6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -72,13 +72,6 @@ static phys_addr_t io_tlb_start, io_tlb_end;
  */
 static unsigned long io_tlb_nslabs;
 
-/*
- * When the IOMMU overflows we return a fallback buffer. This sets the size.
- */
-static unsigned long io_tlb_overflow = 32*1024;
-
-static phys_addr_t io_tlb_overflow_buffer;
-
 /*
  * This is a free list describing the number of free entries available from
  * each index
@@ -126,7 +119,6 @@ setup_io_tlb_npages(char *str)
 	return 0;
 }
 early_param("swiotlb", setup_io_tlb_npages);
-/* make io_tlb_overflow tunable too? */
 
 unsigned long swiotlb_nr_tbl(void)
 {
@@ -194,16 +186,10 @@ void __init swiotlb_update_mem_attributes(void)
 	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
 	memset(vaddr, 0, bytes);
-
-	vaddr = phys_to_virt(io_tlb_overflow_buffer);
-	bytes = PAGE_ALIGN(io_tlb_overflow);
-	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
-	void *v_overflow_buffer;
 	unsigned long i, bytes;
 
 	bytes = nslabs << IO_TLB_SHIFT;
@@ -212,17 +198,6 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	io_tlb_start = __pa(tlb);
 	io_tlb_end = io_tlb_start + bytes;
 
-	/*
-	 * Get the overflow emergency buffer
-	 */
-	v_overflow_buffer = memblock_virt_alloc_low_nopanic(
-						PAGE_ALIGN(io_tlb_overflow),
-						PAGE_SIZE);
-	if (!v_overflow_buffer)
-		return -ENOMEM;
-
-	io_tlb_overflow_buffer = __pa(v_overflow_buffer);
-
 	/*
 	 * Allocate and initialize the free list array.  This array is used
 	 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
@@ -330,7 +305,6 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
 	unsigned long i, bytes;
-	unsigned char *v_overflow_buffer;
 
 	bytes = nslabs << IO_TLB_SHIFT;
 
@@ -341,19 +315,6 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
 	memset(tlb, 0, bytes);
 
-	/*
-	 * Get the overflow emergency buffer
-	 */
-	v_overflow_buffer = (void *)__get_free_pages(GFP_DMA,
-						     get_order(io_tlb_overflow));
-	if (!v_overflow_buffer)
-		goto cleanup2;
-
-	set_memory_decrypted((unsigned long)v_overflow_buffer,
-			io_tlb_overflow >> PAGE_SHIFT);
-	memset(v_overflow_buffer, 0, io_tlb_overflow);
-	io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
-
 	/*
 	 * Allocate and initialize the free list array.  This array is used
 	 * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
@@ -390,10 +351,6 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	                                                 sizeof(int)));
 	io_tlb_list = NULL;
 cleanup3:
-	free_pages((unsigned long)v_overflow_buffer,
-		   get_order(io_tlb_overflow));
-	io_tlb_overflow_buffer = 0;
-cleanup2:
 	io_tlb_end = 0;
 	io_tlb_start = 0;
 	io_tlb_nslabs = 0;
@@ -407,8 +364,6 @@ void __init swiotlb_exit(void)
 		return;
 
 	if (late_alloc) {
-		free_pages((unsigned long)phys_to_virt(io_tlb_overflow_buffer),
-			   get_order(io_tlb_overflow));
 		free_pages((unsigned long)io_tlb_orig_addr,
 			   get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
 		free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
@@ -416,8 +371,6 @@ void __init swiotlb_exit(void)
 		free_pages((unsigned long)phys_to_virt(io_tlb_start),
 			   get_order(io_tlb_nslabs << IO_TLB_SHIFT));
 	} else {
-		memblock_free_late(io_tlb_overflow_buffer,
-				   PAGE_ALIGN(io_tlb_overflow));
 		memblock_free_late(__pa(io_tlb_orig_addr),
 				   PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
 		memblock_free_late(__pa(io_tlb_list),
@@ -790,7 +743,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	/* Oh well, have to allocate and map a bounce buffer. */
 	map = map_single(dev, phys, size, dir, attrs);
 	if (map == SWIOTLB_MAP_ERROR)
-		return __phys_to_dma(dev, io_tlb_overflow_buffer);
+		return DIRECT_MAPPING_ERROR;
 
 	dev_addr = __phys_to_dma(dev, map);
 
@@ -801,7 +754,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	attrs |= DMA_ATTR_SKIP_CPU_SYNC;
 	swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
 
-	return __phys_to_dma(dev, io_tlb_overflow_buffer);
+	return DIRECT_MAPPING_ERROR;
 }
 
 /*
@@ -985,12 +938,6 @@ swiotlb_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
 	swiotlb_sync_sg(hwdev, sg, nelems, dir, SYNC_FOR_DEVICE);
 }
 
-int
-swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
-{
-	return (dma_addr == __phys_to_dma(hwdev, io_tlb_overflow_buffer));
-}
-
 /*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
@@ -1033,7 +980,7 @@ void swiotlb_free(struct device *dev, size_t size, void *vaddr,
 }
 
 const struct dma_map_ops swiotlb_dma_ops = {
-	.mapping_error		= swiotlb_dma_mapping_error,
+	.mapping_error		= dma_direct_mapping_error,
 	.alloc			= swiotlb_alloc,
 	.free			= swiotlb_free,
 	.sync_single_for_cpu	= swiotlb_sync_single_for_cpu,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/9] swiotlb: merge swiotlb_unmap_page and unmap_single
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 4/9] swiotlb: remove the overflow buffer Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 6/9] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 11dbcd80b4a6..15335f3a1bf3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -765,9 +765,9 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
  * After this call, reads by the cpu to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static void unmap_single(struct device *hwdev, dma_addr_t dev_addr,
-			 size_t size, enum dma_data_direction dir,
-			 unsigned long attrs)
+void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
+			size_t size, enum dma_data_direction dir,
+			unsigned long attrs)
 {
 	phys_addr_t paddr = dma_to_phys(hwdev, dev_addr);
 
@@ -790,13 +790,6 @@ static void unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 	dma_mark_clean(phys_to_virt(paddr), size);
 }
 
-void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-			size_t size, enum dma_data_direction dir,
-			unsigned long attrs)
-{
-	unmap_single(hwdev, dev_addr, size, dir, attrs);
-}
-
 /*
  * Make physical memory consistent for a single streaming mode DMA translation
  * after a transfer.
@@ -900,7 +893,7 @@ swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 	BUG_ON(dir == DMA_NONE);
 
 	for_each_sg(sgl, sg, nelems, i)
-		unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir,
+		swiotlb_unmap_page(hwdev, sg->dma_address, sg_dma_len(sg), dir,
 			     attrs);
 }
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/9] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 5/9] swiotlb: merge swiotlb_unmap_page and unmap_single Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 7/9] swiotlb: refactor swiotlb_map_page Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

No need to duplicate the code - map_sg is equivalent to calling map_page
for each entry in the scatterlist.
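
For reference, map_sg callers treat a zero return as complete failure, so
mapping each entry with swiotlb_map_page and unwinding everything on the
first error keeps the documented semantics; a hypothetical caller sketch
(setup_desc() is made up for illustration):

    struct scatterlist *sg;
    int i, count;

    count = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
    if (!count)
            return -ENOMEM;         /* nothing was mapped */

    for_each_sg(sgl, sg, count, i)
            setup_desc(dev, sg_dma_address(sg), sg_dma_len(sg));

    /* ... later, unmap with the original nents, not count ... */
    dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE);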

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 15335f3a1bf3..15755d7a5242 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -845,37 +845,27 @@ swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
  * same here.
  */
 int
-swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
+swiotlb_map_sg_attrs(struct device *dev, struct scatterlist *sgl, int nelems,
 		     enum dma_data_direction dir, unsigned long attrs)
 {
 	struct scatterlist *sg;
 	int i;
 
-	BUG_ON(dir == DMA_NONE);
-
 	for_each_sg(sgl, sg, nelems, i) {
-		phys_addr_t paddr = sg_phys(sg);
-		dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
-
-		if (swiotlb_force == SWIOTLB_FORCE ||
-		    !dma_capable(hwdev, dev_addr, sg->length)) {
-			phys_addr_t map = map_single(hwdev, sg_phys(sg),
-						     sg->length, dir, attrs);
-			if (map == SWIOTLB_MAP_ERROR) {
-				/* Don't panic here, we expect map_sg users
-				   to do proper error handling. */
-				attrs |= DMA_ATTR_SKIP_CPU_SYNC;
-				swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
-						       attrs);
-				sg_dma_len(sgl) = 0;
-				return 0;
-			}
-			sg->dma_address = __phys_to_dma(hwdev, map);
-		} else
-			sg->dma_address = dev_addr;
+		sg->dma_address = swiotlb_map_page(dev, sg_page(sg), sg->offset,
+				sg->length, dir, attrs);
+		if (sg->dma_address == DIRECT_MAPPING_ERROR)
+			goto out_error;
 		sg_dma_len(sg) = sg->length;
 	}
+
 	return nelems;
+
+out_error:
+	swiotlb_unmap_sg_attrs(dev, sgl, i, dir,
+			attrs | DMA_ATTR_SKIP_CPU_SYNC);
+	sg_dma_len(sgl) = 0;
+	return 0;
 }
 
 /*
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 7/9] swiotlb: refactor swiotlb_map_page
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 6/9] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 8/9] swiotlb: add support for non-coherent DMA Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Remove the somewhat useless map_single function and replace it with a
swiotlb_bounce_page helper that takes care of everything related to
actually bouncing a page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/swiotlb.c | 77 +++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 41 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 15755d7a5242..4d7a4d85d71e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -543,26 +543,6 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	return tlb_addr;
 }
 
-/*
- * Allocates bounce buffer and returns its physical address.
- */
-static phys_addr_t
-map_single(struct device *hwdev, phys_addr_t phys, size_t size,
-	   enum dma_data_direction dir, unsigned long attrs)
-{
-	dma_addr_t start_dma_addr;
-
-	if (swiotlb_force == SWIOTLB_NO_FORCE) {
-		dev_warn_ratelimited(hwdev, "Cannot do DMA to address %pa\n",
-				     &phys);
-		return SWIOTLB_MAP_ERROR;
-	}
-
-	start_dma_addr = __phys_to_dma(hwdev, io_tlb_start);
-	return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size,
-				      dir, attrs);
-}
-
 /*
  * tlb_addr is the physical address of the bounce buffer to unmap.
  */
@@ -714,6 +694,34 @@ static bool swiotlb_free_buffer(struct device *dev, size_t size,
 	return true;
 }
 
+static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_addr_t dma_addr;
+
+	if (unlikely(swiotlb_force == SWIOTLB_NO_FORCE)) {
+		dev_warn_ratelimited(dev,
+			"Cannot do DMA to address %pa\n", phys);
+		return DIRECT_MAPPING_ERROR;
+	}
+
+	/* Oh well, have to allocate and map a bounce buffer. */
+	*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
+			*phys, size, dir, attrs);
+	if (*phys == SWIOTLB_MAP_ERROR)
+		return DIRECT_MAPPING_ERROR;
+
+	/* Ensure that the address returned is DMA'ble */
+	dma_addr = __phys_to_dma(dev, *phys);
+	if (unlikely(!dma_capable(dev, dma_addr, size))) {
+		swiotlb_tbl_unmap_single(dev, *phys, size, dir,
+			attrs | DMA_ATTR_SKIP_CPU_SYNC);
+		return DIRECT_MAPPING_ERROR;
+	}
+
+	return dma_addr;
+}
+
 /*
  * Map a single buffer of the indicated size for DMA in streaming mode.  The
  * physical address to use is returned.
@@ -726,8 +734,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 			    enum dma_data_direction dir,
 			    unsigned long attrs)
 {
-	phys_addr_t map, phys = page_to_phys(page) + offset;
-	dma_addr_t dev_addr = phys_to_dma(dev, phys);
+	phys_addr_t phys = page_to_phys(page) + offset;
+	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
 	BUG_ON(dir == DMA_NONE);
 	/*
@@ -735,26 +743,13 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	 * we can safely return the device addr and not worry about bounce
 	 * buffering it.
 	 */
-	if (dma_capable(dev, dev_addr, size) && swiotlb_force != SWIOTLB_FORCE)
-		return dev_addr;
-
-	trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
-
-	/* Oh well, have to allocate and map a bounce buffer. */
-	map = map_single(dev, phys, size, dir, attrs);
-	if (map == SWIOTLB_MAP_ERROR)
-		return DIRECT_MAPPING_ERROR;
-
-	dev_addr = __phys_to_dma(dev, map);
-
-	/* Ensure that the address returned is DMA'ble */
-	if (dma_capable(dev, dev_addr, size))
-		return dev_addr;
-
-	attrs |= DMA_ATTR_SKIP_CPU_SYNC;
-	swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
+	if (!dma_capable(dev, dma_addr, size) ||
+	    swiotlb_force == SWIOTLB_FORCE) {
+		trace_swiotlb_bounced(dev, dma_addr, size, swiotlb_force);
+		dma_addr = swiotlb_bounce_page(dev, &phys, size, dir, attrs);
+	}
 
-	return DIRECT_MAPPING_ERROR;
+	return dma_addr;
 }
 
 /*
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 8/9] swiotlb: add support for non-coherent DMA
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 7/9] swiotlb: refactor swiotlb_map_page Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-17 15:38 ` [PATCH 9/9] arm64: use the generic swiotlb_dma_ops Christoph Hellwig
  2018-09-18 13:28 ` move swiotlb noncoherent dma support from arm64 to generic code Robin Murphy
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Handle architectures that are not cache coherent directly in the main
swiotlb code.  This involves two related changes:

 - call arch_sync_dma_for_{device,cpu} in all the right places from the
   various dma_map/unmap/sync methods when the device is non-coherent
 - call arch_dma_{alloc,free} for devices that are non-coherent
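
The resulting flow in the map/unmap paths looks roughly like the following
condensed sketch of the changes below (not a literal copy of the code):

    /* map: bounce if required, then hand ownership to the device */
    dma_addr = phys_to_dma(dev, phys);
    if (!dma_capable(dev, dma_addr, size) || swiotlb_force == SWIOTLB_FORCE)
            dma_addr = swiotlb_bounce_page(dev, &phys, size, dir, attrs);
    if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
            arch_sync_dma_for_device(dev, phys, size, dir);

    /* unmap: sync back to the CPU before tearing down any bounce buffer */
    paddr = dma_to_phys(dev, dev_addr);
    if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
            arch_sync_dma_for_cpu(dev, paddr, size, dir);
    if (is_swiotlb_buffer(paddr))
            swiotlb_tbl_unmap_single(dev, paddr, size, dir, attrs);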

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/mm/dma-mapping.c |  6 ++---
 include/linux/swiotlb.h     |  4 ++--
 kernel/dma/swiotlb.c        | 44 ++++++++++++++++++++++++++++++-------
 3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index abcae73eea50..07d9c2633f80 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -112,7 +112,7 @@ static void *__dma_alloc(struct device *dev, size_t size,
 		return addr;
 	}
 
-	ptr = swiotlb_alloc(dev, size, dma_handle, flags, attrs);
+	ptr = __swiotlb_alloc(dev, size, dma_handle, flags, attrs);
 	if (!ptr)
 		goto no_mem;
 
@@ -133,7 +133,7 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	return coherent_ptr;
 
 no_map:
-	swiotlb_free(dev, size, ptr, *dma_handle, attrs);
+	__swiotlb_free(dev, size, ptr, *dma_handle, attrs);
 no_mem:
 	return NULL;
 }
@@ -151,7 +151,7 @@ static void __dma_free(struct device *dev, size_t size,
 			return;
 		vunmap(vaddr);
 	}
-	swiotlb_free(dev, size, swiotlb_addr, dma_handle, attrs);
+	__swiotlb_free(dev, size, swiotlb_addr, dma_handle, attrs);
 }
 
 static dma_addr_t __swiotlb_map_page(struct device *dev, struct page *page,
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f847c1b265c4..bc809d826d4f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -67,9 +67,9 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 
 /* Accessory functions. */
 
-void *swiotlb_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle,
+void *__swiotlb_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle,
 		gfp_t flags, unsigned long attrs);
-void swiotlb_free(struct device *dev, size_t size, void *vaddr,
+void __swiotlb_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_addr, unsigned long attrs);
 
 extern dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 4d7a4d85d71e..83e597101c6a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -21,6 +21,7 @@
 
 #include <linux/cache.h>
 #include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <linux/spinlock.h>
@@ -749,6 +750,10 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 		dma_addr = swiotlb_bounce_page(dev, &phys, size, dir, attrs);
 	}
 
+	if (!dev_is_dma_coherent(dev) &&
+	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
+		arch_sync_dma_for_device(dev, phys, size, dir);
+
 	return dma_addr;
 }
 
@@ -768,6 +773,10 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 
 	BUG_ON(dir == DMA_NONE);
 
+	if (!dev_is_dma_coherent(hwdev) &&
+	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
+		arch_sync_dma_for_cpu(hwdev, paddr, size, dir);
+
 	if (is_swiotlb_buffer(paddr)) {
 		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
 		return;
@@ -804,15 +813,17 @@ swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
 
 	BUG_ON(dir == DMA_NONE);
 
-	if (is_swiotlb_buffer(paddr)) {
+	if (!dev_is_dma_coherent(hwdev) && target == SYNC_FOR_CPU)
+		arch_sync_dma_for_cpu(hwdev, paddr, size, dir);
+
+	if (is_swiotlb_buffer(paddr))
 		swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
-		return;
-	}
 
-	if (dir != DMA_FROM_DEVICE)
-		return;
+	if (!dev_is_dma_coherent(hwdev) && target == SYNC_FOR_DEVICE)
+		arch_sync_dma_for_device(hwdev, paddr, size, dir);
 
-	dma_mark_clean(phys_to_virt(paddr), size);
+	if (!is_swiotlb_buffer(paddr) && dir == DMA_FROM_DEVICE)
+		dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 void
@@ -928,7 +939,7 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
 
-void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+void *__swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs)
 {
 	void *vaddr;
@@ -950,13 +961,30 @@ void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	return vaddr;
 }
 
-void swiotlb_free(struct device *dev, size_t size, void *vaddr,
+static void *swiotlb_alloc(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+	if (!dev_is_dma_coherent(dev))
+		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
+	return __swiotlb_alloc(dev, size, dma_handle, gfp, attrs);
+}
+
+void __swiotlb_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_addr, unsigned long attrs)
 {
 	if (!swiotlb_free_buffer(dev, size, dma_addr))
 		dma_direct_free(dev, size, vaddr, dma_addr, attrs);
 }
 
+static void swiotlb_free(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_addr, unsigned long attrs)
+{
+	if (!dev_is_dma_coherent(dev))
+		arch_dma_free(dev, size, vaddr, dma_addr, attrs);
+	else
+		__swiotlb_free(dev, size, vaddr, dma_addr, attrs);
+}
+
 const struct dma_map_ops swiotlb_dma_ops = {
 	.mapping_error		= dma_direct_mapping_error,
 	.alloc			= swiotlb_alloc,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 9/9] arm64: use the generic swiotlb_dma_ops
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 8/9] swiotlb: add support for non-coherent DMA Christoph Hellwig
@ 2018-09-17 15:38 ` Christoph Hellwig
  2018-09-18 13:28 ` move swiotlb noncoherent dma support from arm64 to generic code Robin Murphy
  9 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-17 15:38 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, Robin Murphy, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Now that the generic swiotlb code supports non-coherent DMA we can switch
arm64 over to it.  For that we need to refactor the existing
alloc/free/mmap/pgprot helpers to be used as the architecture hooks, and
implement the standard arch_sync_dma_for_{device,cpu} hooks for cache
maintenance in the streaming dma callbacks.  This also implies using the
generic dma_coherent flag in struct device.

Note that we need to keep the old is_device_dma_coherent function around
for now, so that the shared arm/arm64 Xen code keeps working.
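
For reference, these are the hooks arm64 now provides to the generic code
(prototypes as implemented in the diff below):

    void *arch_dma_alloc(struct device *dev, size_t size,
                    dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs);
    void arch_dma_free(struct device *dev, size_t size, void *vaddr,
                    dma_addr_t dma_handle, unsigned long attrs);
    long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
                    dma_addr_t dma_addr);
    pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
                    unsigned long attrs);
    void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
                    size_t size, enum dma_data_direction dir);
    void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
                    size_t size, enum dma_data_direction dir);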

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/Kconfig                   |   4 +
 arch/arm64/include/asm/device.h      |   1 -
 arch/arm64/include/asm/dma-mapping.h |   7 +-
 arch/arm64/mm/dma-mapping.c          | 255 +++++----------------------
 4 files changed, 55 insertions(+), 212 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b1a0e95c751..c4db5131d837 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,8 @@ config ARM64
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
+	select ARCH_HAS_DMA_COHERENT_TO_PFN
+	select ARCH_HAS_DMA_MMAP_PGPROT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
@@ -24,6 +26,8 @@ config ARM64
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYSCALL_WRAPPER
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 5a5fa47a6b18..3dd3d664c5c5 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -23,7 +23,6 @@ struct dev_archdata {
 #ifdef CONFIG_XEN
 	const struct dma_map_ops *dev_dma_ops;
 #endif
-	bool dma_coherent;
 };
 
 struct pdev_archdata {
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 0a2d13332545..d2414b540af1 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -39,10 +39,13 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
-/* do not use this function in a driver */
+/*
+ * Do not use this function in a driver, it is only provided for
+ * arch/arm/mm/xen.c, which is used by arm64 as well.
+ */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
-	return dev->archdata.dma_coherent;
+	return dev->dma_coherent;
 }
 
 #endif	/* __KERNEL__ */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 07d9c2633f80..b644b67bc333 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/genalloc.h>
 #include <linux/dma-direct.h>
+#include <linux/dma-noncoherent.h>
 #include <linux/dma-contiguous.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
@@ -32,16 +33,6 @@
 
 #include <asm/cacheflush.h>
 
-static int swiotlb __ro_after_init;
-
-static pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot,
-				 bool coherent)
-{
-	if (!coherent || (attrs & DMA_ATTR_WRITE_COMBINE))
-		return pgprot_writecombine(prot);
-	return prot;
-}
-
 static struct gen_pool *atomic_pool __ro_after_init;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
@@ -91,18 +82,15 @@ static int __free_from_pool(void *start, size_t size)
 	return 1;
 }
 
-static void *__dma_alloc(struct device *dev, size_t size,
-			 dma_addr_t *dma_handle, gfp_t flags,
-			 unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		gfp_t flags, unsigned long attrs)
 {
 	struct page *page;
 	void *ptr, *coherent_ptr;
-	bool coherent = is_device_dma_coherent(dev);
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	size = PAGE_ALIGN(size);
 
-	if (!coherent && !gfpflags_allow_blocking(flags)) {
+	if (!gfpflags_allow_blocking(flags)) {
 		struct page *page = NULL;
 		void *addr = __alloc_from_pool(size, &page, flags);
 
@@ -116,17 +104,14 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	if (!ptr)
 		goto no_mem;
 
-	/* no need for non-cacheable mapping if coherent */
-	if (coherent)
-		return ptr;
-
 	/* remove any dirty cache lines on the kernel alias */
 	__dma_flush_area(ptr, size);
 
 	/* create a coherent mapping */
 	page = virt_to_page(ptr);
 	coherent_ptr = dma_common_contiguous_remap(page, size, VM_USERMAP,
-						   prot, __builtin_return_address(0));
+			pgprot_writecombine(PAGE_KERNEL),
+			__builtin_return_address(0));
 	if (!coherent_ptr)
 		goto no_map;
 
@@ -138,125 +123,50 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	return NULL;
 }
 
-static void __dma_free(struct device *dev, size_t size,
-		       void *vaddr, dma_addr_t dma_handle,
-		       unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, unsigned long attrs)
 {
-	void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle));
-
-	size = PAGE_ALIGN(size);
-
-	if (!is_device_dma_coherent(dev)) {
-		if (__free_from_pool(vaddr, size))
-			return;
+	if (!__free_from_pool(vaddr, size)) {
 		vunmap(vaddr);
+		__swiotlb_free(dev, size, vaddr, dma_handle, attrs);
 	}
-	__swiotlb_free(dev, size, swiotlb_addr, dma_handle, attrs);
 }
 
-static dma_addr_t __swiotlb_map_page(struct device *dev, struct page *page,
-				     unsigned long offset, size_t size,
-				     enum dma_data_direction dir,
-				     unsigned long attrs)
+long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
+		dma_addr_t dma_addr)
 {
-	dma_addr_t dev_addr;
-
-	dev_addr = swiotlb_map_page(dev, page, offset, size, dir, attrs);
-	if (!is_device_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
-
-	return dev_addr;
-}
-
-
-static void __swiotlb_unmap_page(struct device *dev, dma_addr_t dev_addr,
-				 size_t size, enum dma_data_direction dir,
-				 unsigned long attrs)
-{
-	if (!is_device_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		__dma_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
-	swiotlb_unmap_page(dev, dev_addr, size, dir, attrs);
+	return __phys_to_pfn(dma_to_phys(dev, dma_addr));
 }
 
-static int __swiotlb_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
-				  int nelems, enum dma_data_direction dir,
-				  unsigned long attrs)
+pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
+		unsigned long attrs)
 {
-	struct scatterlist *sg;
-	int i, ret;
-
-	ret = swiotlb_map_sg_attrs(dev, sgl, nelems, dir, attrs);
-	if (!is_device_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		for_each_sg(sgl, sg, ret, i)
-			__dma_map_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
-				       sg->length, dir);
-
-	return ret;
-}
-
-static void __swiotlb_unmap_sg_attrs(struct device *dev,
-				     struct scatterlist *sgl, int nelems,
-				     enum dma_data_direction dir,
-				     unsigned long attrs)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (!is_device_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		for_each_sg(sgl, sg, nelems, i)
-			__dma_unmap_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
-					 sg->length, dir);
-	swiotlb_unmap_sg_attrs(dev, sgl, nelems, dir, attrs);
+	if (!dev_is_dma_coherent(dev) || (attrs & DMA_ATTR_WRITE_COMBINE))
+		return pgprot_writecombine(prot);
+	return prot;
 }
 
-static void __swiotlb_sync_single_for_cpu(struct device *dev,
-					  dma_addr_t dev_addr, size_t size,
-					  enum dma_data_direction dir)
+void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
 {
-	if (!is_device_dma_coherent(dev))
-		__dma_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
-	swiotlb_sync_single_for_cpu(dev, dev_addr, size, dir);
+	__dma_map_area(phys_to_virt(paddr), size, dir);
 }
 
-static void __swiotlb_sync_single_for_device(struct device *dev,
-					     dma_addr_t dev_addr, size_t size,
-					     enum dma_data_direction dir)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
 {
-	swiotlb_sync_single_for_device(dev, dev_addr, size, dir);
-	if (!is_device_dma_coherent(dev))
-		__dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+	__dma_unmap_area(phys_to_virt(paddr), size, dir);
 }
 
-static void __swiotlb_sync_sg_for_cpu(struct device *dev,
-				      struct scatterlist *sgl, int nelems,
-				      enum dma_data_direction dir)
+static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
+				      struct page *page, size_t size)
 {
-	struct scatterlist *sg;
-	int i;
-
-	if (!is_device_dma_coherent(dev))
-		for_each_sg(sgl, sg, nelems, i)
-			__dma_unmap_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
-					 sg->length, dir);
-	swiotlb_sync_sg_for_cpu(dev, sgl, nelems, dir);
-}
+	int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
 
-static void __swiotlb_sync_sg_for_device(struct device *dev,
-					 struct scatterlist *sgl, int nelems,
-					 enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
+	if (!ret)
+		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
 
-	swiotlb_sync_sg_for_device(dev, sgl, nelems, dir);
-	if (!is_device_dma_coherent(dev))
-		for_each_sg(sgl, sg, nelems, i)
-			__dma_map_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
-				       sg->length, dir);
+	return ret;
 }
 
 static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
@@ -277,74 +187,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
 	return ret;
 }
 
-static int __swiotlb_mmap(struct device *dev,
-			  struct vm_area_struct *vma,
-			  void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			  unsigned long attrs)
-{
-	int ret;
-	unsigned long pfn = dma_to_phys(dev, dma_addr) >> PAGE_SHIFT;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
-					     is_device_dma_coherent(dev));
-
-	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	return __swiotlb_mmap_pfn(vma, pfn, size);
-}
-
-static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
-				      struct page *page, size_t size)
-{
-	int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-	if (!ret)
-		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-
-	return ret;
-}
-
-static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
-				 void *cpu_addr, dma_addr_t handle, size_t size,
-				 unsigned long attrs)
-{
-	struct page *page = phys_to_page(dma_to_phys(dev, handle));
-
-	return __swiotlb_get_sgtable_page(sgt, page, size);
-}
-
-static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
-{
-	if (swiotlb)
-		return swiotlb_dma_supported(hwdev, mask);
-	return 1;
-}
-
-static int __swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t addr)
-{
-	if (swiotlb)
-		return dma_direct_mapping_error(hwdev, addr);
-	return 0;
-}
-
-static const struct dma_map_ops arm64_swiotlb_dma_ops = {
-	.alloc = __dma_alloc,
-	.free = __dma_free,
-	.mmap = __swiotlb_mmap,
-	.get_sgtable = __swiotlb_get_sgtable,
-	.map_page = __swiotlb_map_page,
-	.unmap_page = __swiotlb_unmap_page,
-	.map_sg = __swiotlb_map_sg_attrs,
-	.unmap_sg = __swiotlb_unmap_sg_attrs,
-	.sync_single_for_cpu = __swiotlb_sync_single_for_cpu,
-	.sync_single_for_device = __swiotlb_sync_single_for_device,
-	.sync_sg_for_cpu = __swiotlb_sync_sg_for_cpu,
-	.sync_sg_for_device = __swiotlb_sync_sg_for_device,
-	.dma_supported = __swiotlb_dma_supported,
-	.mapping_error = __swiotlb_dma_mapping_error,
-};
-
 static int __init atomic_pool_init(void)
 {
 	pgprot_t prot = __pgprot(PROT_NORMAL_NC);
@@ -500,10 +342,6 @@ EXPORT_SYMBOL(dummy_dma_ops);
 
 static int __init arm64_dma_init(void)
 {
-	if (swiotlb_force == SWIOTLB_FORCE ||
-	    max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
-		swiotlb = 1;
-
 	WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
 		   TAINT_CPU_OUT_OF_SPEC,
 		   "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
@@ -528,7 +366,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 				 dma_addr_t *handle, gfp_t gfp,
 				 unsigned long attrs)
 {
-	bool coherent = is_device_dma_coherent(dev);
+	bool coherent = dev_is_dma_coherent(dev);
 	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	size_t iosize = size;
 	void *addr;
@@ -569,7 +407,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 			addr = NULL;
 		}
 	} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
+		pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 		struct page *page;
 
 		page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
@@ -596,7 +434,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 						    size >> PAGE_SHIFT);
 		}
 	} else {
-		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
+		pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 		struct page **pages;
 
 		pages = iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot,
@@ -658,8 +496,7 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 	struct vm_struct *area;
 	int ret;
 
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
-					     is_device_dma_coherent(dev));
+	vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs);
 
 	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
 		return ret;
@@ -709,11 +546,11 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (is_device_dma_coherent(dev))
+	if (dev_is_dma_coherent(dev))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_unmap_area(phys_to_virt(phys), size, dir);
+	arch_sync_dma_for_cpu(dev, phys, size, dir);
 }
 
 static void __iommu_sync_single_for_device(struct device *dev,
@@ -722,11 +559,11 @@ static void __iommu_sync_single_for_device(struct device *dev,
 {
 	phys_addr_t phys;
 
-	if (is_device_dma_coherent(dev))
+	if (dev_is_dma_coherent(dev))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_map_area(phys_to_virt(phys), size, dir);
+	arch_sync_dma_for_device(dev, phys, size, dir);
 }
 
 static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
@@ -734,7 +571,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
 				   enum dma_data_direction dir,
 				   unsigned long attrs)
 {
-	bool coherent = is_device_dma_coherent(dev);
+	bool coherent = dev_is_dma_coherent(dev);
 	int prot = dma_info_to_prot(dir, coherent, attrs);
 	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
@@ -762,11 +599,11 @@ static void __iommu_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (is_device_dma_coherent(dev))
+	if (dev_is_dma_coherent(dev))
 		return;
 
 	for_each_sg(sgl, sg, nelems, i)
-		__dma_unmap_area(sg_virt(sg), sg->length, dir);
+		arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
 }
 
 static void __iommu_sync_sg_for_device(struct device *dev,
@@ -776,18 +613,18 @@ static void __iommu_sync_sg_for_device(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	if (is_device_dma_coherent(dev))
+	if (dev_is_dma_coherent(dev))
 		return;
 
 	for_each_sg(sgl, sg, nelems, i)
-		__dma_map_area(sg_virt(sg), sg->length, dir);
+		arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
 static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
 				int nelems, enum dma_data_direction dir,
 				unsigned long attrs)
 {
-	bool coherent = is_device_dma_coherent(dev);
+	bool coherent = dev_is_dma_coherent(dev);
 
 	if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
 		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
@@ -874,9 +711,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
 	if (!dev->dma_ops)
-		dev->dma_ops = &arm64_swiotlb_dma_ops;
+		dev->dma_ops = &swiotlb_dma_ops;
 
-	dev->archdata.dma_coherent = coherent;
+	dev->dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 
 #ifdef CONFIG_XEN
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: move swiotlb noncoherent dma support from arm64 to generic code
  2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-09-17 15:38 ` [PATCH 9/9] arm64: use the generic swiotlb_dma_ops Christoph Hellwig
@ 2018-09-18 13:28 ` Robin Murphy
  2018-09-18 16:11   ` Christoph Hellwig
  9 siblings, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2018-09-18 13:28 UTC (permalink / raw)
  To: Christoph Hellwig, Will Deacon, Catalin Marinas, Konrad Rzeszutek Wilk
  Cc: linux-arm-kernel, iommu, linux-kernel

Hi Christoph,

On 17/09/18 16:38, Christoph Hellwig wrote:
> Hi all,
> 
> this series starts with various swiotlb cleanups, then adds support for
> non-cache coherent devices to the generic swiotlb support, and finally
> switches arm64 to use the generic code.

I think there's going to be an issue with the embedded folks' grubby 
hack in arm64's mem_init() which skips initialising SWIOTLB at all with 
sufficiently little DRAM. I've been waiting for 
dma-direct-noncoherent-merge so that I could fix that case to swizzle in 
dma_direct_ops and avoid swiotlb_dma_ops entirely.
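
(For reference, that hack is essentially the same condition this series
removes from arm64_dma_init() in patch 9; roughly, and from memory, so the
details may differ:

    if (swiotlb_force == SWIOTLB_FORCE ||
        max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
            swiotlb_init(1);

i.e. with little enough DRAM the bounce buffer pool is never set up at all.)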

> Given that this series depends on patches in the dma-mapping tree, or
> pending for it I've also published a git tree here:
> 
>      git://git.infradead.org/users/hch/misc.git swiotlb-noncoherent

However, upon sitting down to eagerly write that patch I've just 
boot-tested the above branch as-is for a baseline and discovered a 
rather more significant problem: arch_dma_alloc() is recursing back into 
__swiotlb_alloc() and blowing the stack. Not good :(

Robin.

----->8-----
[    4.032760] Insufficient stack space to handle exception!
[    4.032765] ESR: 0x96000047 -- DABT (current EL)
[    4.042666] FAR: 0xffff00000a937fb0
[    4.046113] Task stack:     [0xffff00000a938000..0xffff00000a93c000]
[    4.052399] IRQ stack:      [0xffff000008008000..0xffff00000800c000]
[    4.058684] Overflow stack: [0xffff80097ff4b290..0xffff80097ff4c290]
[    4.064972] CPU: 1 PID: 130 Comm: kworker/1:1 Not tainted 4.19.0-rc2+ #681
[    4.071775] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 10 2018
[    4.082456] Workqueue: events deferred_probe_work_func
[    4.087542] pstate: 00000005 (nzcv daif -PAN -UAO)
[    4.092283] pc : arch_dma_alloc+0x0/0x198
[    4.096250] lr : dma_direct_alloc+0x20/0x28
[    4.100385] sp : ffff00000a938010
[    4.103660] x29: ffff00000a938010 x28: ffff800974e15238
[    4.108918] x27: ffff000008bf30d8 x26: ffff80097543d400
[    4.114176] x25: 0000000000000300 x24: ffff80097543d400
[    4.119434] x23: ffff800974e15238 x22: 0000000000001000
[    4.124691] x21: ffff80097543d400 x20: 0000000000001000
[    4.129948] x19: 0000000000000300 x18: ffffffffffffffff
[    4.135206] x17: 0000000000000000 x16: ffff000008bf1b58
[    4.140463] x15: ffff0000091eb688 x14: ffff00008a93ba1d
[    4.145720] x13: ffffff0000000000 x12: 0000000000000000
[    4.150977] x11: 0000000000000001 x10: ffffff7f7fff7fff
[    4.156235] x9 : 0000000000000000 x8 : 0000000000000000
[    4.161492] x7 : ffff800974df9810 x6 : 0000000000000000
[    4.166749] x5 : 0000000000000000 x4 : 0000000000000300
[    4.172006] x3 : 00000000006002c0 x2 : ffff800974e15238
[    4.177263] x1 : 0000000000001000 x0 : ffff80097543d400
[    4.182521] Kernel panic - not syncing: kernel stack overflow
[    4.188207] CPU: 1 PID: 130 Comm: kworker/1:1 Not tainted 4.19.0-rc2+ #681
[    4.195008] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 10 2018
[    4.205681] Workqueue: events deferred_probe_work_func
[    4.210765] Call trace:
[    4.213183]  dump_backtrace+0x0/0x1f0
[    4.216805]  show_stack+0x14/0x20
[    4.220084]  dump_stack+0x9c/0xbc
[    4.223362]  panic+0x138/0x294
[    4.226381]  __stack_chk_fail+0x0/0x18
[    4.230088]  handle_bad_stack+0x11c/0x130
[    4.234052]  __bad_stack+0x88/0x8c
[    4.237415]  arch_dma_alloc+0x0/0x198
[    4.241036]  __swiotlb_alloc+0x3c/0x178
[    4.244828]  arch_dma_alloc+0xd0/0x198
[    4.248534]  dma_direct_alloc+0x20/0x28
[    4.252327]  __swiotlb_alloc+0x3c/0x178
[    4.256119]  arch_dma_alloc+0xd0/0x198
[    4.259825]  dma_direct_alloc+0x20/0x28
[    4.263617]  __swiotlb_alloc+0x3c/0x178
[   ...     ]  (the arch_dma_alloc -> dma_direct_alloc -> __swiotlb_alloc frames above repeat for the remainder of the trace until the stack is exhausted)
[    5.204506]  arch_dma_alloc+0xd0/0x198
[    5.208212]  swiotlb_alloc+0x20/0x28
[    5.211748]  pl330_probe+0x344/0xaf0
[    5.215283]  amba_probe+0xe8/0x1b8
[    5.218646]  really_probe+0xdc/0x3d0
[    5.222181]  driver_probe_device+0x5c/0x148
[    5.226318]  __device_attach_driver+0xa8/0x160
[    5.230712]  bus_for_each_drv+0x64/0xc8
[    5.234505]  __device_attach+0xd8/0x158
[    5.238298]  device_initial_probe+0x10/0x18
[    5.242434]  bus_probe_device+0x90/0x98
[    5.246227]  deferred_probe_work_func+0x88/0xe0
[    5.250708]  process_one_work+0x1e0/0x330
[    5.254673]  worker_thread+0x238/0x460
[    5.258380]  kthread+0x128/0x130
[    5.261571]  ret_from_fork+0x10/0x1c
[    5.265109] SMP: stopping secondary CPUs
[    5.268990] Kernel Offset: disabled
[    5.272438] CPU features: 0x0,25806004
[    5.276143] Memory Limit: none
[    5.279168] ---[ end Kernel panic - not syncing: kernel stack overflow ]---

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: move swiotlb noncoherent dma support from arm64 to generic code
  2018-09-18 13:28 ` move swiotlb noncoherent dma support from arm64 to generic code Robin Murphy
@ 2018-09-18 16:11   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2018-09-18 16:11 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Will Deacon, Catalin Marinas,
	Konrad Rzeszutek Wilk, linux-arm-kernel, iommu, linux-kernel

On Tue, Sep 18, 2018 at 02:28:42PM +0100, Robin Murphy wrote:
> On 17/09/18 16:38, Christoph Hellwig wrote:
>> Hi all,
>>
>> this series starts with various swiotlb cleanups, then adds support for
>> non-cache coherent devices to the generic swiotlb support, and finally
>> switches arm64 to use the generic code.
>
> I think there's going to be an issue with the embedded folks' grubby hack 
> in arm64's mem_init() which skips initialising SWIOTLB at all with 
> sufficiently little DRAM. I've been waiting for 
> dma-direct-noncoherent-merge so that I could fix that case to swizzle in 
> dma_direct_ops and avoid swiotlb_dma_ops entirely.

I'm waiting for your review of dma-direct-noncoherent-merge before
putting it into dma-mapping for-next.

That being said, one thing I'm investigating is eventually merging
dma_direct_ops and swiotlb_ops further - the reason being that I want
to remove the indirect calls for the common direct-mapping case, and
if we don't merge them that will get complicated.  Note that swiotlb
will generally just work even if you don't initialize the bounce
buffer, as long as we never see a physical address large enough to
require bounce buffering.
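
To illustrate the direction (a rough sketch only, not actual kernel
code - swiotlb_bounce_map() below is a hypothetical stand-in for the
swiotlb mapping path): a merged map_page implementation could keep the
common direct-mapping case inline and only fall back to bounce
buffering when the device cannot address the buffer:

	static dma_addr_t merged_map_page(struct device *dev, struct page *page,
			unsigned long offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		phys_addr_t phys = page_to_phys(page) + offset;
		dma_addr_t dma_addr = phys_to_dma(dev, phys);

		/* Common case: the device can address the buffer directly. */
		if (dma_capable(dev, dma_addr, size))
			return dma_addr;

		/* Rare case: bounce through swiotlb (hypothetical helper). */
		return swiotlb_bounce_map(dev, phys, size, dir, attrs);
	}

That way the fast path needs no indirect call at all, and an
uninitialized swiotlb pool is never touched unless bouncing is
actually required.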

>
>> Given that this series depends on patches in the dma-mapping tree, or
>> pending for it I've also published a git tree here:
>>
>>      git://git.infradead.org/users/hch/misc.git swiotlb-noncoherent
>
> However, upon sitting down to eagerly write that patch I've just 
> boot-tested the above branch as-is for a baseline and discovered a rather 
> more significant problem: arch_dma_alloc() is recursing back into 
> __swiotlb_alloc() and blowing the stack. Not good :(

Oops, I messed up when renaming things.  Try this patch on top:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 83e597101c6a..c75c721eb74e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -955,7 +955,7 @@ void *__swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	 */
 	gfp |= __GFP_NOWARN;
 
-	vaddr = dma_direct_alloc(dev, size, dma_handle, gfp, attrs);
+	vaddr = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
 	if (!vaddr)
 		vaddr = swiotlb_alloc_buffer(dev, size, dma_handle, attrs);
 	return vaddr;
@@ -973,7 +973,7 @@ void __swiotlb_free(struct device *dev, size_t size, void *vaddr,
 		dma_addr_t dma_addr, unsigned long attrs)
 {
 	if (!swiotlb_free_buffer(dev, size, dma_addr))
-		dma_direct_free(dev, size, vaddr, dma_addr, attrs);
+		dma_direct_free_pages(dev, size, vaddr, dma_addr, attrs);
 }
 
 static void swiotlb_free(struct device *dev, size_t size, void *vaddr,
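
For context, a simplified sketch of the intended layering (not the
actual code from the series; arch_remap_noncoherent() is a made-up
placeholder for the arm64 remapping step): arch_dma_alloc() builds the
non-coherent mapping on top of __swiotlb_alloc(), so __swiotlb_alloc()
has to use the raw allocator dma_direct_alloc_pages() - calling
dma_direct_alloc() instead dispatches straight back into
arch_dma_alloc() and recurses, which is exactly the stack overflow in
the log above:

	void *arch_dma_alloc(struct device *dev, size_t size,
			dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
	{
		/* Get the backing memory (possibly from the swiotlb pool)... */
		void *vaddr = __swiotlb_alloc(dev, size, dma_handle, gfp, attrs);

		/* ...then apply the non-cacheable remapping for the device. */
		return vaddr ? arch_remap_noncoherent(vaddr, size, attrs) : NULL;
	}

	void *__swiotlb_alloc(struct device *dev, size_t size,
			dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
	{
		/*
		 * Must use the raw page allocator here: dma_direct_alloc()
		 * would dispatch back to arch_dma_alloc() and recurse.
		 */
		void *vaddr = dma_direct_alloc_pages(dev, size, dma_handle,
						     gfp, attrs);

		if (!vaddr)
			vaddr = swiotlb_alloc_buffer(dev, size, dma_handle, attrs);
		return vaddr;
	}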

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-09-18 16:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 15:38 move swiotlb noncoherent dma support from arm64 to generic code Christoph Hellwig
2018-09-17 15:38 ` [PATCH 1/9] swiotlb: remove a pointless comment Christoph Hellwig
2018-09-17 15:38 ` [PATCH 2/9] swiotlb: mark is_swiotlb_buffer static Christoph Hellwig
2018-09-17 15:38 ` [PATCH 3/9] swiotlb: do not panic on mapping failures Christoph Hellwig
2018-09-17 15:38 ` [PATCH 4/9] swiotlb: remove the overflow buffer Christoph Hellwig
2018-09-17 15:38 ` [PATCH 5/9] swiotlb: merge swiotlb_unmap_page and unmap_single Christoph Hellwig
2018-09-17 15:38 ` [PATCH 6/9] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs Christoph Hellwig
2018-09-17 15:38 ` [PATCH 7/9] swiotlb: refactor swiotlb_map_page Christoph Hellwig
2018-09-17 15:38 ` [PATCH 8/9] swiotlb: add support for non-coherent DMA Christoph Hellwig
2018-09-17 15:38 ` [PATCH 9/9] arm64: use the generic swiotlb_dma_ops Christoph Hellwig
2018-09-18 13:28 ` move swiotlb noncoherent dma support from arm64 to generic code Robin Murphy
2018-09-18 16:11   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).