* [RFC] avoid indirect calls for DMA direct mappings v2
@ 2018-12-07 19:07 Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

Hi all,

A while ago Jesper reported major performance regressions due to the
Spectre v2 mitigations in his XDP forwarding workloads.  A large part
of that is due to the indirect calls in the DMA mapping API.

It turns out that the most common implementation of the DMA API is the
direct mapping case, and now that we have merged almost all duplicate
implementations of that into a single generic one, it is easily feasible
to use direct calls for this fast path.

This series consolidates the DMA mapping code by merging the
swiotlb case into the dma direct case, and then treats NULL dma_ops
as an indicator that we should directly call the direct mapping
case.  This recovers a large part of the retpoline-induced XDP slowdown.
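
The NULL dma_ops convention described above can be illustrated with a small
userspace sketch.  This is not the kernel code: the structures are heavily
simplified, and example_iommu_ops, dma_offset and the helper names are
made up for illustration.  It only shows the shape of the dispatch, where
the common case becomes a plain direct call and only non-direct
implementations go through the function pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct device;

/* Hypothetical, heavily simplified stand-in for the kernel ops table. */
struct dma_map_ops {
	dma_addr_t (*map_page)(struct device *dev, uint64_t phys, size_t size);
};

struct device {
	const struct dma_map_ops *dma_ops;	/* NULL selects the direct mapping */
	uint64_t dma_offset;			/* illustrative bus offset */
};

/* Direct mapping: the bus address is the physical address plus an offset. */
static inline dma_addr_t direct_map_page(struct device *dev, uint64_t phys,
		size_t size)
{
	(void)size;
	return phys + dev->dma_offset;
}

/* Made-up non-direct implementation, only reachable through the ops table. */
static dma_addr_t example_iommu_map_page(struct device *dev, uint64_t phys,
		size_t size)
{
	(void)dev;
	(void)size;
	return 0x80000000ull | (phys & 0xfffull);
}

const struct dma_map_ops example_iommu_ops = {
	.map_page = example_iommu_map_page,
};

static inline dma_addr_t map_page(struct device *dev, uint64_t phys,
		size_t size)
{
	const struct dma_map_ops *ops = dev->dma_ops;

	if (!ops)	/* fast path: direct call, no indirect branch */
		return direct_map_page(dev, phys, size);
	return ops->map_page(dev, phys, size);	/* slow path: indirect call */
}
```

A device whose dma_ops pointer is NULL never touches a function pointer in
map_page(), so there is no indirect branch for a retpoline to slow down on
that path.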

This work is based on the dma-mapping tree, so you probably want to
use this git tree for testing:

    git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2

Changes since v1:
 - now also includes all the prep patches relative to the dma-mapping
   for-next tree
 - move various slow path functions out of line
 - use a NULL dma ops as the indicator to use the direct mapping path
 - remove dma_direct_ops now that we always call it without the indirection
 - move the dummy dma ops to common code
 - explicitly set the dummy dma ops for devices that are indicated as not
   DMA capable by firmware


* [PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 02/15] swiotlb: remove dma_mark_clean Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)

We can use DMA_MAPPING_ERROR instead, which already maps to the same
value.
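
As a quick sanity check of the "same value" claim, here is a minimal
userspace sketch.  It assumes that phys_addr_t and dma_addr_t are both
64 bits wide, which is not true for every kernel configuration, and that
DMA_MAPPING_ERROR is defined as the all-ones dma_addr_t; the helper
map_errors_match() is made up for illustration.

```c
#include <stdint.h>

/* Assumption: both address types are 64 bits wide. */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Definition removed from include/linux/swiotlb.h by this patch. */
#define SWIOTLB_MAP_ERROR	(~(phys_addr_t)0x0)
/* Assumed definition of the generic DMA API error value. */
#define DMA_MAPPING_ERROR	(~(dma_addr_t)0)

/* Returns 1 when both error values are the same bit pattern. */
static inline int map_errors_match(void)
{
	return SWIOTLB_MAP_ERROR == DMA_MAPPING_ERROR;
}
```

With differently sized types the two constants would no longer be
interchangeable without a cast, so the equivalence only holds under the
stated assumption.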

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 include/linux/swiotlb.h   | 3 ---
 kernel/dma/swiotlb.c      | 4 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6dc969d5ea2f..833e80b46eb2 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,7 +403,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 
 	map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
 				     attrs);
-	if (map == SWIOTLB_MAP_ERROR)
+	if (map == DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
 	dev_addr = xen_phys_to_bus(map);
@@ -572,7 +572,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 								 sg_phys(sg),
 								 sg->length,
 								 dir, attrs);
-			if (map == SWIOTLB_MAP_ERROR) {
+			if (map == DMA_MAPPING_ERROR) {
 				dev_warn(hwdev, "swiotlb buffer is full\n");
 				/* Don't panic here, we expect map_sg users
 				   to do proper error handling. */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a387b59640a4..14aec0b70dd9 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,9 +46,6 @@ enum dma_sync_target {
 	SYNC_FOR_DEVICE = 1,
 };
 
-/* define the last possible byte of physical address space as a mapping error */
-#define SWIOTLB_MAP_ERROR (~(phys_addr_t)0x0)
-
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 					  dma_addr_t tbl_dma_addr,
 					  phys_addr_t phys, size_t size,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ff1ce81bb623..19ba8e473d71 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -526,7 +526,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
 		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
-	return SWIOTLB_MAP_ERROR;
+	return DMA_MAPPING_ERROR;
 found:
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 
@@ -637,7 +637,7 @@ static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys,
 	/* Oh well, have to allocate and map a bounce buffer. */
 	*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
 			*phys, size, dir, attrs);
-	if (*phys == SWIOTLB_MAP_ERROR)
+	if (*phys == DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
 	/* Ensure that the address returned is DMA'ble */
-- 
2.19.1



* [PATCH 02/15] swiotlb: remove dma_mark_clean
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2019-01-02 21:53   ` Tony Luck
  2018-12-07 19:07 ` [PATCH 03/15] dma-direct: improve addressability error reporting Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)

Instead of providing a special dma_mark_clean hook just for ia64, switch
ia64 to use the normal arch_sync_dma_for_cpu hooks instead.

This means that we now also set the PG_arch_1 bit for pages in the
swiotlb buffer, which isn't strictly needed as we will never execute code
out of the swiotlb buffer, but is otherwise harmless.
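
The page walk done by the new arch_sync_dma_for_cpu hook can be modelled
in userspace as below.  This is only a sketch: page_clean[] stands in for
the PG_arch_1 bit, the 4 KiB page size and the 16-entry array are
assumptions, and mark_range_clean() is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1ul << PAGE_SHIFT)
#define NR_PAGES	16			/* size of the toy page array */

unsigned char page_clean[NR_PAGES];		/* stand-in for PG_arch_1 */

/* Mark every page that overlaps [paddr, paddr + size). */
void mark_range_clean(uint64_t paddr, size_t size)
{
	uint64_t pfn, last;

	if (size == 0)
		return;
	pfn = paddr >> PAGE_SHIFT;
	last = (paddr + size - 1) >> PAGE_SHIFT;
	if (last >= NR_PAGES)			/* outside the toy array */
		return;

	do {
		page_clean[pfn] = 1;
	} while (++pfn <= last);
}
```

For example, mark_range_clean(PAGE_SIZE + 100, PAGE_SIZE) marks pages 1
and 2, because the range touches both.  As far as I can tell from the
removed loop, the old dma_mark_clean() only marked pages that were fully
contained in the range, so it would have marked none for the same input.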

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/Kconfig              |  3 ++-
 arch/ia64/kernel/dma-mapping.c | 20 +++++++++++++++++++-
 arch/ia64/mm/init.c            | 18 +++++++-----------
 drivers/xen/swiotlb-xen.c      | 20 +-------------------
 include/linux/dma-direct.h     |  8 --------
 kernel/dma/swiotlb.c           | 18 +-----------------
 6 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index d6f203658994..c587e3316c38 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -28,7 +28,8 @@ config IA64
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_VIRT_CPU_ACCOUNTING
-	select ARCH_HAS_DMA_MARK_CLEAN
+	select ARCH_HAS_DMA_COHERENT_TO_PFN
+	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select VIRT_TO_BUS
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 7a471d8d67d4..36dd6aa6d759 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
 #include <linux/swiotlb.h>
 #include <linux/export.h>
 
@@ -16,6 +16,24 @@ const struct dma_map_ops *dma_get_ops(struct device *dev)
 EXPORT_SYMBOL(dma_get_ops);
 
 #ifdef CONFIG_SWIOTLB
+void *arch_dma_alloc(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+	return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
+}
+
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+		dma_addr_t dma_addr, unsigned long attrs)
+{
+	dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
+}
+
+long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
+		dma_addr_t dma_addr)
+{
+	return page_to_pfn(virt_to_page(cpu_addr));
+}
+
 void __init swiotlb_dma_init(void)
 {
 	dma_ops = &swiotlb_dma_ops;
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d5e12ff1d73c..2c51733f1dfd 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -71,18 +71,14 @@ __ia64_sync_icache_dcache (pte_t pte)
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-void
-dma_mark_clean(void *addr, size_t size)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
 {
-	unsigned long pg_addr, end;
-
-	pg_addr = PAGE_ALIGN((unsigned long) addr);
-	end = (unsigned long) addr + size;
-	while (pg_addr + PAGE_SIZE <= end) {
-		struct page *page = virt_to_page(pg_addr);
-		set_bit(PG_arch_1, &page->flags);
-		pg_addr += PAGE_SIZE;
-	}
+	unsigned long pfn = __phys_to_pfn(paddr);
+
+	do {
+		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
+	} while (++pfn <= __phys_to_pfn(paddr + size - 1));
 }
 
 inline void
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 833e80b46eb2..989cf872b98c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -441,21 +441,8 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 	xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
 
 	/* NOTE: We use dev_addr here, not paddr! */
-	if (is_xen_swiotlb_buffer(dev_addr)) {
+	if (is_xen_swiotlb_buffer(dev_addr))
 		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
-		return;
-	}
-
-	if (dir != DMA_FROM_DEVICE)
-		return;
-
-	/*
-	 * phys_to_virt doesn't work with hihgmem page but we could
-	 * call dma_mark_clean() with hihgmem page here. However, we
-	 * are fine since dma_mark_clean() is null on POWERPC. We can
-	 * make dma_mark_clean() take a physical address if necessary.
-	 */
-	dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
@@ -493,11 +480,6 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
 
 	if (target == SYNC_FOR_DEVICE)
 		xen_dma_sync_single_for_device(hwdev, dev_addr, size, dir);
-
-	if (dir != DMA_FROM_DEVICE)
-		return;
-
-	dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 void
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e5a47ae7d64..1aa73f4907ae 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -48,14 +48,6 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 	return __sme_clr(__dma_to_phys(dev, daddr));
 }
 
-#ifdef CONFIG_ARCH_HAS_DMA_MARK_CLEAN
-void dma_mark_clean(void *addr, size_t size);
-#else
-static inline void dma_mark_clean(void *addr, size_t size)
-{
-}
-#endif /* CONFIG_ARCH_HAS_DMA_MARK_CLEAN */
-
 u64 dma_direct_get_required_mask(struct device *dev);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 19ba8e473d71..2e126bac5d7d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -706,21 +706,8 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
 		arch_sync_dma_for_cpu(hwdev, paddr, size, dir);
 
-	if (is_swiotlb_buffer(paddr)) {
+	if (is_swiotlb_buffer(paddr))
 		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
-		return;
-	}
-
-	if (dir != DMA_FROM_DEVICE)
-		return;
-
-	/*
-	 * phys_to_virt doesn't work with hihgmem page but we could
-	 * call dma_mark_clean() with hihgmem page here. However, we
-	 * are fine since dma_mark_clean() is null on POWERPC. We can
-	 * make dma_mark_clean() take a physical address if necessary.
-	 */
-	dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 /*
@@ -750,9 +737,6 @@ swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
 
 	if (!dev_is_dma_coherent(hwdev) && target == SYNC_FOR_DEVICE)
 		arch_sync_dma_for_device(hwdev, paddr, size, dir);
-
-	if (!is_swiotlb_buffer(paddr) && dir == DMA_FROM_DEVICE)
-		dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 void
-- 
2.19.1



* [PATCH 03/15] dma-direct: improve addressability error reporting
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 02/15] swiotlb: remove dma_mark_clean Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)

Only report a DMA addressability error once to avoid flooding the
kernel log with repeated messages.  Also provide a stack trace to make it
easy to find the actual caller that caused the problem.

Last but not least move the actual check into the fast path and only
leave the error reporting in a helper.
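
The split between an inline check and an out-of-line, report-once helper
can be sketched in userspace as follows.  The names mirror the patch, but
struct device, dma_capable() and the reports_printed counter are
simplified stand-ins introduced here, and a static flag plays the role of
dev_err_once().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DMA_MAPPING_ERROR	(~(uint64_t)0)

#if defined(__GNUC__)
#define unlikely(x)	__builtin_expect(!!(x), 0)
#else
#define unlikely(x)	(x)
#endif

struct device {
	uint64_t dma_mask;		/* highest address the device can reach */
};

int reports_printed;			/* only here to make the sketch observable */

static inline bool dma_capable(const struct device *dev, uint64_t addr,
		size_t size)
{
	return addr + size - 1 <= dev->dma_mask;
}

/* Slow path, kept out of line: complain only the first time. */
void report_addr(const struct device *dev, uint64_t addr, size_t size)
{
	static bool reported;

	if (reported)
		return;
	reported = true;
	reports_printed++;
	fprintf(stderr, "overflow %#llx+%zu of DMA mask %#llx\n",
		(unsigned long long)addr, size,
		(unsigned long long)dev->dma_mask);
}

/* Fast path: the check itself is inline, only the reporting is a call. */
static inline uint64_t map_addr(const struct device *dev, uint64_t addr,
		size_t size)
{
	if (unlikely(!dma_capable(dev, addr, size))) {
		report_addr(dev, addr, size);
		return DMA_MAPPING_ERROR;
	}
	return addr;
}
```

Every failing call still returns DMA_MAPPING_ERROR, but only the first
one produces a log line.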

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 308f88a750c8..edb24f94ea1e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -30,27 +30,16 @@ static inline bool force_dma_unencrypted(void)
 	return sev_active();
 }
 
-static bool
-check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
-		const char *caller)
+static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
-	if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
-		if (!dev->dma_mask) {
-			dev_err(dev,
-				"%s: call on device without dma_mask\n",
-				caller);
-			return false;
-		}
-
-		if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
-			dev_err(dev,
-				"%s: overflow %pad+%zu of device mask %llx bus mask %llx\n",
-				caller, &dma_addr, size,
-				*dev->dma_mask, dev->bus_dma_mask);
-		}
-		return false;
+	if (!dev->dma_mask) {
+		dev_err_once(dev, "DMA map on device without dma_mask\n");
+	} else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
+		dev_err_once(dev,
+			"overflow %pad+%zu of DMA mask %llx bus mask %llx\n",
+			&dma_addr, size, *dev->dma_mask, dev->bus_dma_mask);
 	}
-	return true;
+	WARN_ON_ONCE(1);
 }
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
@@ -288,8 +277,10 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-	if (!check_addr(dev, dma_addr, size, __func__))
+	if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
+		report_addr(dev, dma_addr, size);
 		return DMA_MAPPING_ERROR;
+	}
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
@@ -306,8 +297,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		BUG_ON(!sg_page(sg));
 
 		sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
-		if (!check_addr(dev, sg_dma_address(sg), sg->length, __func__))
+		if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg),
+				sg->length))) {
+			report_addr(dev, sg_dma_address(sg), sg->length);
 			return 0;
+		}
 		sg_dma_len(sg) = sg->length;
 	}
 
-- 
2.19.1



* [PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 03/15] dma-direct: improve addressability error reporting Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)

No need to duplicate the mapping logic.
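
The idea is simply to build the scatterlist mapping on top of the
single-buffer mapping.  A minimal userspace sketch, with a made-up
struct sg_entry and map_one() standing in for struct scatterlist and
dma_direct_map_page():

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_MAPPING_ERROR	(~(uint64_t)0)

/* Simplified stand-in for one scatterlist entry. */
struct sg_entry {
	uint64_t phys;
	size_t length;
	uint64_t dma_address;
	size_t dma_length;
};

/* Single-buffer direct mapping that fails beyond the device mask. */
uint64_t map_one(uint64_t dma_mask, uint64_t phys, size_t size)
{
	if (phys + size - 1 > dma_mask)
		return DMA_MAPPING_ERROR;
	return phys;
}

/* Map each entry through map_one(); returns nents, or 0 on failure. */
int map_sg(uint64_t dma_mask, struct sg_entry *sgl, int nents)
{
	int i;

	for (i = 0; i < nents; i++) {
		sgl[i].dma_address = map_one(dma_mask, sgl[i].phys,
				sgl[i].length);
		if (sgl[i].dma_address == DMA_MAPPING_ERROR)
			return 0;
		sgl[i].dma_length = sgl[i].length;
	}
	return nents;
}
```

All addressability checks then live in one place, and the list variant
only has to propagate the error.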

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 kernel/dma/direct.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index edb24f94ea1e..d45306473c90 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -217,6 +217,7 @@ static void dma_direct_sync_single_for_device(struct device *dev,
 	arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir);
 }
 
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE)
 static void dma_direct_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
@@ -229,6 +230,7 @@ static void dma_direct_sync_sg_for_device(struct device *dev,
 	for_each_sg(sgl, sg, nents, i)
 		arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
+#endif
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
@@ -294,19 +296,13 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 	struct scatterlist *sg;
 
 	for_each_sg(sgl, sg, nents, i) {
-		BUG_ON(!sg_page(sg));
-
-		sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
-		if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg),
-				sg->length))) {
-			report_addr(dev, sg_dma_address(sg), sg->length);
+		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
+				sg->offset, sg->length, dir, attrs);
+		if (sg->dma_address == DMA_MAPPING_ERROR)
 			return 0;
-		}
 		sg_dma_len(sg) = sg->length;
 	}
 
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
 	return nents;
 }
 
-- 
2.19.1



* [PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu,device} implementation Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)

While the dma-direct code is (relatively) clean and simple, we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around, this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplifies the dma-mapping setup on a few architectures where we
don't have to differentiate which implementation to use.
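
A rough userspace model of a direct mapping that falls back to bounce
buffering is shown below.  It is a sketch under stated assumptions, not
the code from this patch: the pool location, bounce_map() and the toy
slot allocation are made up, and a real bounce buffer also copies the
data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAPPING_ERROR	(~(uint64_t)0)

/* Hypothetical bounce pool in low, device-addressable memory. */
#define BOUNCE_START		0x00100000ull
#define BOUNCE_SIZE		0x00010000ull

uint64_t bounce_next = BOUNCE_START;	/* next free byte in the pool */

/* Same idea as is_swiotlb_buffer(): a simple range check on the pool. */
bool is_bounce_buffer(uint64_t paddr)
{
	return paddr >= BOUNCE_START && paddr < BOUNCE_START + BOUNCE_SIZE;
}

/* Reserve a slot in the pool and redirect *phys to it. */
bool bounce_map(uint64_t *phys, size_t size)
{
	if (size > BOUNCE_START + BOUNCE_SIZE - bounce_next)
		return false;			/* pool exhausted */
	*phys = bounce_next;
	bounce_next += size;
	return true;
}

/* Direct mapping, bouncing only when the device cannot reach the buffer. */
uint64_t map_page(uint64_t dma_mask, uint64_t phys, size_t size)
{
	if (phys + size - 1 > dma_mask) {
		if (!bounce_map(&phys, size))
			return DMA_MAPPING_ERROR;
	}
	return phys;
}
```

Buffers the device can address are passed straight through, so the
bounce path only costs something for devices with addressing limits.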

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/mm/dma-mapping.c          |   2 +-
 arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
 arch/ia64/hp/common/sba_iommu.c      |   6 +-
 arch/ia64/kernel/dma-mapping.c       |   2 +-
 arch/mips/include/asm/dma-mapping.h  |   2 -
 arch/powerpc/kernel/dma-swiotlb.c    |  16 +-
 arch/riscv/include/asm/dma-mapping.h |  15 --
 arch/x86/kernel/pci-swiotlb.c        |   4 +-
 arch/x86/mm/mem_encrypt.c            |   7 -
 arch/x86/pci/sta2x11-fixup.c         |   1 -
 include/linux/dma-direct.h           |  12 ++
 include/linux/swiotlb.h              |  74 ++++-----
 kernel/dma/direct.c                  | 113 +++++++++----
 kernel/dma/swiotlb.c                 | 232 ++-------------------------
 14 files changed, 150 insertions(+), 338 deletions(-)
 delete mode 100644 arch/riscv/include/asm/dma-mapping.h

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4c0f498069e8..e4effbb243b1 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -549,7 +549,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
 	if (!dev->dma_ops)
-		dev->dma_ops = &swiotlb_dma_ops;
+		dev->dma_ops = &dma_direct_ops;
 
 	dev->dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index 58969039bed2..f40ca499b246 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
 const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
 {
 	if (use_swiotlb(dev))
-		return &swiotlb_dma_ops;
+		return &dma_direct_ops;
 	return &sba_dma_ops;
 }
 EXPORT_SYMBOL(hwsw_dma_get_ops);
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 0d21c0b5b23d..5ee74820a0f6 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2065,8 +2065,6 @@ static int __init acpi_sba_ioc_init_acpi(void)
 /* This has to run before acpi_scan_init(). */
 arch_initcall(acpi_sba_ioc_init_acpi);
 
-extern const struct dma_map_ops swiotlb_dma_ops;
-
 static int __init
 sba_init(void)
 {
@@ -2080,7 +2078,7 @@ sba_init(void)
 	 * a successful kdump kernel boot is to use the swiotlb.
 	 */
 	if (is_kdump_kernel()) {
-		dma_ops = &swiotlb_dma_ops;
+		dma_ops = &dma_direct_ops;
 		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
 			panic("Unable to initialize software I/O TLB:"
 				  " Try machvec=dig boot option");
@@ -2102,7 +2100,7 @@ sba_init(void)
 		 * If we didn't find something sba_iommu can claim, we
 		 * need to setup the swiotlb and switch to the dig machvec.
 		 */
-		dma_ops = &swiotlb_dma_ops;
+		dma_ops = &dma_direct_ops;
 		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
 			panic("Unable to find SBA IOMMU or initialize "
 			      "software I/O TLB: Try machvec=dig boot option");
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 36dd6aa6d759..80cd3e1ea95a 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -36,7 +36,7 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
 
 void __init swiotlb_dma_init(void)
 {
-	dma_ops = &swiotlb_dma_ops;
+	dma_ops = &dma_direct_ops;
 	swiotlb_init(1);
 }
 #endif
diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index b4c477eb46ce..69f914667f3e 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -10,8 +10,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #if defined(CONFIG_MACH_JAZZ)
 	return &jazz_dma_ops;
-#elif defined(CONFIG_SWIOTLB)
-	return &swiotlb_dma_ops;
 #else
 	return &dma_direct_ops;
 #endif
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 3d8df2cf8be9..430a7d0aa2cb 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -50,15 +50,15 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
 	.alloc = __dma_nommu_alloc_coherent,
 	.free = __dma_nommu_free_coherent,
 	.mmap = dma_nommu_mmap_coherent,
-	.map_sg = swiotlb_map_sg_attrs,
-	.unmap_sg = swiotlb_unmap_sg_attrs,
+	.map_sg = dma_direct_map_sg,
+	.unmap_sg = dma_direct_unmap_sg,
 	.dma_supported = swiotlb_dma_supported,
-	.map_page = swiotlb_map_page,
-	.unmap_page = swiotlb_unmap_page,
-	.sync_single_for_cpu = swiotlb_sync_single_for_cpu,
-	.sync_single_for_device = swiotlb_sync_single_for_device,
-	.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
-	.sync_sg_for_device = swiotlb_sync_sg_for_device,
+	.map_page = dma_direct_map_page,
+	.unmap_page = dma_direct_unmap_page,
+	.sync_single_for_cpu = dma_direct_sync_single_for_cpu,
+	.sync_single_for_device = dma_direct_sync_single_for_device,
+	.sync_sg_for_cpu = dma_direct_sync_sg_for_cpu,
+	.sync_sg_for_device = dma_direct_sync_sg_for_device,
 	.get_required_mask = swiotlb_powerpc_get_required,
 };
 
diff --git a/arch/riscv/include/asm/dma-mapping.h b/arch/riscv/include/asm/dma-mapping.h
deleted file mode 100644
index 8facc1c8fa05..000000000000
--- a/arch/riscv/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,15 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#ifndef _RISCV_ASM_DMA_MAPPING_H
-#define _RISCV_ASM_DMA_MAPPING_H 1
-
-#ifdef CONFIG_SWIOTLB
-#include <linux/swiotlb.h>
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-	return &swiotlb_dma_ops;
-}
-#else
-#include <asm-generic/dma-mapping.h>
-#endif /* CONFIG_SWIOTLB */
-
-#endif /* _RISCV_ASM_DMA_MAPPING_H */
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index bd08b9e1c9e2..5f5302028a9a 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -62,10 +62,8 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
 
 void __init pci_swiotlb_init(void)
 {
-	if (swiotlb) {
+	if (swiotlb)
 		swiotlb_init(0);
-		dma_ops = &swiotlb_dma_ops;
-	}
 }
 
 void __init pci_swiotlb_late_init(void)
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 006f373f54ab..385afa2b9e17 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -380,13 +380,6 @@ void __init mem_encrypt_init(void)
 	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
 	swiotlb_update_mem_attributes();
 
-	/*
-	 * With SEV, DMA operations cannot use encryption, we need to use
-	 * SWIOTLB to bounce buffer DMA operation.
-	 */
-	if (sev_active())
-		dma_ops = &swiotlb_dma_ops;
-
 	/*
 	 * With SEV, we need to unroll the rep string I/O instructions.
 	 */
diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index 7a5bafb76d77..3cdafea55ab6 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -168,7 +168,6 @@ static void sta2x11_setup_pdev(struct pci_dev *pdev)
 		return;
 	pci_set_consistent_dma_mask(pdev, STA2X11_AMBA_SIZE - 1);
 	pci_set_dma_mask(pdev, STA2X11_AMBA_SIZE - 1);
-	pdev->dev.dma_ops = &swiotlb_dma_ops;
 	pdev->dev.archdata.is_sta2x11 = true;
 
 	/* We must enable all devices as master, for audio DMA to work */
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 1aa73f4907ae..3b0a3ea3876d 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -63,7 +63,19 @@ void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page)
 dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs);
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_device(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 14aec0b70dd9..7c007ed7505f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -16,8 +16,6 @@ enum swiotlb_force {
 	SWIOTLB_NO_FORCE,	/* swiotlb=noforce */
 };
 
-extern enum swiotlb_force swiotlb_force;
-
 /*
  * Maximum allowable number of contiguous slabs to map,
  * must be a power of 2.  What is the appropriate value ?
@@ -62,56 +60,44 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 				    size_t size, enum dma_data_direction dir,
 				    enum dma_sync_target target);
 
-/* Accessory functions. */
-
-extern dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs);
-extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       unsigned long attrs);
-
-extern int
-swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
-		     enum dma_data_direction dir,
-		     unsigned long attrs);
-
-extern void
-swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
-		       int nelems, enum dma_data_direction dir,
-		       unsigned long attrs);
-
-extern void
-swiotlb_sync_single_for_cpu(struct device *hwdev, dma_addr_t dev_addr,
-			    size_t size, enum dma_data_direction dir);
-
-extern void
-swiotlb_sync_sg_for_cpu(struct device *hwdev, struct scatterlist *sg,
-			int nelems, enum dma_data_direction dir);
-
-extern void
-swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir);
-
-extern void
-swiotlb_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
-			   int nelems, enum dma_data_direction dir);
-
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
 #ifdef CONFIG_SWIOTLB
-extern void __init swiotlb_exit(void);
+extern enum swiotlb_force swiotlb_force;
+extern phys_addr_t io_tlb_start, io_tlb_end;
+
+static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+{
+	return paddr >= io_tlb_start && paddr < io_tlb_end;
+}
+
+bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
+void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 #else
-static inline void swiotlb_exit(void) { }
-static inline unsigned int swiotlb_max_segment(void) { return 0; }
-#endif
+#define swiotlb_force SWIOTLB_NO_FORCE
+static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+{
+	return false;
+}
+static inline bool swiotlb_map(struct device *dev, phys_addr_t *phys,
+		dma_addr_t *dma_addr, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	return false;
+}
+static inline void swiotlb_exit(void)
+{
+}
+static inline unsigned int swiotlb_max_segment(void)
+{
+	return 0;
+}
+#endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
-extern const struct dma_map_ops swiotlb_dma_ops;
-
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d45306473c90..85d8286a0ba2 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
 #include <linux/dma-noncoherent.h>
 #include <linux/pfn.h>
 #include <linux/set_memory.h>
+#include <linux/swiotlb.h>
 
 /*
  * Most architectures use ZONE_DMA for the first 16 Megabytes, but
@@ -209,69 +210,110 @@ void dma_direct_free(struct device *dev, size_t size,
 		dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
 }
 
-static void dma_direct_sync_single_for_device(struct device *dev,
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+    defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
-	if (dev_is_dma_coherent(dev))
-		return;
-	arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir);
+	phys_addr_t paddr = dma_to_phys(dev, addr);
+
+	if (unlikely(is_swiotlb_buffer(paddr)))
+		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
+
+	if (!dev_is_dma_coherent(dev))
+		arch_sync_dma_for_device(dev, paddr, size, dir);
 }
 
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE)
-static void dma_direct_sync_sg_for_device(struct device *dev,
+void dma_direct_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_dma_coherent(dev))
-		return;
+	for_each_sg(sgl, sg, nents, i) {
+		if (unlikely(is_swiotlb_buffer(sg_phys(sg))))
+			swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length,
+					dir, SYNC_FOR_DEVICE);
 
-	for_each_sg(sgl, sg, nents, i)
-		arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
+		if (!dev_is_dma_coherent(dev))
+			arch_sync_dma_for_device(dev, sg_phys(sg), sg->length,
+					dir);
+	}
 }
 #endif
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
-    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
-static void dma_direct_sync_single_for_cpu(struct device *dev,
+    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+    defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_cpu(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
-	if (dev_is_dma_coherent(dev))
-		return;
-	arch_sync_dma_for_cpu(dev, dma_to_phys(dev, addr), size, dir);
-	arch_sync_dma_for_cpu_all(dev);
+	phys_addr_t paddr = dma_to_phys(dev, addr);
+
+	if (!dev_is_dma_coherent(dev)) {
+		arch_sync_dma_for_cpu(dev, paddr, size, dir);
+		arch_sync_dma_for_cpu_all(dev);
+	}
+
+	if (unlikely(is_swiotlb_buffer(paddr)))
+		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
 }
 
-static void dma_direct_sync_sg_for_cpu(struct device *dev,
+void dma_direct_sync_sg_for_cpu(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
 	int i;
 
-	if (dev_is_dma_coherent(dev))
-		return;
+	for_each_sg(sgl, sg, nents, i) {
+		if (!dev_is_dma_coherent(dev))
+			arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
+
+		if (unlikely(is_swiotlb_buffer(sg_phys(sg))))
+			swiotlb_tbl_sync_single(dev, sg_phys(sg), sg->length, dir,
+					SYNC_FOR_CPU);
+	}
 
-	for_each_sg(sgl, sg, nents, i)
-		arch_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
-	arch_sync_dma_for_cpu_all(dev);
+	if (!dev_is_dma_coherent(dev))
+		arch_sync_dma_for_cpu_all(dev);
 }
 
-static void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
+	phys_addr_t phys = dma_to_phys(dev, addr);
+
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
+
+	if (unlikely(is_swiotlb_buffer(phys)))
+		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
-static void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
+			     attrs);
+}
+#else
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		dma_direct_sync_sg_for_cpu(dev, sgl, nents, dir);
 }
 #endif
 
+static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
+		size_t size)
+{
+	return swiotlb_force != SWIOTLB_FORCE &&
+		(!dev || dma_capable(dev, dma_addr, size));
+}
+
 dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -279,13 +321,14 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-	if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
+	if (unlikely(!dma_direct_possible(dev, dma_addr, size)) &&
+	    !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs)) {
 		report_addr(dev, dma_addr, size);
 		return DMA_MAPPING_ERROR;
 	}
 
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
+	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		arch_sync_dma_for_device(dev, phys, size, dir);
 	return dma_addr;
 }
 
@@ -299,11 +342,15 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
 				sg->offset, sg->length, dir, attrs);
 		if (sg->dma_address == DMA_MAPPING_ERROR)
-			return 0;
+			goto out_unmap;
 		sg_dma_len(sg) = sg->length;
 	}
 
 	return nents;
+
+out_unmap:
+	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
+	return 0;
 }
 
 /*
@@ -331,12 +378,14 @@ const struct dma_map_ops dma_direct_ops = {
 	.free			= dma_direct_free,
 	.map_page		= dma_direct_map_page,
 	.map_sg			= dma_direct_map_sg,
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE)
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+    defined(CONFIG_SWIOTLB)
 	.sync_single_for_device	= dma_direct_sync_single_for_device,
 	.sync_sg_for_device	= dma_direct_sync_sg_for_device,
 #endif
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
-    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
+    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+    defined(CONFIG_SWIOTLB)
 	.sync_single_for_cpu	= dma_direct_sync_single_for_cpu,
 	.sync_sg_for_cpu	= dma_direct_sync_sg_for_cpu,
 	.unmap_page		= dma_direct_unmap_page,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2e126bac5d7d..d6361776dc5c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -21,7 +21,6 @@
 
 #include <linux/cache.h>
 #include <linux/dma-direct.h>
-#include <linux/dma-noncoherent.h>
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <linux/spinlock.h>
@@ -65,7 +64,7 @@ enum swiotlb_force swiotlb_force;
  * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
  * API.
  */
-static phys_addr_t io_tlb_start, io_tlb_end;
+phys_addr_t io_tlb_start, io_tlb_end;
 
 /*
  * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
@@ -383,11 +382,6 @@ void __init swiotlb_exit(void)
 	max_segment = 0;
 }
 
-static int is_swiotlb_buffer(phys_addr_t paddr)
-{
-	return paddr >= io_tlb_start && paddr < io_tlb_end;
-}
-
 /*
  * Bounce: copy the swiotlb buffer back to the original dma location
  */
@@ -623,221 +617,36 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 	}
 }
 
-static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys,
+/*
+ * Create a swiotlb mapping for the buffer at @phys, and in case of DMAing
+ * to the device copy the data into it as well.
+ */
+bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	dma_addr_t dma_addr;
+	trace_swiotlb_bounced(dev, *dma_addr, size, swiotlb_force);
 
 	if (unlikely(swiotlb_force == SWIOTLB_NO_FORCE)) {
 		dev_warn_ratelimited(dev,
 			"Cannot do DMA to address %pa\n", phys);
-		return DMA_MAPPING_ERROR;
+		return false;
 	}
 
 	/* Oh well, have to allocate and map a bounce buffer. */
 	*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
 			*phys, size, dir, attrs);
 	if (*phys == DMA_MAPPING_ERROR)
-		return DMA_MAPPING_ERROR;
+		return false;
 
 	/* Ensure that the address returned is DMA'ble */
-	dma_addr = __phys_to_dma(dev, *phys);
-	if (unlikely(!dma_capable(dev, dma_addr, size))) {
+	*dma_addr = __phys_to_dma(dev, *phys);
+	if (unlikely(!dma_capable(dev, *dma_addr, size))) {
 		swiotlb_tbl_unmap_single(dev, *phys, size, dir,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
-		return DMA_MAPPING_ERROR;
-	}
-
-	return dma_addr;
-}
-
-/*
- * Map a single buffer of the indicated size for DMA in streaming mode.  The
- * physical address to use is returned.
- *
- * Once the device is given the dma address, the device owns this memory until
- * either swiotlb_unmap_page or swiotlb_dma_sync_single is performed.
- */
-dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
-			    unsigned long offset, size_t size,
-			    enum dma_data_direction dir,
-			    unsigned long attrs)
-{
-	phys_addr_t phys = page_to_phys(page) + offset;
-	dma_addr_t dev_addr = phys_to_dma(dev, phys);
-
-	BUG_ON(dir == DMA_NONE);
-	/*
-	 * If the address happens to be in the device's DMA window,
-	 * we can safely return the device addr and not worry about bounce
-	 * buffering it.
-	 */
-	if (!dma_capable(dev, dev_addr, size) ||
-	    swiotlb_force == SWIOTLB_FORCE) {
-		trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
-		dev_addr = swiotlb_bounce_page(dev, &phys, size, dir, attrs);
+		return false;
 	}
 
-	if (!dev_is_dma_coherent(dev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
-	    dev_addr != DMA_MAPPING_ERROR)
-		arch_sync_dma_for_device(dev, phys, size, dir);
-
-	return dev_addr;
-}
-
-/*
- * Unmap a single streaming mode DMA translation.  The dma_addr and size must
- * match what was provided for in a previous swiotlb_map_page call.  All
- * other usages are undefined.
- *
- * After this call, reads by the cpu to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-			size_t size, enum dma_data_direction dir,
-			unsigned long attrs)
-{
-	phys_addr_t paddr = dma_to_phys(hwdev, dev_addr);
-
-	BUG_ON(dir == DMA_NONE);
-
-	if (!dev_is_dma_coherent(hwdev) &&
-	    (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-		arch_sync_dma_for_cpu(hwdev, paddr, size, dir);
-
-	if (is_swiotlb_buffer(paddr))
-		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
-}
-
-/*
- * Make physical memory consistent for a single streaming mode DMA translation
- * after a transfer.
- *
- * If you perform a swiotlb_map_page() but wish to interrogate the buffer
- * using the cpu, yet do not wish to teardown the dma mapping, you must
- * call this function before doing so.  At the next point you give the dma
- * address back to the card, you must first perform a
- * swiotlb_dma_sync_for_device, and then the device again owns the buffer
- */
-static void
-swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
-		    size_t size, enum dma_data_direction dir,
-		    enum dma_sync_target target)
-{
-	phys_addr_t paddr = dma_to_phys(hwdev, dev_addr);
-
-	BUG_ON(dir == DMA_NONE);
-
-	if (!dev_is_dma_coherent(hwdev) && target == SYNC_FOR_CPU)
-		arch_sync_dma_for_cpu(hwdev, paddr, size, dir);
-
-	if (is_swiotlb_buffer(paddr))
-		swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
-
-	if (!dev_is_dma_coherent(hwdev) && target == SYNC_FOR_DEVICE)
-		arch_sync_dma_for_device(hwdev, paddr, size, dir);
-}
-
-void
-swiotlb_sync_single_for_cpu(struct device *hwdev, dma_addr_t dev_addr,
-			    size_t size, enum dma_data_direction dir)
-{
-	swiotlb_sync_single(hwdev, dev_addr, size, dir, SYNC_FOR_CPU);
-}
-
-void
-swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir)
-{
-	swiotlb_sync_single(hwdev, dev_addr, size, dir, SYNC_FOR_DEVICE);
-}
-
-/*
- * Map a set of buffers described by scatterlist in streaming mode for DMA.
- * This is the scatter-gather version of the above swiotlb_map_page
- * interface.  Here the scatter gather list elements are each tagged with the
- * appropriate dma address and length.  They are obtained via
- * sg_dma_{address,length}(SG).
- *
- * Device ownership issues as mentioned above for swiotlb_map_page are the
- * same here.
- */
-int
-swiotlb_map_sg_attrs(struct device *dev, struct scatterlist *sgl, int nelems,
-		     enum dma_data_direction dir, unsigned long attrs)
-{
-	struct scatterlist *sg;
-	int i;
-
-	for_each_sg(sgl, sg, nelems, i) {
-		sg->dma_address = swiotlb_map_page(dev, sg_page(sg), sg->offset,
-				sg->length, dir, attrs);
-		if (sg->dma_address == DMA_MAPPING_ERROR)
-			goto out_error;
-		sg_dma_len(sg) = sg->length;
-	}
-
-	return nelems;
-
-out_error:
-	swiotlb_unmap_sg_attrs(dev, sgl, i, dir,
-			attrs | DMA_ATTR_SKIP_CPU_SYNC);
-	sg_dma_len(sgl) = 0;
-	return 0;
-}
-
-/*
- * Unmap a set of streaming mode DMA translations.  Again, cpu read rules
- * concerning calls here are the same as for swiotlb_unmap_page() above.
- */
-void
-swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
-		       int nelems, enum dma_data_direction dir,
-		       unsigned long attrs)
-{
-	struct scatterlist *sg;
-	int i;
-
-	BUG_ON(dir == DMA_NONE);
-
-	for_each_sg(sgl, sg, nelems, i)
-		swiotlb_unmap_page(hwdev, sg->dma_address, sg_dma_len(sg), dir,
-			     attrs);
-}
-
-/*
- * Make physical memory consistent for a set of streaming mode DMA translations
- * after a transfer.
- *
- * The same as swiotlb_sync_single_* but for a scatter-gather list, same rules
- * and usage.
- */
-static void
-swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
-		int nelems, enum dma_data_direction dir,
-		enum dma_sync_target target)
-{
-	struct scatterlist *sg;
-	int i;
-
-	for_each_sg(sgl, sg, nelems, i)
-		swiotlb_sync_single(hwdev, sg->dma_address,
-				    sg_dma_len(sg), dir, target);
-}
-
-void
-swiotlb_sync_sg_for_cpu(struct device *hwdev, struct scatterlist *sg,
-			int nelems, enum dma_data_direction dir)
-{
-	swiotlb_sync_sg(hwdev, sg, nelems, dir, SYNC_FOR_CPU);
-}
-
-void
-swiotlb_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
-			   int nelems, enum dma_data_direction dir)
-{
-	swiotlb_sync_sg(hwdev, sg, nelems, dir, SYNC_FOR_DEVICE);
+	return true;
 }
 
 /*
@@ -851,18 +660,3 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
 	return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
-
-const struct dma_map_ops swiotlb_dma_ops = {
-	.alloc			= dma_direct_alloc,
-	.free			= dma_direct_free,
-	.sync_single_for_cpu	= swiotlb_sync_single_for_cpu,
-	.sync_single_for_device	= swiotlb_sync_single_for_device,
-	.sync_sg_for_cpu	= swiotlb_sync_sg_for_cpu,
-	.sync_sg_for_device	= swiotlb_sync_sg_for_device,
-	.map_sg			= swiotlb_map_sg_attrs,
-	.unmap_sg		= swiotlb_unmap_sg_attrs,
-	.map_page		= swiotlb_map_page,
-	.unmap_page		= swiotlb_unmap_page,
-	.dma_supported		= dma_direct_supported,
-};
-EXPORT_SYMBOL(swiotlb_dma_ops);
-- 
2.19.1



* [PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu,device} implementation
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

We can just use the regular sync calls after adding the offset to the
address instead of reimplementing them.
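
As a rough userspace illustration of the forwarding described above (not
kernel code): the mock_* names and the sync_record struct are hypothetical
stand-ins, and the real helpers also take a device and a DMA direction.
The range variant only adds the offset and hands off to the base call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* assumption: 64-bit DMA addresses */

/* Hypothetical record of what the base sync call was asked to do. */
struct sync_record {
	dma_addr_t addr;
	size_t size;
	int calls;
};

static struct sync_record last_sync;

/* Mock of the base helper; the real one dispatches to the DMA ops. */
static void mock_sync_single_for_cpu(dma_addr_t addr, size_t size)
{
	last_sync.addr = addr;
	last_sync.size = size;
	last_sync.calls++;
}

/* The range variant adds the offset and forwards, nothing else. */
static void mock_sync_single_range_for_cpu(dma_addr_t addr,
		unsigned long offset, size_t size)
{
	mock_sync_single_for_cpu(addr + offset, size);
}
```

One visible consequence in the diff below: the removed debug hooks were
passed the unadjusted handle with a size of offset + size, while the
forwarded call passes addr + offset with the plain size.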

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-debug.h   | 27 ------------------------
 include/linux/dma-mapping.h | 34 +++++++++---------------------
 kernel/dma/debug.c          | 42 -------------------------------------
 3 files changed, 10 insertions(+), 93 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 30213adbb6b9..c85e097a984c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -72,17 +72,6 @@ extern void debug_dma_sync_single_for_device(struct device *dev,
 					     dma_addr_t dma_handle,
 					     size_t size, int direction);
 
-extern void debug_dma_sync_single_range_for_cpu(struct device *dev,
-						dma_addr_t dma_handle,
-						unsigned long offset,
-						size_t size,
-						int direction);
-
-extern void debug_dma_sync_single_range_for_device(struct device *dev,
-						   dma_addr_t dma_handle,
-						   unsigned long offset,
-						   size_t size, int direction);
-
 extern void debug_dma_sync_sg_for_cpu(struct device *dev,
 				      struct scatterlist *sg,
 				      int nelems, int direction);
@@ -174,22 +163,6 @@ static inline void debug_dma_sync_single_for_device(struct device *dev,
 {
 }
 
-static inline void debug_dma_sync_single_range_for_cpu(struct device *dev,
-						       dma_addr_t dma_handle,
-						       unsigned long offset,
-						       size_t size,
-						       int direction)
-{
-}
-
-static inline void debug_dma_sync_single_range_for_device(struct device *dev,
-							  dma_addr_t dma_handle,
-							  unsigned long offset,
-							  size_t size,
-							  int direction)
-{
-}
-
 static inline void debug_dma_sync_sg_for_cpu(struct device *dev,
 					     struct scatterlist *sg,
 					     int nelems, int direction)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 7799c2b27849..8916499d2805 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -360,6 +360,13 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
 }
 
+static inline void dma_sync_single_range_for_cpu(struct device *dev,
+		dma_addr_t addr, unsigned long offset, size_t size,
+		enum dma_data_direction dir)
+{
+	return dma_sync_single_for_cpu(dev, addr + offset, size, dir);
+}
+
 static inline void dma_sync_single_for_device(struct device *dev,
 					      dma_addr_t addr, size_t size,
 					      enum dma_data_direction dir)
@@ -372,32 +379,11 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	debug_dma_sync_single_for_device(dev, addr, size, dir);
 }
 
-static inline void dma_sync_single_range_for_cpu(struct device *dev,
-						 dma_addr_t addr,
-						 unsigned long offset,
-						 size_t size,
-						 enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_cpu)
-		ops->sync_single_for_cpu(dev, addr + offset, size, dir);
-	debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);
-}
-
 static inline void dma_sync_single_range_for_device(struct device *dev,
-						    dma_addr_t addr,
-						    unsigned long offset,
-						    size_t size,
-						    enum dma_data_direction dir)
+		dma_addr_t addr, unsigned long offset, size_t size,
+		enum dma_data_direction dir)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_device)
-		ops->sync_single_for_device(dev, addr + offset, size, dir);
-	debug_dma_sync_single_range_for_device(dev, addr, offset, size, dir);
+	return dma_sync_single_for_device(dev, addr + offset, size, dir);
 }
 
 static inline void
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..3214833b47e2 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -1662,48 +1662,6 @@ void debug_dma_sync_single_for_device(struct device *dev,
 }
 EXPORT_SYMBOL(debug_dma_sync_single_for_device);
 
-void debug_dma_sync_single_range_for_cpu(struct device *dev,
-					 dma_addr_t dma_handle,
-					 unsigned long offset, size_t size,
-					 int direction)
-{
-	struct dma_debug_entry ref;
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	ref.type         = dma_debug_single;
-	ref.dev          = dev;
-	ref.dev_addr     = dma_handle;
-	ref.size         = offset + size;
-	ref.direction    = direction;
-	ref.sg_call_ents = 0;
-
-	check_sync(dev, &ref, true);
-}
-EXPORT_SYMBOL(debug_dma_sync_single_range_for_cpu);
-
-void debug_dma_sync_single_range_for_device(struct device *dev,
-					    dma_addr_t dma_handle,
-					    unsigned long offset,
-					    size_t size, int direction)
-{
-	struct dma_debug_entry ref;
-
-	if (unlikely(dma_debug_disabled()))
-		return;
-
-	ref.type         = dma_debug_single;
-	ref.dev          = dev;
-	ref.dev_addr     = dma_handle;
-	ref.size         = offset + size;
-	ref.direction    = direction;
-	ref.sg_call_ents = 0;
-
-	check_sync(dev, &ref, false);
-}
-EXPORT_SYMBOL(debug_dma_sync_single_range_for_device);
-
 void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 			       int nelems, int direction)
 {
-- 
2.19.1



* [PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu,device} implementation Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

The two functions are the same apart from the map_single flag passed to
debug_dma_unmap_page, so don't bother implementing them twice.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-mapping.h | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8916499d2805..3b431cc58794 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -253,6 +253,12 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
 	debug_dma_unmap_page(dev, addr, size, dir, true);
 }
 
+static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	return dma_unmap_single_attrs(dev, addr, size, dir, attrs);
+}
+
 /*
  * dma_maps_sg_attrs returns 0 on error and > 0 on success.
  * It should never return a value < 0.
@@ -300,19 +306,6 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	return addr;
 }
 
-static inline void dma_unmap_page_attrs(struct device *dev,
-					dma_addr_t addr, size_t size,
-					enum dma_data_direction dir,
-					unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (ops->unmap_page)
-		ops->unmap_page(dev, addr, size, dir, attrs);
-	debug_dma_unmap_page(dev, addr, size, dir, false);
-}
-
 static inline dma_addr_t dma_map_resource(struct device *dev,
 					  phys_addr_t phys_addr,
 					  size_t size,
-- 
2.19.1



* [PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 09/15] dma-mapping: move various slow path functions out of line Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

dma_get_required_mask should really be with the rest of the DMA mapping
implementation instead of in drivers/base as a lone outlier.
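
For readers unfamiliar with what the moved default helper computes, here
is a userspace sketch of the same arithmetic.  It is an illustration
only: the sketch_* names are hypothetical, a 4 KiB page size is assumed,
sketch_fls() stands in for the kernel's fls(), and max_pfn is passed as
a parameter (assumed to be at least 2) instead of being read from a
global.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for 0. */
static int sketch_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Smallest all-ones mask that covers the highest physical address. */
static uint64_t sketch_required_mask(uint64_t max_pfn)
{
	uint32_t low = (uint32_t)((max_pfn - 1) << SKETCH_PAGE_SHIFT);
	uint32_t high = (uint32_t)((max_pfn - 1) >> (32 - SKETCH_PAGE_SHIFT));

	if (!high) {
		/* Memory ends below 4 GiB: round low up to 2^n - 1. */
		low = (uint32_t)1 << (sketch_fls(low) - 1);
		low += low - 1;
		return low;
	}

	/* Memory extends above 4 GiB: round high up, low word all ones. */
	high = (uint32_t)1 << (sketch_fls(high) - 1);
	high += high - 1;
	return ((uint64_t)high << 32) + 0xffffffff;
}
```

With these assumptions, 1 GiB of memory (max_pfn 0x40000) gives a 30-bit
mask and 8 GiB (max_pfn 0x200000) gives a 33-bit mask.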

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/base/platform.c | 31 -------------------------------
 kernel/dma/mapping.c    | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 41b91af95afb..eae841935a45 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1179,37 +1179,6 @@ int __init platform_bus_init(void)
 	return error;
 }
 
-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-static u64 dma_default_get_required_mask(struct device *dev)
-{
-	u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-	u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-	u64 mask;
-
-	if (!high_totalram) {
-		/* convert to mask just covering totalram */
-		low_totalram = (1 << (fls(low_totalram) - 1));
-		low_totalram += low_totalram - 1;
-		mask = low_totalram;
-	} else {
-		high_totalram = (1 << (fls(high_totalram) - 1));
-		high_totalram += high_totalram - 1;
-		mask = (((u64)high_totalram) << 32) + 0xffffffff;
-	}
-	return mask;
-}
-
-u64 dma_get_required_mask(struct device *dev)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	if (ops->get_required_mask)
-		return ops->get_required_mask(dev);
-	return dma_default_get_required_mask(dev);
-}
-EXPORT_SYMBOL_GPL(dma_get_required_mask);
-#endif
-
 static __initdata LIST_HEAD(early_platform_driver_list);
 static __initdata LIST_HEAD(early_platform_device_list);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dfbc3deb95cd..dfe29d18dba1 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2006  SUSE Linux Products GmbH
  * Copyright (c) 2006  Tejun Heo <teheo@suse.de>
  */
-
+#include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/export.h>
@@ -262,3 +262,35 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 #endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */
 }
 EXPORT_SYMBOL(dma_common_mmap);
+
+#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
+static u64 dma_default_get_required_mask(struct device *dev)
+{
+	u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
+	u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
+	u64 mask;
+
+	if (!high_totalram) {
+		/* convert to mask just covering totalram */
+		low_totalram = (1 << (fls(low_totalram) - 1));
+		low_totalram += low_totalram - 1;
+		mask = low_totalram;
+	} else {
+		high_totalram = (1 << (fls(high_totalram) - 1));
+		high_totalram += high_totalram - 1;
+		mask = (((u64)high_totalram) << 32) + 0xffffffff;
+	}
+	return mask;
+}
+
+u64 dma_get_required_mask(struct device *dev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops->get_required_mask)
+		return ops->get_required_mask(dev);
+	return dma_default_get_required_mask(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_required_mask);
+#endif
+
-- 
2.19.1



* [PATCH 09/15] dma-mapping: move various slow path functions out of line
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 10/15] dma-mapping: move dma_cache_sync " Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

There is no need to have all setup and coherent allocation / freeing
routines inline.  Move them out of line to keep the implementation
nicely encapsulated and save some kernel text size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/include/asm/dma-mapping.h |   1 -
 include/linux/dma-mapping.h            | 150 +++----------------------
 kernel/dma/mapping.c                   | 140 ++++++++++++++++++++++-
 3 files changed, 151 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..5201f2b7838c 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -108,7 +108,6 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 }
 
 #define HAVE_ARCH_DMA_SET_MASK 1
-extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3b431cc58794..0bbce52606c2 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -440,107 +440,24 @@ bool dma_in_atomic_pool(void *start, size_t size);
 void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
 bool dma_free_from_pool(void *start, size_t size);
 
-/**
- * dma_mmap_attrs - map a coherent DMA allocation into user space
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @vma: vm_area_struct describing requested user mapping
- * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
- * @handle: device-view address returned from dma_alloc_attrs
- * @size: size of memory originally requested in dma_alloc_attrs
- * @attrs: attributes of mapping properties requested in dma_alloc_attrs
- *
- * Map a coherent DMA buffer previously allocated by dma_alloc_attrs
- * into user space.  The coherent DMA buffer must not be freed by the
- * driver until the user space mapping has been released.
- */
-static inline int
-dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
-	       dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	BUG_ON(!ops);
-	if (ops->mmap)
-		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
+int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs);
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr,
 		dma_addr_t dma_addr, size_t size, unsigned long attrs);
 
-static inline int
-dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
-		      dma_addr_t dma_addr, size_t size,
-		      unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	BUG_ON(!ops);
-	if (ops->get_sgtable)
-		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-					attrs);
-	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-			attrs);
-}
-
+int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs);
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 
-#ifndef arch_dma_alloc_attrs
-#define arch_dma_alloc_attrs(dev)	(true)
-#endif
-
-static inline void *dma_alloc_attrs(struct device *dev, size_t size,
-				       dma_addr_t *dma_handle, gfp_t flag,
-				       unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	void *cpu_addr;
-
-	BUG_ON(!ops);
-	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
-
-	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
-		return cpu_addr;
-
-	/* let the implementation decide on the zone to allocate from: */
-	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
-
-	if (!arch_dma_alloc_attrs(&dev))
-		return NULL;
-	if (!ops->alloc)
-		return NULL;
-
-	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
-	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
-	return cpu_addr;
-}
-
-static inline void dma_free_attrs(struct device *dev, size_t size,
-				     void *cpu_addr, dma_addr_t dma_handle,
-				     unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!ops);
-
-	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
-		return;
-	/*
-	 * On non-coherent platforms which implement DMA-coherent buffers via
-	 * non-cacheable remaps, ops->free() may call vunmap(). Thus getting
-	 * this far in IRQ context is a) at risk of a BUG_ON() or trying to
-	 * sleep on some machines, and b) an indication that the driver is
-	 * probably misusing the coherent API anyway.
-	 */
-	WARN_ON(irqs_disabled());
-
-	if (!ops->free || !cpu_addr)
-		return;
-
-	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	ops->free(dev, size, cpu_addr, dma_handle, attrs);
-}
+void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		gfp_t flag, unsigned long attrs);
+void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+		dma_addr_t dma_handle, unsigned long attrs);
 
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp)
@@ -565,35 +482,9 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 	return 0;
 }
 
-static inline void dma_check_mask(struct device *dev, u64 mask)
-{
-	if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
-		dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
-}
-
-static inline int dma_supported(struct device *dev, u64 mask)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	if (!ops)
-		return 0;
-	if (!ops->dma_supported)
-		return 1;
-	return ops->dma_supported(dev, mask);
-}
-
-#ifndef HAVE_ARCH_DMA_SET_MASK
-static inline int dma_set_mask(struct device *dev, u64 mask)
-{
-	if (!dev->dma_mask || !dma_supported(dev, mask))
-		return -EIO;
-
-	dma_check_mask(dev, mask);
-
-	*dev->dma_mask = mask;
-	return 0;
-}
-#endif
+int dma_supported(struct device *dev, u64 mask);
+int dma_set_mask(struct device *dev, u64 mask);
+int dma_set_coherent_mask(struct device *dev, u64 mask);
 
 static inline u64 dma_get_mask(struct device *dev)
 {
@@ -602,21 +493,6 @@ static inline u64 dma_get_mask(struct device *dev)
 	return DMA_BIT_MASK(32);
 }
 
-#ifdef CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK
-int dma_set_coherent_mask(struct device *dev, u64 mask);
-#else
-static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
-{
-	if (!dma_supported(dev, mask))
-		return -EIO;
-
-	dma_check_mask(dev, mask);
-
-	dev->coherent_dma_mask = mask;
-	return 0;
-}
-#endif
-
 /*
  * Set both the DMA mask and the coherent DMA mask to the same thing.
  * Note that we don't check the return value from dma_set_coherent_mask()
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dfe29d18dba1..176ae3e08916 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -223,7 +223,20 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
 		sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
 	return ret;
 }
-EXPORT_SYMBOL(dma_common_get_sgtable);
+
+int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+	if (ops->get_sgtable)
+		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
+					attrs);
+	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
+			attrs);
+}
+EXPORT_SYMBOL(dma_get_sgtable_attrs);
 
 /*
  * Create userspace mapping for the DMA-coherent memory.
@@ -261,7 +274,31 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 	return -ENXIO;
 #endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */
 }
-EXPORT_SYMBOL(dma_common_mmap);
+
+/**
+ * dma_mmap_attrs - map a coherent DMA allocation into user space
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @vma: vm_area_struct describing requested user mapping
+ * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
+ * @dma_addr: device-view address returned from dma_alloc_attrs
+ * @size: size of memory originally requested in dma_alloc_attrs
+ * @attrs: attributes of mapping properties requested in dma_alloc_attrs
+ *
+ * Map a coherent DMA buffer previously allocated by dma_alloc_attrs into user
+ * space.  The coherent DMA buffer must not be freed by the driver until the
+ * user space mapping has been released.
+ */
+int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+	if (ops->mmap)
+		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
+	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
+}
+EXPORT_SYMBOL(dma_mmap_attrs);
 
 #ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
 static u64 dma_default_get_required_mask(struct device *dev)
@@ -294,3 +331,102 @@ u64 dma_get_required_mask(struct device *dev)
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 #endif
 
+#ifndef arch_dma_alloc_attrs
+#define arch_dma_alloc_attrs(dev)	(true)
+#endif
+
+void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		gfp_t flag, unsigned long attrs)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+	void *cpu_addr;
+
+	BUG_ON(!ops);
+	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
+
+	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
+		return cpu_addr;
+
+	/* let the implementation decide on the zone to allocate from: */
+	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+
+	if (!arch_dma_alloc_attrs(&dev))
+		return NULL;
+	if (!ops->alloc)
+		return NULL;
+
+	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
+	return cpu_addr;
+}
+EXPORT_SYMBOL(dma_alloc_attrs);
+
+void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+		dma_addr_t dma_handle, unsigned long attrs)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	BUG_ON(!ops);
+
+	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
+		return;
+	/*
+	 * On non-coherent platforms which implement DMA-coherent buffers via
+	 * non-cacheable remaps, ops->free() may call vunmap(). Thus getting
+	 * this far in IRQ context is a) at risk of a BUG_ON() or trying to
+	 * sleep on some machines, and b) an indication that the driver is
+	 * probably misusing the coherent API anyway.
+	 */
+	WARN_ON(irqs_disabled());
+
+	if (!ops->free || !cpu_addr)
+		return;
+
+	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+}
+EXPORT_SYMBOL(dma_free_attrs);
+
+static inline void dma_check_mask(struct device *dev, u64 mask)
+{
+	if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
+		dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
+}
+
+int dma_supported(struct device *dev, u64 mask)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (!ops)
+		return 0;
+	if (!ops->dma_supported)
+		return 1;
+	return ops->dma_supported(dev, mask);
+}
+EXPORT_SYMBOL(dma_supported);
+
+#ifndef HAVE_ARCH_DMA_SET_MASK
+int dma_set_mask(struct device *dev, u64 mask)
+{
+	if (!dev->dma_mask || !dma_supported(dev, mask))
+		return -EIO;
+
+	dma_check_mask(dev, mask);
+	*dev->dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_mask);
+#endif
+
+#ifndef CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (!dma_supported(dev, mask))
+		return -EIO;
+
+	dma_check_mask(dev, mask);
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+#endif
-- 
2.19.1



* [PATCH 10/15] dma-mapping: move dma_cache_sync out of line
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 09/15] dma-mapping: move various slow path functions out of line Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 11/15] dma-mapping: always build the direct mapping code Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

This isn't exactly a slow path routine, but it is not super critical
either, and moving it out of line will help to keep the include chain
clean for the following DMA indirection bypass work.
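
As a rough illustration of what "out of line" means here, the sketch
below keeps only a prototype on the header side and puts the body on
the source side.  It is a userspace mock, not kernel code: the struct
layouts, count_sync() and demo_cache_sync() are hypothetical stand-ins,
and assert() takes the place of BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types (hypothetical, illustration only). */
enum dma_data_direction {
	DMA_BIDIRECTIONAL,
	DMA_TO_DEVICE,
	DMA_FROM_DEVICE,
	DMA_NONE,
};

struct device;

struct dma_map_ops {
	void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
			enum dma_data_direction dir);
};

struct device {
	const struct dma_map_ops *dma_ops;
};

/* Header side: only the prototype stays visible to every includer. */
void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
		enum dma_data_direction dir);

/* Source side: the body, and everything it needs, lives in one .c file. */
static int valid_dma_direction(enum dma_data_direction dir)
{
	return dir == DMA_BIDIRECTIONAL || dir == DMA_TO_DEVICE ||
		dir == DMA_FROM_DEVICE;
}

void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
		enum dma_data_direction dir)
{
	const struct dma_map_ops *ops = dev->dma_ops;

	assert(valid_dma_direction(dir));	/* stands in for BUG_ON() */
	if (ops->cache_sync)
		ops->cache_sync(dev, vaddr, size, dir);
}

/* Hypothetical callback that records how many bytes it was asked to sync. */
static size_t synced_bytes;

static void count_sync(struct device *dev, void *vaddr, size_t size,
		enum dma_data_direction dir)
{
	(void)dev;
	(void)vaddr;
	(void)dir;
	synced_bytes += size;
}

/* Hypothetical demo: sync a 64-byte buffer and report what the callback saw. */
size_t demo_cache_sync(void)
{
	static const struct dma_map_ops ops = { .cache_sync = count_sync };
	struct device dev = { .dma_ops = &ops };
	char buf[64];

	synced_bytes = 0;
	dma_cache_sync(&dev, buf, sizeof(buf), DMA_TO_DEVICE);
	return synced_bytes;
}
```

With the body out of the header, includers no longer need the
definition of struct dma_map_ops just to call dma_cache_sync(), which
is presumably the include chain benefit referred to above.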

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-mapping.h | 12 ++----------
 kernel/dma/mapping.c        | 11 +++++++++++
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0bbce52606c2..0f0078490df4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -411,16 +411,8 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
 
-static inline void
-dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (ops->cache_sync)
-		ops->cache_sync(dev, vaddr, size, dir);
-}
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+		enum dma_data_direction dir);
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 176ae3e08916..0b18cfbdde95 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -430,3 +430,14 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);
 #endif
+
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+		enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	BUG_ON(!valid_dma_direction(dir));
+	if (ops->cache_sync)
+		ops->cache_sync(dev, vaddr, size, dir);
+}
+EXPORT_SYMBOL(dma_cache_sync);
-- 
2.19.1



* [PATCH 11/15] dma-mapping: always build the direct mapping code
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 10/15] dma-mapping: move dma_cache_sync " Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 12/15] dma-mapping: factor out dummy DMA ops Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago.  In preparation for directly calling the direct
mapping code, don't bother making it optional but always build the
code in.  This is a minor hardship for some powerpc and arm configs
that don't pull it in yet (although they should in a release or two),
and for sparc64, which currently doesn't need it at all, but it will
significantly reduce the ifdef mess we'd otherwise need.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/alpha/Kconfig      | 1 -
 arch/arc/Kconfig        | 1 -
 arch/arm/Kconfig        | 1 -
 arch/arm64/Kconfig      | 1 -
 arch/c6x/Kconfig        | 1 -
 arch/csky/Kconfig       | 1 -
 arch/h8300/Kconfig      | 1 -
 arch/hexagon/Kconfig    | 1 -
 arch/m68k/Kconfig       | 1 -
 arch/microblaze/Kconfig | 1 -
 arch/mips/Kconfig       | 1 -
 arch/nds32/Kconfig      | 1 -
 arch/nios2/Kconfig      | 1 -
 arch/openrisc/Kconfig   | 1 -
 arch/parisc/Kconfig     | 1 -
 arch/riscv/Kconfig      | 1 -
 arch/s390/Kconfig       | 1 -
 arch/sh/Kconfig         | 1 -
 arch/sparc/Kconfig      | 1 -
 arch/unicore32/Kconfig  | 1 -
 arch/x86/Kconfig        | 1 -
 arch/xtensa/Kconfig     | 1 -
 kernel/dma/Kconfig      | 7 -------
 kernel/dma/Makefile     | 3 +--
 24 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index a7e748a46c18..5da6ff54b3e7 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -203,7 +203,6 @@ config ALPHA_EIGER
 config ALPHA_JENSEN
 	bool "Jensen"
 	depends on BROKEN
-	select DMA_DIRECT_OPS
 	help
 	  DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one
 	  of the first-generation Alpha systems. A number of these systems
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index fd48d698da29..7deaabeb531a 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -17,7 +17,6 @@ config ARC
 	select BUILDTIME_EXTABLE_SORT
 	select CLONE_BACKWARDS
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a858ee791ef0..586fc30b23bd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -30,7 +30,6 @@ config ARM
 	select CLONE_BACKWARDS
 	select CPU_PM if (SUSPEND || CPU_IDLE)
 	select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
-	select DMA_DIRECT_OPS if !MMU
 	select DMA_REMAP if MMU
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 06cf0ef24367..2092080240b0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -80,7 +80,6 @@ config ARM64
 	select CPU_PM if (SUSPEND || CPU_IDLE)
 	select CRC32
 	select DCACHE_WORD_ACCESS
-	select DMA_DIRECT_OPS
 	select DMA_DIRECT_REMAP
 	select EDAC_SUPPORT
 	select FRAME_POINTER
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 84420109113d..456e154674d1 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -9,7 +9,6 @@ config C6X
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select CLKDEV_LOOKUP
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64
 	select GENERIC_IRQ_SHOW
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index ea74f3a9eeaf..37bed8aadf95 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -7,7 +7,6 @@ config CSKY
 	select COMMON_CLK
 	select CLKSRC_MMIO
 	select CLKSRC_OF
-	select DMA_DIRECT_OPS
 	select DMA_DIRECT_REMAP
 	select IRQ_DOMAIN
 	select HANDLE_DOMAIN_IRQ
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index d19c6b16cd5d..6472a0685470 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -22,7 +22,6 @@ config H8300
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_HASH
 	select CPU_NO_EFFICIENT_FFS
-	select DMA_DIRECT_OPS
 
 config CPU_BIG_ENDIAN
 	def_bool y
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 2b688af379e6..d71036c598de 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,7 +31,6 @@ config HEXAGON
 	select GENERIC_CLOCKEVENTS_BROADCAST
 	select MODULES_USE_ELF_RELA
 	select GENERIC_CPU_DEVICES
-	select DMA_DIRECT_OPS
 	---help---
 	  Qualcomm Hexagon is a processor architecture designed for high
 	  performance and low power across a wide variety of applications.
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 1bc9f1ba759a..8a5868e9a3a0 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -26,7 +26,6 @@ config M68K
 	select MODULES_USE_ELF_RELA
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
-	select DMA_DIRECT_OPS if HAS_DMA
 	select ARCH_DISCARD_MEMBLOCK
 
 config CPU_BIG_ENDIAN
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index effed2efd306..eda9e2315ef5 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -12,7 +12,6 @@ config MICROBLAZE
 	select TIMER_OF
 	select CLONE_BACKWARDS3
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CPU_DEVICES
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 8272ea4c7264..2993aa9842c0 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -16,7 +16,6 @@ config MIPS
 	select BUILDTIME_EXTABLE_SORT
 	select CLONE_BACKWARDS
 	select CPU_PM if CPU_IDLE
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64 if !64BIT
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CMOS_UPDATE
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 7a04adacb2f0..1af6bbae7220 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -11,7 +11,6 @@ config NDS32
 	select CLKSRC_MMIO
 	select CLONE_BACKWARDS
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64
 	select GENERIC_CPU_DEVICES
 	select GENERIC_CLOCKEVENTS
diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 7e95506e957a..f6c4b0f49997 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -4,7 +4,6 @@ config NIOS2
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select ARCH_NO_SWAP
-	select DMA_DIRECT_OPS
 	select TIMER_OF
 	select GENERIC_ATOMIC64
 	select GENERIC_CLOCKEVENTS
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 285f7d05c8ed..d0feebad5a8f 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -7,7 +7,6 @@
 config OPENRISC
 	def_bool y
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-	select DMA_DIRECT_OPS
 	select OF
 	select OF_EARLY_FLATTREE
 	select IRQ_DOMAIN
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 428ee50fc3db..6e1b71da0e71 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -185,7 +185,6 @@ config PA11
 	depends on PA7000 || PA7100LC || PA7200 || PA7300LC
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-	select DMA_DIRECT_OPS
 	select DMA_NONCOHERENT_CACHE_SYNC
 
 config PREFETCH
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 55da93f4e818..51d89c4b1dca 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -19,7 +19,6 @@ config RISCV
 	select ARCH_WANT_FRAME_POINTERS
 	select CLONE_BACKWARDS
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_CPU_DEVICES
 	select GENERIC_IRQ_SHOW
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 5624e8607054..21d271d04ca6 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -139,7 +139,6 @@ config S390
 	select HAVE_COPY_THREAD_TLS
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS
-	select DMA_DIRECT_OPS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index f82a4da7adf3..10fd4e9c454b 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -7,7 +7,6 @@ config SUPERH
 	select ARCH_NO_COHERENT_DMA_MMAP if !MMU
 	select HAVE_PATA_PLATFORM
 	select CLKDEV_LOOKUP
-	select DMA_DIRECT_OPS
 	select HAVE_IDE if HAS_IOPORT_MAP
 	select HAVE_MEMBLOCK_NODE_MAP
 	select ARCH_DISCARD_MEMBLOCK
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 8853b6ceae17..f5bb9ded1d18 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -48,7 +48,6 @@ config SPARC
 config SPARC32
 	def_bool !64BIT
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64
 	select CLZ_TAB
 	select HAVE_UID16
diff --git a/arch/unicore32/Kconfig b/arch/unicore32/Kconfig
index a4c05159dca5..2681027d7bff 100644
--- a/arch/unicore32/Kconfig
+++ b/arch/unicore32/Kconfig
@@ -4,7 +4,6 @@ config UNICORE32
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
-	select DMA_DIRECT_OPS
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_BZIP2
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index adc845b66f01..c14d4a35be13 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -89,7 +89,6 @@ config X86
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
 	select CLOCKSOURCE_WATCHDOG
 	select DCACHE_WORD_ACCESS
-	select DMA_DIRECT_OPS
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select GENERIC_CLOCKEVENTS
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 75488b606edc..36338e7564a3 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -9,7 +9,6 @@ config XTENSA
 	select BUILDTIME_EXTABLE_SORT
 	select CLONE_BACKWARDS
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select DMA_REMAP if MMU
 	select GENERIC_ATOMIC64
 	select GENERIC_CLOCKEVENTS
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 41c3b1df70eb..ca88b867e7fe 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -35,13 +35,8 @@ config ARCH_HAS_DMA_COHERENT_TO_PFN
 config ARCH_HAS_DMA_MMAP_PGPROT
 	bool
 
-config DMA_DIRECT_OPS
-	bool
-	depends on HAS_DMA
-
 config DMA_NONCOHERENT_CACHE_SYNC
 	bool
-	depends on DMA_DIRECT_OPS
 
 config DMA_VIRT_OPS
 	bool
@@ -49,7 +44,6 @@ config DMA_VIRT_OPS
 
 config SWIOTLB
 	bool
-	select DMA_DIRECT_OPS
 	select NEED_DMA_MAP_STATE
 
 config DMA_REMAP
@@ -58,5 +52,4 @@ config DMA_REMAP
 
 config DMA_DIRECT_REMAP
 	bool
-	depends on DMA_DIRECT_OPS
 	select DMA_REMAP
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index f4feeceb8020..a626f643cd63 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,9 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_HAS_DMA)			+= mapping.o
+obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += coherent.o
-obj-$(CONFIG_DMA_DIRECT_OPS)		+= direct.o
 obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
 obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
-- 
2.19.1



* [PATCH 12/15] dma-mapping: factor out dummy DMA ops
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 11/15] dma-mapping: always build the direct mapping code Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-07 19:07 ` [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

From: Robin Murphy <robin.murphy@arm.com>

The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether it is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths. In the process, we can prune all the optional
callbacks which just do the same thing as the default behaviour, and
fill in .map_resource for completeness.
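
To see why some callbacks can be pruned, consider the following
userspace sketch.  It assumes, as the out-of-line dma_alloc_attrs() in
this series does, that the common wrapper already returns NULL when
ops->alloc is not set, so a dummy .alloc that only returns NULL adds
nothing.  All mock_* and demo_* names and the simplified signatures
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Cut-down, hypothetical ops table with just two callbacks. */
struct dma_map_ops {
	void *(*alloc)(struct device *dev, size_t size);
	int (*dma_supported)(struct device *dev, unsigned long long mask);
};

struct device {
	const struct dma_map_ops *dma_ops;
};

/* The one answer that still has to be spelled out: nothing is supported. */
static int mock_dummy_supported(struct device *dev, unsigned long long mask)
{
	(void)dev;
	(void)mask;
	return 0;
}

/* No .alloc here: the wrapper's default already fails the allocation. */
static const struct dma_map_ops mock_dummy_ops = {
	.dma_supported = mock_dummy_supported,
};

/* Mirrors the shape of the common wrapper: a missing callback means NULL. */
void *mock_dma_alloc(struct device *dev, size_t size)
{
	const struct dma_map_ops *ops;

	assert(dev != NULL);
	ops = dev->dma_ops;
	if (!ops->alloc)
		return NULL;
	return ops->alloc(dev, size);
}

/* Mirrors dma_supported(): a missing callback means "supported". */
int mock_dma_supported(struct device *dev, unsigned long long mask)
{
	const struct dma_map_ops *ops;

	assert(dev != NULL);
	ops = dev->dma_ops;
	if (!ops)
		return 0;
	if (!ops->dma_supported)
		return 1;
	return ops->dma_supported(dev, mask);
}

/* Hypothetical demos: both calls fail for a device using the dummy ops. */
int demo_dummy_alloc_is_null(void)
{
	struct device dev = { .dma_ops = &mock_dummy_ops };

	return mock_dma_alloc(&dev, 4096) == NULL;
}

int demo_dummy_supported(void)
{
	struct device dev = { .dma_ops = &mock_dummy_ops };

	return mock_dma_supported(&dev, ~0ULL);
}
```

The asymmetry is the point: a missing .alloc already fails, but a
missing .dma_supported would report success, so that callback has to
stay in the dummy ops.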

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: moved to a separate source file]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c          | 86 ----------------------------
 include/linux/dma-mapping.h          |  1 +
 kernel/dma/Makefile                  |  2 +-
 kernel/dma/dummy.c                   | 39 +++++++++++++
 5 files changed, 42 insertions(+), 90 deletions(-)
 create mode 100644 kernel/dma/dummy.c

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..273e778f7de2 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,13 @@
 #include <xen/xen.h>
 #include <asm/xen/hypervisor.h>
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 	/*
 	 * We expect no ISA devices, and all other DMA masters are expected to
 	 * have someone call arch_setup_dma_ops at device creation time.
 	 */
-	return &dummy_dma_ops;
+	return &dma_dummy_ops;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e4effbb243b1..ab1e417204d0 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -89,92 +89,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-/********************************************
- * The following APIs are for dummy DMA ops *
- ********************************************/
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-			   dma_addr_t *dma_handle, gfp_t flags,
-			   unsigned long attrs)
-{
-	return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-			 void *vaddr, dma_addr_t dma_handle,
-			 unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-			struct vm_area_struct *vma,
-			void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			unsigned long attrs)
-{
-	return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs)
-{
-	return DMA_MAPPING_ERROR;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
-			  int nelems, enum dma_data_direction dir,
-			  unsigned long attrs)
-{
-	return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-			     struct scatterlist *sgl, int nelems,
-			     enum dma_data_direction dir,
-			     unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-				dma_addr_t dev_addr, size_t size,
-				enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-			    struct scatterlist *sgl, int nelems,
-			    enum dma_data_direction dir)
-{
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-	return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-	.alloc                  = __dummy_alloc,
-	.free                   = __dummy_free,
-	.mmap                   = __dummy_mmap,
-	.map_page               = __dummy_map_page,
-	.unmap_page             = __dummy_unmap_page,
-	.map_sg                 = __dummy_map_sg,
-	.unmap_sg               = __dummy_unmap_sg,
-	.sync_single_for_cpu    = __dummy_sync_single,
-	.sync_single_for_device = __dummy_sync_single,
-	.sync_sg_for_cpu        = __dummy_sync_sg,
-	.sync_sg_for_device     = __dummy_sync_sg,
-	.dma_supported          = __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
 	WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0f0078490df4..269ee27fc3d9 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -136,6 +136,7 @@ struct dma_map_ops {
 
 extern const struct dma_map_ops dma_direct_ops;
 extern const struct dma_map_ops dma_virt_ops;
+extern const struct dma_map_ops dma_dummy_ops;
 
 #define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
 
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index a626f643cd63..72ff6e46aa86 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o
+obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o dummy.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += coherent.o
 obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
diff --git a/kernel/dma/dummy.c b/kernel/dma/dummy.c
new file mode 100644
index 000000000000..05607642c888
--- /dev/null
+++ b/kernel/dma/dummy.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Dummy DMA ops that always fail.
+ */
+#include <linux/dma-mapping.h>
+
+static int dma_dummy_mmap(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs)
+{
+	return -ENXIO;
+}
+
+static dma_addr_t dma_dummy_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	return DMA_MAPPING_ERROR;
+}
+
+static int dma_dummy_map_sg(struct device *dev, struct scatterlist *sgl,
+		int nelems, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+	return 0;
+}
+
+static int dma_dummy_supported(struct device *hwdev, u64 mask)
+{
+	return 0;
+}
+
+const struct dma_map_ops dma_dummy_ops = {
+	.mmap                   = dma_dummy_mmap,
+	.map_page               = dma_dummy_map_page,
+	.map_sg                 = dma_dummy_map_sg,
+	.dma_supported          = dma_dummy_supported,
+};
+EXPORT_SYMBOL(dma_dummy_ops);
-- 
2.19.1



* [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 12/15] dma-mapping: factor out dummy DMA ops Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-14 21:15   ` Bjorn Helgaas
  2018-12-07 19:07 ` [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

From: Robin Murphy <robin.murphy@arm.com>

Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly. That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirely. This
will then free up the dev->dma_ops == NULL case for some valuable
fastpath optimisations.
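
For readers following along outside the kernel tree, the control flow
described above can be sketched as a small userspace model.  This is only an
illustration under stated assumptions: every cca_* name below is hypothetical
and stands in for the real acpi_dma_configure(), dma_dummy_ops and
DMA_MAPPING_ERROR; the normal IORT/IOMMU setup is elided and modelled as
leaving the ops pointer NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum dev_dma_attr and DMA_MAPPING_ERROR. */
enum cca_dma_attr {
	CCA_DMA_NOT_SUPPORTED,
	CCA_DMA_NON_COHERENT,
	CCA_DMA_COHERENT,
};

#define CCA_MAPPING_ERROR	(~(uint64_t)0)

struct cca_dma_ops {
	uint64_t (*map_page)(uint64_t phys);
};

struct cca_device {
	const struct cca_dma_ops *dma_ops;	/* NULL selects direct mapping */
};

/* Dummy op: every mapping attempt fails. */
static uint64_t cca_dummy_map_page(uint64_t phys)
{
	(void)phys;
	return CCA_MAPPING_ERROR;
}

const struct cca_dma_ops cca_dummy_ops = {
	.map_page = cca_dummy_map_page,
};

/*
 * Model of the refactored configure step: the attribute is passed in and
 * the "DMA not supported" case installs the dummy ops explicitly instead
 * of being filtered out by each caller.
 */
int cca_dma_configure(struct cca_device *dev, enum cca_dma_attr attr)
{
	if (attr == CCA_DMA_NOT_SUPPORTED) {
		dev->dma_ops = &cca_dummy_ops;
		return 0;
	}

	/* Real setup elided; this model just leaves the device direct mapped. */
	dev->dma_ops = NULL;
	return 0;
}

/* Model of a mapping call: NULL ops means identity (direct) mapping. */
uint64_t cca_map_page(const struct cca_device *dev, uint64_t phys)
{
	if (!dev->dma_ops)
		return phys;
	return dev->dma_ops->map_page(phys);
}
```

In this model a device configured with CCA_DMA_NOT_SUPPORTED fails every
mapping with CCA_MAPPING_ERROR, while any other attribute leaves the ops
pointer NULL, which is exactly the state the later fast path wants to test
for.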

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/acpi/scan.c      | 5 +++++
 drivers/base/platform.c  | 3 +--
 drivers/pci/pci-driver.c | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..b75ae34ed188 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
 	const struct iommu_ops *iommu;
 	u64 dma_addr = 0, size = 0;
 
+	if (attr == DEV_DMA_NOT_SUPPORTED) {
+		set_dma_ops(dev, &dma_dummy_ops);
+		return 0;
+	}
+
 	iort_dma_setup(dev, &dma_addr, &size);
 
 	iommu = iort_iommu_configure(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index eae841935a45..c1ddf191711e 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
 		ret = of_dma_configure(dev, dev->of_node, true);
 	} else if (has_acpi_companion(dev)) {
 		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, attr);
 	}
 
 	return ret;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..1b58e058b13f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
 		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
 		enum dev_dma_attr attr = acpi_get_dma_attr(adev);
 
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
 	}
 
 	pci_put_host_bridge_device(bridge);
-- 
2.19.1



* [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-14 21:17   ` Bjorn Helgaas
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

With the bypass support for the direct mapping we might not always have
methods to call, so use the proper APIs instead.  The only downside is
that we will create two dma-debug entries for each mapping if
CONFIG_DMA_DEBUG is enabled.
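
The double accounting mentioned above can be illustrated with a small
userspace model.  It is a sketch only: the fwd_* names are hypothetical, the
counter stands in for the dma-debug bookkeeping done by the public wrappers,
and the parent device is assumed to be direct mapped (NULL ops).

```c
#include <stddef.h>
#include <stdint.h>

struct fwd_device;

struct fwd_dma_ops {
	uint64_t (*map)(struct fwd_device *dev, uint64_t phys);
};

struct fwd_device {
	const struct fwd_dma_ops *dma_ops;	/* NULL selects direct mapping */
	struct fwd_device *parent;
};

/* Stand-in for the entries that dma-debug would record. */
unsigned int fwd_debug_entries;

/* Model of a public dma_* wrapper: dispatch, then record a debug entry. */
uint64_t fwd_dma_map(struct fwd_device *dev, uint64_t phys)
{
	uint64_t addr;

	if (!dev->dma_ops)
		addr = phys;			/* direct mapping, no method to call */
	else
		addr = dev->dma_ops->map(dev, phys);

	fwd_debug_entries++;
	return addr;
}

/*
 * Child op that forwards to the parent through the public wrapper rather
 * than through parent->dma_ops->map, which would be a NULL dereference
 * for a direct mapped parent.
 */
static uint64_t fwd_child_map(struct fwd_device *dev, uint64_t phys)
{
	return fwd_dma_map(dev->parent, phys);
}

const struct fwd_dma_ops fwd_child_ops = {
	.map = fwd_child_map,
};
```

Mapping through a child that uses fwd_child_ops passes through
fwd_dma_map() twice, once for the child and once for the parent, so the
counter ends up at two for a single mapping.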

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/controller/vmd.c | 42 +++++++++++++++---------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 98ce79eac128..3890812cdf87 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
 	return &vmd->dev->dev;
 }
 
-static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
-{
-	return get_dma_ops(to_vmd_dev(dev));
-}
-
 static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
 		       gfp_t flag, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
-				       attrs);
+	return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
 }
 
 static void vmd_free(struct device *dev, size_t size, void *vaddr,
 		     dma_addr_t addr, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
-				      attrs);
+	return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
 }
 
 static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
 		    void *cpu_addr, dma_addr_t addr, size_t size,
 		    unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
-				      size, attrs);
+	return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
+			attrs);
 }
 
 static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
 			   void *cpu_addr, dma_addr_t addr, size_t size,
 			   unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
-					     addr, size, attrs);
+	return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
+			attrs);
 }
 
 static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
@@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
 			       enum dma_data_direction dir,
 			       unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
-					  dir, attrs);
+	return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
+			attrs);
 }
 
 static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
 			   enum dma_data_direction dir, unsigned long attrs)
 {
-	vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
+	dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
 }
 
 static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		      enum dma_data_direction dir, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+	return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir, unsigned long attrs)
 {
-	vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+	dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 				    size_t size, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
+	dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
 				       size_t size, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_single_for_device(to_vmd_dev(dev), addr, size,
-						 dir);
+	dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 				int nents, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
+	dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
 }
 
 static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 				   int nents, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
+	dma_sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
 }
 
 static int vmd_dma_supported(struct device *dev, u64 mask)
 {
-	return vmd_dma_ops(dev)->dma_supported(to_vmd_dev(dev), mask);
+	return dma_supported(to_vmd_dev(dev), mask);
 }
 
 static u64 vmd_get_required_mask(struct device *dev)
 {
-	return vmd_dma_ops(dev)->get_required_mask(to_vmd_dev(dev));
+	return dma_get_required_mask(to_vmd_dev(dev));
 }
 
 static void vmd_teardown_dma_ops(struct vmd_dev *vmd)
-- 
2.19.1



* [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls Christoph Hellwig
@ 2018-12-07 19:07 ` Christoph Hellwig
  2018-12-14 14:11   ` Marek Szyprowski
                     ` (3 more replies)
  2018-12-08 16:06 ` [RFC] avoid indirect calls for DMA direct mappings v2 Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  17 siblings, 4 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-07 19:07 UTC (permalink / raw)
  To: iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
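
As a rough illustration of the dispatch pattern, here is a userspace sketch.
It is not the kernel code: all sketch_* names are hypothetical, the direct
mapping is assumed to be an identity translation, and the "IOMMU"
implementation simply adds a fixed offset so the two paths are
distinguishable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;

struct sketch_dma_ops {
	sketch_dma_addr_t (*map_page)(uintptr_t phys, size_t size);
};

struct sketch_device {
	const struct sketch_dma_ops *dma_ops;	/* NULL selects direct mapping */
};

/* Direct mapping: in this model the bus address is the physical address. */
static sketch_dma_addr_t sketch_direct_map_page(uintptr_t phys, size_t size)
{
	(void)size;
	return (sketch_dma_addr_t)phys;
}

/* Stand-in for an implementation that is only reachable via the ops table. */
static sketch_dma_addr_t sketch_iommu_map_page(uintptr_t phys, size_t size)
{
	(void)size;
	return (sketch_dma_addr_t)phys + 0x1000;
}

const struct sketch_dma_ops sketch_iommu_ops = {
	.map_page = sketch_iommu_map_page,
};

/* Counterpart of the dma_is_direct() helper: NULL ops means direct. */
static int sketch_is_direct(const struct sketch_dma_ops *ops)
{
	return ops == NULL;
}

sketch_dma_addr_t sketch_map_page(struct sketch_device *dev, uintptr_t phys,
		size_t size)
{
	const struct sketch_dma_ops *ops = dev->dma_ops;

	assert(size > 0);	/* sanity check local to this sketch */

	if (sketch_is_direct(ops))
		return sketch_direct_map_page(phys, size);	/* direct call */
	return ops->map_page(phys, size);			/* indirect call */
}
```

The common case takes the first branch and becomes an ordinary direct call
that the compiler can see through; only devices with a non-NULL ops table
still pay for the function-pointer call, which is where the Spectre v2
mitigations described in the cover letter add cost.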

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/alpha/include/asm/dma-mapping.h |   2 +-
 arch/arc/mm/cache.c                  |   2 +-
 arch/arm/include/asm/dma-mapping.h   |   2 +-
 arch/arm/mm/dma-mapping-nommu.c      |  14 +---
 arch/arm64/mm/dma-mapping.c          |   3 -
 arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
 arch/ia64/hp/common/sba_iommu.c      |   4 +-
 arch/ia64/kernel/dma-mapping.c       |   1 -
 arch/mips/include/asm/dma-mapping.h  |   2 +-
 arch/parisc/kernel/setup.c           |   4 -
 arch/sparc/include/asm/dma-mapping.h |   4 +-
 arch/x86/kernel/pci-dma.c            |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
 drivers/iommu/amd_iommu.c            |  13 +---
 include/asm-generic/dma-mapping.h    |   2 +-
 include/linux/dma-direct.h           |  17 ----
 include/linux/dma-mapping.h          | 111 +++++++++++++++++++++++----
 include/linux/dma-noncoherent.h      |   5 +-
 kernel/dma/direct.c                  |  37 ++-------
 kernel/dma/mapping.c                 |  40 ++++++----
 20 files changed, 150 insertions(+), 119 deletions(-)

diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index 8beeafd4f68e..0ee6a5c99b16 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_ALPHA_JENSEN
-	return &dma_direct_ops;
+	return NULL;
 #else
 	return &alpha_pci_ops;
 #endif
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index f2701c13a66b..e188bb3ede53 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
 	/*
 	 * In case of IOC (say IOC+SLC case), pointers above could still be set
 	 * but end up not being relevant as the first function in chain is not
-	 * called at all for @dma_direct_ops
+	 * called at all for devices using coherent DMA.
 	 *     arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
 	 */
 }
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 965b7c846ecb..31d3b96f0f4b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
+	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
 }
 
 #ifdef __arch_page_to_dma
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 712416ecd8e6..f304b10e23a4 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -22,7 +22,7 @@
 #include "dma.h"
 
 /*
- *  dma_direct_ops is used if
+ *  The generic direct mapping code is used if
  *   - MMU/MPU is off
  *   - cpu is v7m w/o cache support
  *   - device is coherent
@@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
 };
 EXPORT_SYMBOL(arm_nommu_dma_ops);
 
-static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
-{
-	return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
-}
-
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
-	const struct dma_map_ops *dma_ops;
-
 	if (IS_ENABLED(CONFIG_CPU_V7M)) {
 		/*
 		 * Cache support for v7m is optional, so can be treated as
@@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
 	}
 
-	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
-
-	set_dma_ops(dev, dma_ops);
+	if (!dev->archdata.dma_coherent)
+		set_dma_ops(dev, &arm_nommu_dma_ops);
 }
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index ab1e417204d0..95eda81e3f2d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
-	if (!dev->dma_ops)
-		dev->dma_ops = &dma_direct_ops;
-
 	dev->dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index f40ca499b246..8840ed97712f 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
 const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
 {
 	if (use_swiotlb(dev))
-		return &dma_direct_ops;
+		return NULL;
 	return &sba_dma_ops;
 }
 EXPORT_SYMBOL(hwsw_dma_get_ops);
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 5ee74820a0f6..5a361e51cb1e 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2078,7 +2078,7 @@ sba_init(void)
 	 * a successful kdump kernel boot is to use the swiotlb.
 	 */
 	if (is_kdump_kernel()) {
-		dma_ops = &dma_direct_ops;
+		dma_ops = NULL;
 		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
 			panic("Unable to initialize software I/O TLB:"
 				  " Try machvec=dig boot option");
@@ -2100,7 +2100,7 @@ sba_init(void)
 		 * If we didn't find something sba_iommu can claim, we
 		 * need to setup the swiotlb and switch to the dig machvec.
 		 */
-		dma_ops = &dma_direct_ops;
+		dma_ops = NULL;
 		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
 			panic("Unable to find SBA IOMMU or initialize "
 			      "software I/O TLB: Try machvec=dig boot option");
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 80cd3e1ea95a..ad7d9963de34 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -36,7 +36,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
 
 void __init swiotlb_dma_init(void)
 {
-	dma_ops = &dma_direct_ops;
 	swiotlb_init(1);
 }
 #endif
diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index 69f914667f3e..20dfaad3a55d 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -11,7 +11,7 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 #if defined(CONFIG_MACH_JAZZ)
 	return &jazz_dma_ops;
 #else
-	return &dma_direct_ops;
+	return NULL;
 #endif
 }
 
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index cd227f1cf629..54818cd78bd0 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -99,10 +99,6 @@ void __init dma_ops_init(void)
 
 	case pcxl2:
 		pa7300lc_init();
-	case pcxl: /* falls through */
-	case pcxs:
-	case pcxt:
-		hppa_dma_ops = &dma_direct_ops;
 		break;
 	default:
 		break;
diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
index b0bb2fcaf1c9..59f5a0f17316 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -14,11 +14,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_SPARC_LEON
 	if (sparc_cpu_model == sparc_leon)
-		return &dma_direct_ops;
+		return NULL;
 #endif
 #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
 	if (bus == &pci_bus_type)
-		return &dma_direct_ops;
+		return NULL;
 #endif
 	return dma_ops;
 }
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index f4562fcec681..d460998ae828 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -17,7 +17,7 @@
 
 static bool disable_dac_quirk __read_mostly;
 
-const struct dma_map_ops *dma_ops = &dma_direct_ops;
+const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
 #ifdef CONFIG_IOMMU_DEBUG
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 61a84b958d67..50637f372e9f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -581,7 +581,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
 
 	dev_priv->map_mode = vmw_dma_map_populate;
 
-	if (dma_ops->sync_single_for_cpu)
+	if (dma_ops && dma_ops->sync_single_for_cpu)
 		dev_priv->map_mode = vmw_dma_alloc_coherent;
 #ifdef CONFIG_SWIOTLB
 	if (swiotlb_nr_tbl() == 0)
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c5d6c7c42b0a..567221cca13c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2184,7 +2184,7 @@ static int amd_iommu_add_device(struct device *dev)
 				dev_name(dev));
 
 		iommu_ignore_device(dev);
-		dev->dma_ops = &dma_direct_ops;
+		dev->dma_ops = NULL;
 		goto out;
 	}
 	init_iommu_group(dev);
@@ -2770,17 +2770,6 @@ int __init amd_iommu_init_dma_ops(void)
 	swiotlb        = (iommu_pass_through || sme_me_mask) ? 1 : 0;
 	iommu_detected = 1;
 
-	/*
-	 * In case we don't initialize SWIOTLB (actually the common case
-	 * when AMD IOMMU is enabled and SME is not active), make sure there
-	 * are global dma_ops set as a fall-back for devices not handled by
-	 * this driver (for example non-PCI devices). When SME is active,
-	 * make sure that swiotlb variable remains set so the global dma_ops
-	 * continue to be SWIOTLB.
-	 */
-	if (!swiotlb)
-		dma_ops = &dma_direct_ops;
-
 	if (amd_iommu_unmap_flush)
 		pr_info("AMD-Vi: IO/TLB flush on unmap enabled\n");
 	else
diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
index 880a292d792f..c13f46109e88 100644
--- a/include/asm-generic/dma-mapping.h
+++ b/include/asm-generic/dma-mapping.h
@@ -4,7 +4,7 @@
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	return &dma_direct_ops;
+	return NULL;
 }
 
 #endif /* _ASM_GENERIC_DMA_MAPPING_H */
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3b0a3ea3876d..b7338702592a 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
 void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir,
-		unsigned long attrs);
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-		enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
-		int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_device(struct device *dev,
-		dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-		dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_cpu(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 269ee27fc3d9..f422aec0f53c 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -134,7 +134,6 @@ struct dma_map_ops {
 
 #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
 
-extern const struct dma_map_ops dma_direct_ops;
 extern const struct dma_map_ops dma_virt_ops;
 extern const struct dma_map_ops dma_dummy_ops;
 
@@ -222,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
 }
 #endif
 
+static inline bool dma_is_direct(const struct dma_map_ops *ops)
+{
+	return likely(!ops);
+}
+
+/*
+ * All the dma_direct_* declarations are here just for the indirect call bypass,
+ * and must not be used directly by drivers!
+ */
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+		enum dma_data_direction dir, unsigned long attrs);
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+    defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+#else
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_device(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+    defined(CONFIG_SWIOTLB)
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
+#else
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void dma_direct_unmap_sg(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir,
+		unsigned long attrs)
+{
+}
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+		dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
 static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 					      size_t size,
 					      enum dma_data_direction dir,
@@ -232,9 +294,12 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_map_single(dev, ptr, size);
-	addr = ops->map_page(dev, virt_to_page(ptr),
-			     offset_in_page(ptr), size,
-			     dir, attrs);
+	if (dma_is_direct(ops))
+		addr = dma_direct_map_page(dev, virt_to_page(ptr),
+				offset_in_page(ptr), size, dir, attrs);
+	else
+		addr = ops->map_page(dev, virt_to_page(ptr),
+				offset_in_page(ptr), size, dir, attrs);
 	debug_dma_map_page(dev, virt_to_page(ptr),
 			   offset_in_page(ptr), size,
 			   dir, addr, true);
@@ -249,7 +314,9 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->unmap_page)
+	if (dma_is_direct(ops))
+		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	debug_dma_unmap_page(dev, addr, size, dir, true);
 }
@@ -272,7 +339,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	ents = ops->map_sg(dev, sg, nents, dir, attrs);
+	if (dma_is_direct(ops))
+		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
+	else
+		ents = ops->map_sg(dev, sg, nents, dir, attrs);
 	BUG_ON(ents < 0);
 	debug_dma_map_sg(dev, sg, nents, ents, dir);
 
@@ -287,7 +357,9 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (ops->unmap_sg)
+	if (dma_is_direct(ops))
+		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
+	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
 }
 
@@ -301,7 +373,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	if (dma_is_direct(ops))
+		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	else
+		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
 
 	return addr;
@@ -322,7 +397,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
 	BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
 
 	addr = phys_addr;
-	if (ops->map_resource)
+	if (ops && ops->map_resource)
 		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
 
 	debug_dma_map_resource(dev, phys_addr, size, dir, addr);
@@ -337,7 +412,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->unmap_resource)
+	if (ops && ops->unmap_resource)
 		ops->unmap_resource(dev, addr, size, dir, attrs);
 	debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -349,7 +424,9 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_cpu)
+	if (dma_is_direct(ops))
+		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
+	else if (ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
 	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
 }
@@ -368,7 +445,9 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_device)
+	if (dma_is_direct(ops))
+		dma_direct_sync_single_for_device(dev, addr, size, dir);
+	else if (ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
 	debug_dma_sync_single_for_device(dev, addr, size, dir);
 }
@@ -387,7 +466,9 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_sg_for_cpu)
+	if (dma_is_direct(ops))
+		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
+	else if (ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
 	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
 }
@@ -399,7 +480,9 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_sg_for_device)
+	if (dma_is_direct(ops))
+		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
+	else if (ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
 	debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
 
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 306557331d7d..69b36ed31a99 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -38,7 +38,10 @@ pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
 void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		enum dma_data_direction direction);
 #else
-#define arch_dma_cache_sync NULL
+static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
+		size_t size, enum dma_data_direction direction)
+{
+}
 #endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 85d8286a0ba2..79da61b49fa4 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -223,6 +223,7 @@ void dma_direct_sync_single_for_device(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_device(dev, paddr, size, dir);
 }
+EXPORT_SYMBOL(dma_direct_sync_single_for_device);
 
 void dma_direct_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
@@ -240,6 +241,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
 					dir);
 	}
 }
+EXPORT_SYMBOL(dma_direct_sync_sg_for_device);
 #endif
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
@@ -258,6 +260,7 @@ void dma_direct_sync_single_for_cpu(struct device *dev,
 	if (unlikely(is_swiotlb_buffer(paddr)))
 		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
 }
+EXPORT_SYMBOL(dma_direct_sync_single_for_cpu);
 
 void dma_direct_sync_sg_for_cpu(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
@@ -277,6 +280,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu_all(dev);
 }
+EXPORT_SYMBOL(dma_direct_sync_sg_for_cpu);
 
 void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
@@ -289,6 +293,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 	if (unlikely(is_swiotlb_buffer(phys)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
+EXPORT_SYMBOL(dma_direct_unmap_page);
 
 void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
@@ -300,11 +305,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
 		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 			     attrs);
 }
-#else
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-}
+EXPORT_SYMBOL(dma_direct_unmap_sg);
 #endif
 
 static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
@@ -331,6 +332,7 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 		arch_sync_dma_for_device(dev, phys, size, dir);
 	return dma_addr;
 }
+EXPORT_SYMBOL(dma_direct_map_page);
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		enum dma_data_direction dir, unsigned long attrs)
@@ -352,6 +354,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
 	return 0;
 }
+EXPORT_SYMBOL(dma_direct_map_sg);
 
 /*
  * Because 32-bit DMA masks are so common we expect every architecture to be
@@ -372,27 +375,3 @@ int dma_direct_supported(struct device *dev, u64 mask)
 
 	return mask >= phys_to_dma(dev, min_mask);
 }
-
-const struct dma_map_ops dma_direct_ops = {
-	.alloc			= dma_direct_alloc,
-	.free			= dma_direct_free,
-	.map_page		= dma_direct_map_page,
-	.map_sg			= dma_direct_map_sg,
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-    defined(CONFIG_SWIOTLB)
-	.sync_single_for_device	= dma_direct_sync_single_for_device,
-	.sync_sg_for_device	= dma_direct_sync_sg_for_device,
-#endif
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
-    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
-    defined(CONFIG_SWIOTLB)
-	.sync_single_for_cpu	= dma_direct_sync_single_for_cpu,
-	.sync_sg_for_cpu	= dma_direct_sync_sg_for_cpu,
-	.unmap_page		= dma_direct_unmap_page,
-	.unmap_sg		= dma_direct_unmap_sg,
-#endif
-	.get_required_mask	= dma_direct_get_required_mask,
-	.dma_supported		= dma_direct_supported,
-	.cache_sync		= arch_dma_cache_sync,
-};
-EXPORT_SYMBOL(dma_direct_ops);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0b18cfbdde95..fc84c81029d9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -7,6 +7,7 @@
  */
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-noncoherent.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
@@ -229,8 +230,8 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 		unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	BUG_ON(!ops);
-	if (ops->get_sgtable)
+
+	if (!dma_is_direct(ops) && ops->get_sgtable)
 		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
 					attrs);
 	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
@@ -293,8 +294,8 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 		unsigned long attrs)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
-	BUG_ON(!ops);
-	if (ops->mmap)
+
+	if (!dma_is_direct(ops) && ops->mmap)
 		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
@@ -324,6 +325,8 @@ u64 dma_get_required_mask(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
+	if (dma_is_direct(ops))
+		return dma_direct_get_required_mask(dev);
 	if (ops->get_required_mask)
 		return ops->get_required_mask(dev);
 	return dma_default_get_required_mask(dev);
@@ -341,7 +344,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	void *cpu_addr;
 
-	BUG_ON(!ops);
 	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
 
 	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
@@ -352,10 +354,14 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 
 	if (!arch_dma_alloc_attrs(&dev))
 		return NULL;
-	if (!ops->alloc)
+
+	if (dma_is_direct(ops))
+		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
+	else if (ops->alloc)
+		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	else
 		return NULL;
 
-	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
 	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
 	return cpu_addr;
 }
@@ -366,8 +372,6 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	BUG_ON(!ops);
-
 	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
 		return;
 	/*
@@ -379,11 +383,14 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 	 */
 	WARN_ON(irqs_disabled());
 
-	if (!ops->free || !cpu_addr)
+	if (!cpu_addr)
 		return;
 
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+	if (dma_is_direct(ops))
+		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
+	else if (ops->free)
+		ops->free(dev, size, cpu_addr, dma_handle, attrs);
 }
 EXPORT_SYMBOL(dma_free_attrs);
 
@@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (!ops)
-		return 0;
-	if (!ops->dma_supported)
+	if (dma_is_direct(ops))
+		return dma_direct_supported(dev, mask);
+	if (!ops->dma_supported)
 		return 1;
 	return ops->dma_supported(dev, mask);
 }
@@ -437,7 +444,10 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->cache_sync)
+
+	if (dma_is_direct(ops))
+		arch_dma_cache_sync(dev, vaddr, size, dir);
+	else if (ops->cache_sync)
 		ops->cache_sync(dev, vaddr, size, dir);
 }
 EXPORT_SYMBOL(dma_cache_sync);
-- 
2.19.1
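
For anyone skimming the diff, the dispatch pattern used in the
kernel/dma/mapping.c hunks can be modelled in plain userspace C.  This is
only a sketch under stated assumptions: the cut-down struct layouts, the
sketch_* and *_required_mask helper names and the mask values are made up
for illustration and are not the kernel definitions.  It follows the shape
of the dma_get_required_mask() hunk: a NULL ops pointer takes a direct
call, otherwise the optional hook is called indirectly, otherwise a
default is used.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64;

struct device;

/* Hypothetical, cut-down stand-in for the kernel's struct dma_map_ops. */
struct dma_map_ops {
	u64 (*get_required_mask)(struct device *dev);
};

/* Hypothetical, cut-down stand-in for struct device. */
struct device {
	const struct dma_map_ops *dma_ops;	/* NULL selects the direct path */
};

/* Same test as dma_is_direct() in the patch, minus the likely() hint. */
static int dma_is_direct(const struct dma_map_ops *ops)
{
	return ops == NULL;
}

/* Toy direct-mapping answer (illustrative value only). */
static u64 direct_required_mask(struct device *dev)
{
	(void)dev;
	return 0xffffffffULL;
}

/* Toy fallback used when an ops table has no hook (illustrative value). */
static u64 default_required_mask(struct device *dev)
{
	(void)dev;
	return 0xffffffULL;
}

/* Toy hook such as an IOMMU driver might install (illustrative value). */
u64 iommu_required_mask(struct device *dev)
{
	(void)dev;
	return ~0ULL;
}

u64 sketch_get_required_mask(struct device *dev)
{
	const struct dma_map_ops *ops;

	assert(dev != NULL);
	ops = dev->dma_ops;

	if (dma_is_direct(ops))
		return direct_required_mask(dev);	/* direct call */
	if (ops->get_required_mask)
		return ops->get_required_mask(dev);	/* indirect call */
	return default_required_mask(dev);
}
```

The point of the ordering is that the common case is decided by a single
NULL test and never loads a function pointer, which is what avoids the
retpoline cost on the fast path.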



* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
@ 2018-12-08 16:06 ` Jesper Dangaard Brouer
  2018-12-08 16:50   ` Christoph Hellwig
  2018-12-10 21:51 ` Luck, Tony
  2018-12-13 20:08 ` Christoph Hellwig
  17 siblings, 1 reply; 41+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 16:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Tariq Toukan, Ilias Apalodimas,
	Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel, brouer

On Fri,  7 Dec 2018 11:07:05 -0800
Christoph Hellwig <hch@lst.de> wrote:

> Hi all,
> 
> a while ago Jesper reported major performance regressions due to the
> spectre v2 mitigations in his XDP forwarding workloads.  A large part
> of that is due to the DMA mapping API indirect calls.
> 
> It turns out that the most common implementation of the DMA API is the
> direct mapping case, and now that we have merged almost all duplicate
> implementations of that into a single generic one it is easily feasible to
> use direct calls for this fast path.
> 
> This series consolidates the DMA mapping code by merging the
> swiotlb case into the dma direct case, and then treats NULL dma_ops
> as an indicator that we should directly call the direct mapping
> case.  This recovers a large part of the retpoline-induced XDP slowdown.
> 
> This work is based on the dma-mapping tree, so you probably want to
> use this git tree for testing:
> 
>     git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2

You can add my:
 Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
or
 Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

I'm very happy that you are working on this.  I've done micro-benchmark
testing of the patchset (and branch dma-direct-calls), and made the
results available here:
 https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org

My XDP performance is back, minus the BPF indirect call and the
net_rx_action napi->poll and net_device->ndo_xdp_xmit calls.  I
verified that manually disabling retpoline for these remaining netstack
retpoline calls restores the performance fully (well, minus 1.5 nanosec).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-08 16:06 ` [RFC] avoid indirect calls for DMA direct mappings v2 Jesper Dangaard Brouer
@ 2018-12-08 16:50   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-08 16:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

On Sat, Dec 08, 2018 at 05:06:48PM +0100, Jesper Dangaard Brouer wrote:
> You can add my:
>  Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
> or
>  Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> I'm very happy that you are working on this.  I've done micro-benchmark
> testing of the patchset (and branch dma-direct-calls), and made the
> results available here:
>  https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org
> 
> My XDP performance is back, minus the BPF indirect call and the
> net_rx_action napi->poll and net_device->ndo_xdp_xmit calls.  I
> verified that manually disabling retpoline for these remaining netstack
> retpoline calls restores the performance fully (well, minus 1.5 nanosec).

Thanks a lot for all the testing (and the initial report).  I'd still
love to get this in for 4.21 if we can, but for that I'd need a lot
more folks to carefully review it, especially the swiotlb and ia64 bits.
So if we don't get that early next week I feel we might have to punt
it to the next merge window.


* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2018-12-08 16:06 ` [RFC] avoid indirect calls for DMA direct mappings v2 Jesper Dangaard Brouer
@ 2018-12-10 21:51 ` Luck, Tony
  2018-12-11  6:51   ` Christoph Hellwig
  2018-12-13 20:08 ` Christoph Hellwig
  17 siblings, 1 reply; 41+ messages in thread
From: Luck, Tony @ 2018-12-10 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Fenghua Yu, Marek Szyprowski, Keith Busch,
	Jonathan Derrick, linux-pci, linux-ia64, x86, linux-kernel

On Fri, Dec 07, 2018 at 11:07:05AM -0800, Christoph Hellwig wrote:
> This work is based on the dma-mapping tree, so you probably want to
> use this git tree for testing:
> 
>     git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

Pulled this tree. Got HEAD

	33b9fc015171 ("dma-mapping: bypass indirect calls for dma-direct")

But the ia64 build fails with:

arch/ia64/mm/init.c:75:21: warning: 'enum dma_data_direction' declared inside parameter list [enabled by default]
arch/ia64/mm/init.c:75:21: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
arch/ia64/mm/init.c:75:40: error: parameter 4 ('dir') has incomplete type
arch/ia64/mm/init.c:74:6: error: function declaration isn't a prototype [-Werror=strict-prototypes]
arch/ia64/mm/init.c: In function 'arch_sync_dma_for_cpu':
arch/ia64/mm/init.c:77:2: error: implicit declaration of function '__phys_to_pfn' [-Werror=implicit-function-declaration]

-Tony


* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-10 21:51 ` Luck, Tony
@ 2018-12-11  6:51   ` Christoph Hellwig
  2018-12-11 16:42     ` Luck, Tony
  2018-12-11 17:13     ` Luck, Tony
  0 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-11  6:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

On Mon, Dec 10, 2018 at 01:51:13PM -0800, Luck, Tony wrote:
> But the ia64 build fails with:

Yes, I just got the same complaint from the buildbot.  Unfortunately
I don't have a good ia64 cross compiler locally given that Debian
is lacking one, and the one provided by the buildbot doesn't build on
Debian either.

This should fix it:

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 2c51733f1dfd..a007afaa556c 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -8,6 +8,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 
+#include <linux/dma-noncoherent.h>
 #include <linux/efi.h>
 #include <linux/elf.h>
 #include <linux/memblock.h>


* RE: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-11  6:51   ` Christoph Hellwig
@ 2018-12-11 16:42     ` Luck, Tony
  2018-12-11 17:13     ` Luck, Tony
  1 sibling, 0 replies; 41+ messages in thread
From: Luck, Tony @ 2018-12-11 16:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Yu, Fenghua, Marek Szyprowski, Busch,
	Keith, Derrick, Jonathan, linux-pci, linux-ia64, x86,
	linux-kernel

> This should fix it:
...
> +#include <linux/dma-noncoherent.h>

Not quite. Still have an issue with __phys_to_pfn(paddr)

Trying to #include <asm-generic/memory_model.h> gave me a raft of redefined
macros. So I just added

#define      __phys_to_pfn(paddr)    PHYS_PFN(paddr)

to arch/ia64/mm/init.c

That made the build work. But boot spontaneously resets after:

mptsas: ioc1: attaching ssp device: fw_channel 0, fw_id 6, phy 6, sas_addr 0x5000c5000ecada69
scsi 5:0:0:0: Direct-Access     SEAGATE  ST9146802SS      0003 PQ: 0 ANSI: 5
EFI Variables Facility v0.08 2004-May-17
sd 5:0:0:0: [sdb] 286749488 512-byte logical blocks: (147 GB/137 GiB)
sd 5:0:0:0: [sdb] Write Protect is off
sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
 sdb: sdb1 sdb2
sd 5:0:0:0: [sdb] Attached SCSI disk

But that might not be your fault. My ancient system is getting flaky. A v4.19 build that
has booted before is also resetting :-(

-Tony




* RE: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-11  6:51   ` Christoph Hellwig
  2018-12-11 16:42     ` Luck, Tony
@ 2018-12-11 17:13     ` Luck, Tony
  2018-12-11 17:15       ` Christoph Hellwig
  1 sibling, 1 reply; 41+ messages in thread
From: Luck, Tony @ 2018-12-11 17:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Yu, Fenghua, Marek Szyprowski, Busch,
	Keith, Derrick, Jonathan, linux-pci, linux-ia64, x86,
	linux-kernel

> But that might not be your fault. My ancient system is getting flaky. A v4.19 build that
> has booted before is also resetting :-(

After a power-cycle (and some time to let the machine cool off), the system now
boots with your patch series plus the __phys_to_pfn() #define.

So if you can figure out the right way to fix that, you are good to go.

Tested-by: Tony Luck <tony.luck@intel.com>


* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-11 17:13     ` Luck, Tony
@ 2018-12-11 17:15       ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-11 17:15 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Yu, Fenghua,
	Marek Szyprowski, Busch, Keith, Derrick, Jonathan, linux-pci,
	linux-ia64, x86, linux-kernel

On Tue, Dec 11, 2018 at 05:13:30PM +0000, Luck, Tony wrote:
> > But that might not be your fault. My ancient system is getting flaky. A v4.19 build that
> > has booted before is also resetting :-(
> 
> After a power-cycle (and some time to let the machine cool off), the system now
> boots with your patch series plus the __phys_to_pfn() #define.
> 
> So if you can figure out the right way to fix that, you are good to go.
> 
> Tested-by: Tony Luck <tony.luck@intel.com>

Thanks.  I'll just replace the __phys_to_pfn with PHYS_PFN for now,
and see if I can find time to get the whole kernel to agree on
one version of this macro eventually.
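
For readers who don't have the two macros in their head: as far as I
know, the generic definitions make __phys_to_pfn() a plain alias for
PHYS_PFN(), which shifts a physical address right by PAGE_SHIFT.  A
userspace sketch of that relationship follows.  The SKETCH_* and
sketch_* names are made up, and the page shift of 14 (16 KiB pages) is
an assumption picked for illustration only, since the ia64 page size is
configurable.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page shift for illustration only (16 KiB pages). */
#define SKETCH_PAGE_SHIFT	14

/* Modelled on the generic PHYS_PFN(): drop the in-page offset bits. */
#define SKETCH_PHYS_PFN(x)	((unsigned long)((x) >> SKETCH_PAGE_SHIFT))

/* Modelled on the generic __phys_to_pfn(): an alias for the above. */
#define sketch_phys_to_pfn(paddr)	SKETCH_PHYS_PFN(paddr)

/* Convert a physical address to a page frame number. */
unsigned long sketch_pfn_of(uint64_t paddr)
{
	unsigned long pfn = SKETCH_PHYS_PFN(paddr);

	/* Both spellings must agree, which is why swapping them is safe. */
	assert(pfn == sketch_phys_to_pfn(paddr));
	return pfn;
}
```

With these assumptions sketch_pfn_of(0x4000) is 1 and every address
below 0x4000 maps to frame 0.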


* Re: [RFC] avoid indirect calls for DMA direct mappings v2
  2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2018-12-10 21:51 ` Luck, Tony
@ 2018-12-13 20:08 ` Christoph Hellwig
  17 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-13 20:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Fenghua Yu,
	Toke Høiland-Jørgensen, linux-ia64,
	Konrad Rzeszutek Wilk, linux-pci, Ilias Apalodimas, x86,
	linux-kernel, Keith Busch, Tony Luck, Robin Murphy,
	Jonathan Derrick, Tariq Toukan

I've pulled v2 with the ia64 fix into dma-mapping for-next.  This should
give us a little more than a week in linux-next to sort out any
issues.


* Re: [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
@ 2018-12-14 14:11   ` Marek Szyprowski
  2018-12-14 14:24     ` Christoph Hellwig
  2018-12-15 17:46   ` [15/15] " Guenter Roeck
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Marek Szyprowski @ 2018-12-14 14:11 UTC (permalink / raw)
  To: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer
  Cc: Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

Hi Christoph,

On 2018-12-07 20:07, Christoph Hellwig wrote:
> Avoid expensive indirect calls in the fast path DMA mapping
> operations by directly calling the dma_direct_* ops if we are using
> the directly mapped DMA operations.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This breaks direct DMA on ARM64 (also in today's linux-next). A NULL
dev->dma_ops falls back to get_arch_dma_ops(), which in turn returns the
non-functional &dma_dummy_ops on ARM64...
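
To make the failure mode concrete, here is a small userspace model of the
lookup.  It is a sketch under stated assumptions, not the kernel code: the
struct layouts and all sketch_*, generic_arch_ops and arm64_like_arch_ops
names are hypothetical, and the architecture hook is passed in as a
function pointer purely so both behaviours can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct dma_map_ops {
	const char *name;
};

struct device {
	const struct dma_map_ops *dma_ops;
};

static const struct dma_map_ops dummy_ops = { "dummy" };

/* Models an architecture whose default is NULL, i.e. "use dma-direct". */
const struct dma_map_ops *generic_arch_ops(void)
{
	return NULL;
}

/* Models the arm64 behaviour described above: the default is dummy ops. */
const struct dma_map_ops *arm64_like_arch_ops(void)
{
	return &dummy_ops;
}

typedef const struct dma_map_ops *(*arch_ops_fn)(void);

/* Per-device ops win; otherwise fall back to the architecture default. */
const struct dma_map_ops *sketch_get_dma_ops(const struct device *dev,
					     arch_ops_fn arch)
{
	assert(arch != NULL);
	if (dev && dev->dma_ops)
		return dev->dma_ops;
	return arch();
}

/* The direct path is only taken when the final result is NULL. */
int sketch_is_direct(const struct device *dev, arch_ops_fn arch)
{
	return sketch_get_dma_ops(dev, arch) == NULL;
}
```

In this model a device with NULL dma_ops is direct under
generic_arch_ops() but not under arm64_like_arch_ops(), so clearing
dev->dma_ops alone cannot select the direct path when the architecture
default is a non-NULL table.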

> ---
>  arch/alpha/include/asm/dma-mapping.h |   2 +-
>  arch/arc/mm/cache.c                  |   2 +-
>  arch/arm/include/asm/dma-mapping.h   |   2 +-
>  arch/arm/mm/dma-mapping-nommu.c      |  14 +---
>  arch/arm64/mm/dma-mapping.c          |   3 -
>  arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
>  arch/ia64/hp/common/sba_iommu.c      |   4 +-
>  arch/ia64/kernel/dma-mapping.c       |   1 -
>  arch/mips/include/asm/dma-mapping.h  |   2 +-
>  arch/parisc/kernel/setup.c           |   4 -
>  arch/sparc/include/asm/dma-mapping.h |   4 +-
>  arch/x86/kernel/pci-dma.c            |   2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
>  drivers/iommu/amd_iommu.c            |  13 +---
>  include/asm-generic/dma-mapping.h    |   2 +-
>  include/linux/dma-direct.h           |  17 ----
>  include/linux/dma-mapping.h          | 111 +++++++++++++++++++++++----
>  include/linux/dma-noncoherent.h      |   5 +-
>  kernel/dma/direct.c                  |  37 ++-------
>  kernel/dma/mapping.c                 |  40 ++++++----
>  20 files changed, 150 insertions(+), 119 deletions(-)
>
> diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
> index 8beeafd4f68e..0ee6a5c99b16 100644
> --- a/arch/alpha/include/asm/dma-mapping.h
> +++ b/arch/alpha/include/asm/dma-mapping.h
> @@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
>  #ifdef CONFIG_ALPHA_JENSEN
> -	return &dma_direct_ops;
> +	return NULL;
>  #else
>  	return &alpha_pci_ops;
>  #endif
> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
> index f2701c13a66b..e188bb3ede53 100644
> --- a/arch/arc/mm/cache.c
> +++ b/arch/arc/mm/cache.c
> @@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
>  	/*
>  	 * In case of IOC (say IOC+SLC case), pointers above could still be set
>  	 * but end up not being relevant as the first function in chain is not
> -	 * called at all for @dma_direct_ops
> +	 * called at all for devices using coherent DMA.
>  	 *     arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
>  	 */
>  }
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 965b7c846ecb..31d3b96f0f4b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> -	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
> +	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
>  }
>  
>  #ifdef __arch_page_to_dma
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 712416ecd8e6..f304b10e23a4 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -22,7 +22,7 @@
>  #include "dma.h"
>  
>  /*
> - *  dma_direct_ops is used if
> + *  The generic direct mapping code is used if
>   *   - MMU/MPU is off
>   *   - cpu is v7m w/o cache support
>   *   - device is coherent
> @@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
>  };
>  EXPORT_SYMBOL(arm_nommu_dma_ops);
>  
> -static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
> -{
> -	return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
> -}
> -
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> -	const struct dma_map_ops *dma_ops;
> -
>  	if (IS_ENABLED(CONFIG_CPU_V7M)) {
>  		/*
>  		 * Cache support for v7m is optional, so can be treated as
> @@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  		dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
>  	}
>  
> -	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
> -
> -	set_dma_ops(dev, dma_ops);
> +	if (!dev->archdata.dma_coherent)
> +		set_dma_ops(dev, &arm_nommu_dma_ops);
>  }
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index ab1e417204d0..95eda81e3f2d 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> -	if (!dev->dma_ops)
> -		dev->dma_ops = &dma_direct_ops;
> -
>  	dev->dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  
> diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
> index f40ca499b246..8840ed97712f 100644
> --- a/arch/ia64/hp/common/hwsw_iommu.c
> +++ b/arch/ia64/hp/common/hwsw_iommu.c
> @@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
>  const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
>  {
>  	if (use_swiotlb(dev))
> -		return &dma_direct_ops;
> +		return NULL;
>  	return &sba_dma_ops;
>  }
>  EXPORT_SYMBOL(hwsw_dma_get_ops);
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 5ee74820a0f6..5a361e51cb1e 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -2078,7 +2078,7 @@ sba_init(void)
>  	 * a successful kdump kernel boot is to use the swiotlb.
>  	 */
>  	if (is_kdump_kernel()) {
> -		dma_ops = &dma_direct_ops;
> +		dma_ops = NULL;
>  		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>  			panic("Unable to initialize software I/O TLB:"
>  				  " Try machvec=dig boot option");
> @@ -2100,7 +2100,7 @@ sba_init(void)
>  		 * If we didn't find something sba_iommu can claim, we
>  		 * need to setup the swiotlb and switch to the dig machvec.
>  		 */
> -		dma_ops = &dma_direct_ops;
> +		dma_ops = NULL;
>  		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>  			panic("Unable to find SBA IOMMU or initialize "
>  			      "software I/O TLB: Try machvec=dig boot option");
> diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
> index 80cd3e1ea95a..ad7d9963de34 100644
> --- a/arch/ia64/kernel/dma-mapping.c
> +++ b/arch/ia64/kernel/dma-mapping.c
> @@ -36,7 +36,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
>  
>  void __init swiotlb_dma_init(void)
>  {
> -	dma_ops = &dma_direct_ops;
>  	swiotlb_init(1);
>  }
>  #endif
> diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
> index 69f914667f3e..20dfaad3a55d 100644
> --- a/arch/mips/include/asm/dma-mapping.h
> +++ b/arch/mips/include/asm/dma-mapping.h
> @@ -11,7 +11,7 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  #if defined(CONFIG_MACH_JAZZ)
>  	return &jazz_dma_ops;
>  #else
> -	return &dma_direct_ops;
> +	return NULL;
>  #endif
>  }
>  
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index cd227f1cf629..54818cd78bd0 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -99,10 +99,6 @@ void __init dma_ops_init(void)
>  
>  	case pcxl2:
>  		pa7300lc_init();
> -	case pcxl: /* falls through */
> -	case pcxs:
> -	case pcxt:
> -		hppa_dma_ops = &dma_direct_ops;
>  		break;
>  	default:
>  		break;
> diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
> index b0bb2fcaf1c9..59f5a0f17316 100644
> --- a/arch/sparc/include/asm/dma-mapping.h
> +++ b/arch/sparc/include/asm/dma-mapping.h
> @@ -14,11 +14,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
>  #ifdef CONFIG_SPARC_LEON
>  	if (sparc_cpu_model == sparc_leon)
> -		return &dma_direct_ops;
> +		return NULL;
>  #endif
>  #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
>  	if (bus == &pci_bus_type)
> -		return &dma_direct_ops;
> +		return NULL;
>  #endif
>  	return dma_ops;
>  }
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index f4562fcec681..d460998ae828 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -17,7 +17,7 @@
>  
>  static bool disable_dac_quirk __read_mostly;
>  
> -const struct dma_map_ops *dma_ops = &dma_direct_ops;
> +const struct dma_map_ops *dma_ops;
>  EXPORT_SYMBOL(dma_ops);
>  
>  #ifdef CONFIG_IOMMU_DEBUG
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 61a84b958d67..50637f372e9f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -581,7 +581,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
>  
>  	dev_priv->map_mode = vmw_dma_map_populate;
>  
> -	if (dma_ops->sync_single_for_cpu)
> +	if (dma_ops && dma_ops->sync_single_for_cpu)
>  		dev_priv->map_mode = vmw_dma_alloc_coherent;
>  #ifdef CONFIG_SWIOTLB
>  	if (swiotlb_nr_tbl() == 0)
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index c5d6c7c42b0a..567221cca13c 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2184,7 +2184,7 @@ static int amd_iommu_add_device(struct device *dev)
>  				dev_name(dev));
>  
>  		iommu_ignore_device(dev);
> -		dev->dma_ops = &dma_direct_ops;
> +		dev->dma_ops = NULL;
>  		goto out;
>  	}
>  	init_iommu_group(dev);
> @@ -2770,17 +2770,6 @@ int __init amd_iommu_init_dma_ops(void)
>  	swiotlb        = (iommu_pass_through || sme_me_mask) ? 1 : 0;
>  	iommu_detected = 1;
>  
> -	/*
> -	 * In case we don't initialize SWIOTLB (actually the common case
> -	 * when AMD IOMMU is enabled and SME is not active), make sure there
> -	 * are global dma_ops set as a fall-back for devices not handled by
> -	 * this driver (for example non-PCI devices). When SME is active,
> -	 * make sure that swiotlb variable remains set so the global dma_ops
> -	 * continue to be SWIOTLB.
> -	 */
> -	if (!swiotlb)
> -		dma_ops = &dma_direct_ops;
> -
>  	if (amd_iommu_unmap_flush)
>  		pr_info("AMD-Vi: IO/TLB flush on unmap enabled\n");
>  	else
> diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
> index 880a292d792f..c13f46109e88 100644
> --- a/include/asm-generic/dma-mapping.h
> +++ b/include/asm-generic/dma-mapping.h
> @@ -4,7 +4,7 @@
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> -	return &dma_direct_ops;
> +	return NULL;
>  }
>  
>  #endif /* _ASM_GENERIC_DMA_MAPPING_H */
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 3b0a3ea3876d..b7338702592a 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
>  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>  		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
>  void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
> -dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
> -		unsigned long offset, size_t size, enum dma_data_direction dir,
> -		unsigned long attrs);
> -void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> -		size_t size, enum dma_data_direction dir, unsigned long attrs);
> -int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> -		enum dma_data_direction dir, unsigned long attrs);
> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> -		int nents, enum dma_data_direction dir, unsigned long attrs);
> -void dma_direct_sync_single_for_device(struct device *dev,
> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> -void dma_direct_sync_sg_for_device(struct device *dev,
> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> -void dma_direct_sync_single_for_cpu(struct device *dev,
> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> -void dma_direct_sync_sg_for_cpu(struct device *dev,
> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>  int dma_direct_supported(struct device *dev, u64 mask);
>  #endif /* _LINUX_DMA_DIRECT_H */
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 269ee27fc3d9..f422aec0f53c 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -134,7 +134,6 @@ struct dma_map_ops {
>  
>  #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
>  
> -extern const struct dma_map_ops dma_direct_ops;
>  extern const struct dma_map_ops dma_virt_ops;
>  extern const struct dma_map_ops dma_dummy_ops;
>  
> @@ -222,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
>  }
>  #endif
>  
> +static inline bool dma_is_direct(const struct dma_map_ops *ops)
> +{
> +	return likely(!ops);
> +}
> +
> +/*
> + * All the dma_direct_* declarations are here just for the indirect call bypass,
> + * and must not be used directly by drivers!
> + */
> +dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
> +		unsigned long offset, size_t size, enum dma_data_direction dir,
> +		unsigned long attrs);
> +int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> +		enum dma_data_direction dir, unsigned long attrs);
> +
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> +    defined(CONFIG_SWIOTLB)
> +void dma_direct_sync_single_for_device(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> +void dma_direct_sync_sg_for_device(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> +#else
> +static inline void dma_direct_sync_single_for_device(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +}
> +static inline void dma_direct_sync_sg_for_device(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> +{
> +}
> +#endif
> +
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> +    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
> +    defined(CONFIG_SWIOTLB)
> +void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> +		size_t size, enum dma_data_direction dir, unsigned long attrs);
> +void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> +		int nents, enum dma_data_direction dir, unsigned long attrs);
> +void dma_direct_sync_single_for_cpu(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> +void dma_direct_sync_sg_for_cpu(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> +#else
> +static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> +		size_t size, enum dma_data_direction dir, unsigned long attrs)
> +{
> +}
> +static inline void dma_direct_unmap_sg(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir,
> +		unsigned long attrs)
> +{
> +}
> +static inline void dma_direct_sync_single_for_cpu(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +}
> +static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> +{
> +}
> +#endif
> +
>  static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>  					      size_t size,
>  					      enum dma_data_direction dir,
> @@ -232,9 +294,12 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  	debug_dma_map_single(dev, ptr, size);
> -	addr = ops->map_page(dev, virt_to_page(ptr),
> -			     offset_in_page(ptr), size,
> -			     dir, attrs);
> +	if (dma_is_direct(ops))
> +		addr = dma_direct_map_page(dev, virt_to_page(ptr),
> +				offset_in_page(ptr), size, dir, attrs);
> +	else
> +		addr = ops->map_page(dev, virt_to_page(ptr),
> +				offset_in_page(ptr), size, dir, attrs);
>  	debug_dma_map_page(dev, virt_to_page(ptr),
>  			   offset_in_page(ptr), size,
>  			   dir, addr, true);
> @@ -249,7 +314,9 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->unmap_page)
> +	if (dma_is_direct(ops))
> +		dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +	else if (ops->unmap_page)
>  		ops->unmap_page(dev, addr, size, dir, attrs);
>  	debug_dma_unmap_page(dev, addr, size, dir, true);
>  }
> @@ -272,7 +339,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>  	int ents;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	ents = ops->map_sg(dev, sg, nents, dir, attrs);
> +	if (dma_is_direct(ops))
> +		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> +	else
> +		ents = ops->map_sg(dev, sg, nents, dir, attrs);
>  	BUG_ON(ents < 0);
>  	debug_dma_map_sg(dev, sg, nents, ents, dir);
>  
> @@ -287,7 +357,9 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  	debug_dma_unmap_sg(dev, sg, nents, dir);
> -	if (ops->unmap_sg)
> +	if (dma_is_direct(ops))
> +		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
> +	else if (ops->unmap_sg)
>  		ops->unmap_sg(dev, sg, nents, dir, attrs);
>  }
>  
> @@ -301,7 +373,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>  	dma_addr_t addr;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	addr = ops->map_page(dev, page, offset, size, dir, attrs);
> +	if (dma_is_direct(ops))
> +		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> +	else
> +		addr = ops->map_page(dev, page, offset, size, dir, attrs);
>  	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
>  
>  	return addr;
> @@ -322,7 +397,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
>  	BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
>  
>  	addr = phys_addr;
> -	if (ops->map_resource)
> +	if (ops && ops->map_resource)
>  		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
>  
>  	debug_dma_map_resource(dev, phys_addr, size, dir, addr);
> @@ -337,7 +412,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->unmap_resource)
> +	if (ops && ops->unmap_resource)
>  		ops->unmap_resource(dev, addr, size, dir, attrs);
>  	debug_dma_unmap_resource(dev, addr, size, dir);
>  }
> @@ -349,7 +424,9 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_single_for_cpu)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
> +	else if (ops->sync_single_for_cpu)
>  		ops->sync_single_for_cpu(dev, addr, size, dir);
>  	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
>  }
> @@ -368,7 +445,9 @@ static inline void dma_sync_single_for_device(struct device *dev,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_single_for_device)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_single_for_device(dev, addr, size, dir);
> +	else if (ops->sync_single_for_device)
>  		ops->sync_single_for_device(dev, addr, size, dir);
>  	debug_dma_sync_single_for_device(dev, addr, size, dir);
>  }
> @@ -387,7 +466,9 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_sg_for_cpu)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
> +	else if (ops->sync_sg_for_cpu)
>  		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>  	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>  }
> @@ -399,7 +480,9 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_sg_for_device)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
> +	else if (ops->sync_sg_for_device)
>  		ops->sync_sg_for_device(dev, sg, nelems, dir);
>  	debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
>  
> diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
> index 306557331d7d..69b36ed31a99 100644
> --- a/include/linux/dma-noncoherent.h
> +++ b/include/linux/dma-noncoherent.h
> @@ -38,7 +38,10 @@ pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
>  void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  		enum dma_data_direction direction);
>  #else
> -#define arch_dma_cache_sync NULL
> +static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
> +		size_t size, enum dma_data_direction direction)
> +{
> +}
>  #endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
>  
>  #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 85d8286a0ba2..79da61b49fa4 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -223,6 +223,7 @@ void dma_direct_sync_single_for_device(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_device(dev, paddr, size, dir);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_single_for_device);
>  
>  void dma_direct_sync_sg_for_device(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> @@ -240,6 +241,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>  					dir);
>  	}
>  }
> +EXPORT_SYMBOL(dma_direct_sync_sg_for_device);
>  #endif
>  
>  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> @@ -258,6 +260,7 @@ void dma_direct_sync_single_for_cpu(struct device *dev,
>  	if (unlikely(is_swiotlb_buffer(paddr)))
>  		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_single_for_cpu);
>  
>  void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> @@ -277,6 +280,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu_all(dev);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_sg_for_cpu);
>  
>  void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
> @@ -289,6 +293,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  	if (unlikely(is_swiotlb_buffer(phys)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
> +EXPORT_SYMBOL(dma_direct_unmap_page);
>  
>  void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  		int nents, enum dma_data_direction dir, unsigned long attrs)
> @@ -300,11 +305,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>  			     attrs);
>  }
> -#else
> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> -		int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> -}
> +EXPORT_SYMBOL(dma_direct_unmap_sg);
>  #endif
>  
>  static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
> @@ -331,6 +332,7 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>  		arch_sync_dma_for_device(dev, phys, size, dir);
>  	return dma_addr;
>  }
> +EXPORT_SYMBOL(dma_direct_map_page);
>  
>  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  		enum dma_data_direction dir, unsigned long attrs)
> @@ -352,6 +354,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
>  	return 0;
>  }
> +EXPORT_SYMBOL(dma_direct_map_sg);
>  
>  /*
>   * Because 32-bit DMA masks are so common we expect every architecture to be
> @@ -372,27 +375,3 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  
>  	return mask >= phys_to_dma(dev, min_mask);
>  }
> -
> -const struct dma_map_ops dma_direct_ops = {
> -	.alloc			= dma_direct_alloc,
> -	.free			= dma_direct_free,
> -	.map_page		= dma_direct_map_page,
> -	.map_sg			= dma_direct_map_sg,
> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> -    defined(CONFIG_SWIOTLB)
> -	.sync_single_for_device	= dma_direct_sync_single_for_device,
> -	.sync_sg_for_device	= dma_direct_sync_sg_for_device,
> -#endif
> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> -    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
> -    defined(CONFIG_SWIOTLB)
> -	.sync_single_for_cpu	= dma_direct_sync_single_for_cpu,
> -	.sync_sg_for_cpu	= dma_direct_sync_sg_for_cpu,
> -	.unmap_page		= dma_direct_unmap_page,
> -	.unmap_sg		= dma_direct_unmap_sg,
> -#endif
> -	.get_required_mask	= dma_direct_get_required_mask,
> -	.dma_supported		= dma_direct_supported,
> -	.cache_sync		= arch_dma_cache_sync,
> -};
> -EXPORT_SYMBOL(dma_direct_ops);
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 0b18cfbdde95..fc84c81029d9 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -7,6 +7,7 @@
>   */
>  #include <linux/memblock.h> /* for max_pfn */
>  #include <linux/acpi.h>
> +#include <linux/dma-direct.h>
>  #include <linux/dma-noncoherent.h>
>  #include <linux/export.h>
>  #include <linux/gfp.h>
> @@ -229,8 +230,8 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
>  		unsigned long attrs)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
> -	BUG_ON(!ops);
> -	if (ops->get_sgtable)
> +
> +	if (!dma_is_direct(ops) && ops->get_sgtable)
>  		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
>  					attrs);
>  	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
> @@ -293,8 +294,8 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>  		unsigned long attrs)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
> -	BUG_ON(!ops);
> -	if (ops->mmap)
> +
> +	if (!dma_is_direct(ops) && ops->mmap)
>  		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>  	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>  }
> @@ -324,6 +325,8 @@ u64 dma_get_required_mask(struct device *dev)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> +	if (dma_is_direct(ops))
> +		return dma_direct_get_required_mask(dev);
>  	if (ops->get_required_mask)
>  		return ops->get_required_mask(dev);
>  	return dma_default_get_required_mask(dev);
> @@ -341,7 +344,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  	void *cpu_addr;
>  
> -	BUG_ON(!ops);
>  	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
>  
>  	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
> @@ -352,10 +354,14 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  
>  	if (!arch_dma_alloc_attrs(&dev))
>  		return NULL;
> -	if (!ops->alloc)
> +
> +	if (dma_is_direct(ops))
> +		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
> +	else if (ops->alloc)
> +		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
> +	else
>  		return NULL;
>  
> -	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
>  	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
>  	return cpu_addr;
>  }
> @@ -366,8 +372,6 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	BUG_ON(!ops);
> -
>  	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
>  		return;
>  	/*
> @@ -379,11 +383,14 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  	 */
>  	WARN_ON(irqs_disabled());
>  
> -	if (!ops->free || !cpu_addr)
> +	if (!cpu_addr)
>  		return;
>  
>  	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
> -	ops->free(dev, size, cpu_addr, dma_handle, attrs);
> +	if (dma_is_direct(ops))
> +		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
> +	else if (ops->free)
> +		ops->free(dev, size, cpu_addr, dma_handle, attrs);
>  }
>  EXPORT_SYMBOL(dma_free_attrs);
>  
> @@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (!ops)
> -		return 0;
> -	if (!ops->dma_supported)
> +	if (dma_is_direct(ops))
> +		return dma_direct_supported(dev, mask);
> +	if (!ops->dma_supported)
>  		return 1;
>  	return ops->dma_supported(dev, mask);
>  }
> @@ -437,7 +444,10 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->cache_sync)
> +
> +	if (dma_is_direct(ops))
> +		arch_dma_cache_sync(dev, vaddr, size, dir);
> +	else if (ops->cache_sync)
>  		ops->cache_sync(dev, vaddr, size, dir);
>  }
>  EXPORT_SYMBOL(dma_cache_sync);
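
The dispatch pattern in the patch above can be sketched in user space.
This is only an illustrative model, not kernel code: the toy_* types and
functions are hypothetical stand-ins for struct device, struct dma_map_ops
and the dma_direct_* helpers, and the "direct" mapping is assumed to be an
identity mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;

struct toy_device;

/* Hypothetical stand-in for struct dma_map_ops; only two methods modelled. */
struct toy_dma_ops {
	toy_dma_addr_t (*map_page)(struct toy_device *dev, uint64_t phys);
	void (*unmap_page)(struct toy_device *dev, toy_dma_addr_t addr);
};

struct toy_device {
	const struct toy_dma_ops *ops;	/* NULL selects the direct mapping */
	int unmap_calls;		/* bookkeeping for the example only */
};

/* Direct mapping in this toy model: bus address equals physical address. */
static toy_dma_addr_t toy_direct_map_page(struct toy_device *dev, uint64_t phys)
{
	(void)dev;
	return phys;
}

static void toy_direct_unmap_page(struct toy_device *dev, toy_dma_addr_t addr)
{
	(void)addr;
	dev->unmap_calls++;
}

static int toy_is_direct(const struct toy_dma_ops *ops)
{
	return ops == NULL;
}

/* Same shape as dma_map_page_attrs() after the patch. */
static toy_dma_addr_t toy_map_page(struct toy_device *dev, uint64_t phys)
{
	assert(dev != NULL);
	if (toy_is_direct(dev->ops))
		return toy_direct_map_page(dev, phys);	/* plain direct call */
	return dev->ops->map_page(dev, phys);		/* indirect call */
}

/* Same shape as dma_unmap_page_attrs(): the ops method is optional. */
static void toy_unmap_page(struct toy_device *dev, toy_dma_addr_t addr)
{
	assert(dev != NULL);
	if (toy_is_direct(dev->ops))
		toy_direct_unmap_page(dev, addr);
	else if (dev->ops->unmap_page)
		dev->ops->unmap_page(dev, addr);
}

/* A made-up "IOMMU" that offsets addresses, to exercise the indirect path. */
static toy_dma_addr_t toy_iommu_map_page(struct toy_device *dev, uint64_t phys)
{
	(void)dev;
	return phys + 0x1000;
}

static const struct toy_dma_ops toy_iommu_ops = {
	.map_page = toy_iommu_map_page,
	/* .unmap_page deliberately left NULL */
};
```

With ops == NULL, toy_map_page() reaches the direct helper through an
ordinary call, which is the case the series wants to keep free of the
retpoline overhead; only devices with real ops pay for the indirect call.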

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* Re: [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-14 14:11   ` Marek Szyprowski
@ 2018-12-14 14:24     ` Christoph Hellwig
  2018-12-14 14:32       ` Marek Szyprowski
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-14 14:24 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

On Fri, Dec 14, 2018 at 03:11:37PM +0100, Marek Szyprowski wrote:
> Hi Christoph,
> 
> On 2018-12-07 20:07, Christoph Hellwig wrote:
> > Avoid expensive indirect calls in the fast path DMA mapping
> > operations by directly calling the dma_direct_* ops if we are using
> > the directly mapped DMA operations.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> This breaks direct DMA on ARM64 (also in today's linux-next). A NULL
> dev->dma_ops falls back to get_arch_dma_ops(), which in turn returns
> the non-functional &dma_dummy_ops on ARM64...

Yeah, falling back from a direct (NULL) dev->dma_ops to something else
won't work with NULL as the indicator.

Fortunately we shouldn't even need that fallback, thanks to the patch from
Robin that explicitly sets the dummy ops where needed.

Can you try the patch below?

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 273e778f7de2..95dbf3ef735a 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -26,11 +26,7 @@
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	/*
-	 * We expect no ISA devices, and all other DMA masters are expected to
-	 * have someone call arch_setup_dma_ops at device creation time.
-	 */
-	return &dma_dummy_ops;
+	return NULL;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
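
To illustrate why the NULL indicator and the old arm64 fallback conflict,
here is a small user-space model.  It assumes get_dma_ops() prefers
dev->dma_ops and otherwise asks the architecture, as described above; all
toy_* names are made up for the example and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dma_ops {
	const char *name;
};

/* Stand-in for dma_dummy_ops: every operation on it would fail. */
static const struct toy_dma_ops toy_dummy_ops = { "dummy" };

struct toy_device {
	const struct toy_dma_ops *dma_ops;	/* NULL means "use dma-direct" */
};

/* Arch fallback before the fix: hands out the dummy ops. */
static const struct toy_dma_ops *toy_arch_ops_old(void)
{
	return &toy_dummy_ops;
}

/* Arch fallback after the fix: nothing to add, so NULL stays NULL. */
static const struct toy_dma_ops *toy_arch_ops_new(void)
{
	return NULL;
}

/* Per-device ops first, then whatever the architecture returns. */
static const struct toy_dma_ops *
toy_get_dma_ops(const struct toy_device *dev,
		const struct toy_dma_ops *(*arch_ops)(void))
{
	assert(dev != NULL);
	if (dev->dma_ops)
		return dev->dma_ops;
	return arch_ops();
}

static int toy_is_direct(const struct toy_dma_ops *ops)
{
	return ops == NULL;
}
```

In this model a device with dma_ops == NULL resolves to the dummy ops under
toy_arch_ops_old(), so the direct check never fires; with
toy_arch_ops_new() the NULL survives and the direct path is taken.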


* Re: [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-14 14:24     ` Christoph Hellwig
@ 2018-12-14 14:32       ` Marek Szyprowski
  0 siblings, 0 replies; 41+ messages in thread
From: Marek Szyprowski @ 2018-12-14 14:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Keith Busch,
	Jonathan Derrick, linux-pci, linux-ia64, x86, linux-kernel

Hi Christoph,

On 2018-12-14 15:24, Christoph Hellwig wrote:
> On Fri, Dec 14, 2018 at 03:11:37PM +0100, Marek Szyprowski wrote:
>> Hi Christoph,
>>
>> On 2018-12-07 20:07, Christoph Hellwig wrote:
>>> Avoid expensive indirect calls in the fast path DMA mapping
>>> operations by directly calling the dma_direct_* ops if we are using
>>> the directly mapped DMA operations.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> This breaks direct DMA on ARM64 (also in today's linux-next). A NULL
>> dev->dma_ops falls back to get_arch_dma_ops(), which in turn returns
>> the non-functional &dma_dummy_ops on ARM64...
> Yeah, falling back from a direct (NULL) dev->dma_ops to something else
> won't work with NULL as the indicator.
>
> Fortunately we shouldn't even need that fallback, thanks to the patch from
> Robin that explicitly sets the dummy ops where needed.
>
> Can you try the patch below?

Yes, it fixes the problem.

> diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
> index 273e778f7de2..95dbf3ef735a 100644
> --- a/arch/arm64/include/asm/dma-mapping.h
> +++ b/arch/arm64/include/asm/dma-mapping.h
> @@ -26,11 +26,7 @@
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> -	/*
> -	 * We expect no ISA devices, and all other DMA masters are expected to
> -	 * have someone call arch_setup_dma_ops at device creation time.
> -	 */
> -	return &dma_dummy_ops;
> +	return NULL;
>  }
>  
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* Re: [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement
  2018-12-07 19:07 ` [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement Christoph Hellwig
@ 2018-12-14 21:15   ` Bjorn Helgaas
  0 siblings, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2018-12-14 21:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

On Fri, Dec 07, 2018 at 11:07:18AM -0800, Christoph Hellwig wrote:
> From: Robin Murphy <robin.murphy@arm.com>
> 
> Rather than checking the DMA attribute at each callsite, just pass it
> through for acpi_dma_configure() to handle directly. That can then deal
> with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
> installing dummy DMA ops instead of just skipping setup entirely. This
> will then free up the dev->dma_ops == NULL case for some valuable
> fastpath optimisations.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/acpi/scan.c      | 5 +++++
>  drivers/base/platform.c  | 3 +--
>  drivers/pci/pci-driver.c | 3 +--

Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci part

>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index bd1c59fb0e17..b75ae34ed188 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
>  	const struct iommu_ops *iommu;
>  	u64 dma_addr = 0, size = 0;
>  
> +	if (attr == DEV_DMA_NOT_SUPPORTED) {
> +		set_dma_ops(dev, &dma_dummy_ops);
> +		return 0;
> +	}
> +
>  	iort_dma_setup(dev, &dma_addr, &size);
>  
>  	iommu = iort_iommu_configure(dev);
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index eae841935a45..c1ddf191711e 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
>  		ret = of_dma_configure(dev, dev->of_node, true);
>  	} else if (has_acpi_companion(dev)) {
>  		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
> -		if (attr != DEV_DMA_NOT_SUPPORTED)
> -			ret = acpi_dma_configure(dev, attr);
> +		ret = acpi_dma_configure(dev, attr);
>  	}
>  
>  	return ret;
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index bef17c3fca67..1b58e058b13f 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
>  		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
>  		enum dev_dma_attr attr = acpi_get_dma_attr(adev);
>  
> -		if (attr != DEV_DMA_NOT_SUPPORTED)
> -			ret = acpi_dma_configure(dev, attr);
> +		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
>  	}
>  
>  	pci_put_host_bridge_device(bridge);
> -- 
> 2.19.1
> 
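
The refactoring quoted above can be modelled in a few lines.  This is a
simplified sketch, not the real acpi_dma_configure(): the toy_* names are
hypothetical, and the IORT/IOMMU setup is reduced to recording a coherency
flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of enum dev_dma_attr. */
enum toy_dma_attr {
	TOY_DMA_NOT_SUPPORTED,
	TOY_DMA_NON_COHERENT,
	TOY_DMA_COHERENT,
};

struct toy_dma_ops {
	int unused;
};

/* Stand-in for dma_dummy_ops. */
static const struct toy_dma_ops toy_dummy_ops = { 0 };

struct toy_device {
	const struct toy_dma_ops *dma_ops;	/* NULL selects dma-direct */
	int coherent;
};

/*
 * The callee handles the exceptional case itself, so callers can pass
 * the attribute through without checking it first.
 */
static int toy_dma_configure(struct toy_device *dev, enum toy_dma_attr attr)
{
	assert(dev != NULL);
	if (attr == TOY_DMA_NOT_SUPPORTED) {
		/* Explicitly mark the device as unable to do DMA. */
		dev->dma_ops = &toy_dummy_ops;
		return 0;
	}
	dev->coherent = (attr == TOY_DMA_COHERENT);
	/* dma_ops is left untouched: NULL keeps meaning "direct mapping". */
	return 0;
}
```

Because the unsupported case is now marked with explicit dummy ops, a NULL
dma_ops pointer no longer has to double as "no DMA setup was done".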


* Re: [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls
  2018-12-07 19:07 ` [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls Christoph Hellwig
@ 2018-12-14 21:17   ` Bjorn Helgaas
  2018-12-14 21:34     ` Derrick, Jonathan
  0 siblings, 1 reply; 41+ messages in thread
From: Bjorn Helgaas @ 2018-12-14 21:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

Conventional spelling in subject is

  PCI: vmd: Use dma_* APIs instead of direct method calls

On Fri, Dec 07, 2018 at 11:07:19AM -0800, Christoph Hellwig wrote:
> With the bypass support for the direct mapping we might not always have
> methods to call, so use the proper APIs instead.  The only downside is
> that we will create two dma-debug entries for each mapping if
> CONFIG_DMA_DEBUG is enabled.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

You cc'd the VMD maintainers already, and I have no objection to this
from a PCI core point of view, so:

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/controller/vmd.c | 42 +++++++++++++++---------------------
>  1 file changed, 17 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 98ce79eac128..3890812cdf87 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
>  	return &vmd->dev->dev;
>  }
>  
> -static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
> -{
> -	return get_dma_ops(to_vmd_dev(dev));
> -}
> -
>  static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
>  		       gfp_t flag, unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
> -				       attrs);
> +	return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
>  }
>  
>  static void vmd_free(struct device *dev, size_t size, void *vaddr,
>  		     dma_addr_t addr, unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
> -				      attrs);
> +	return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
>  }
>  
>  static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
>  		    void *cpu_addr, dma_addr_t addr, size_t size,
>  		    unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
> -				      size, attrs);
> +	return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
> +			attrs);
>  }
>  
>  static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
>  			   void *cpu_addr, dma_addr_t addr, size_t size,
>  			   unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
> -					     addr, size, attrs);
> +	return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
> +			attrs);
>  }
>  
>  static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
> @@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
>  			       enum dma_data_direction dir,
>  			       unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
> -					  dir, attrs);
> +	return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
> +			attrs);
>  }
>  
>  static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
>  			   enum dma_data_direction dir, unsigned long attrs)
>  {
> -	vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
> +	dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
>  }
>  
>  static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
>  		      enum dma_data_direction dir, unsigned long attrs)
>  {
> -	return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> +	return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
>  }
>  
>  static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
>  			 enum dma_data_direction dir, unsigned long attrs)
>  {
> -	vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> +	dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
>  }
>  
>  static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>  				    size_t size, enum dma_data_direction dir)
>  {
> -	vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
> +	dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
>  }
>  
>  static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
>  				       size_t size, enum dma_data_direction dir)
>  {
> -	vmd_dma_ops(dev)->sync_single_for_device(to_vmd_dev(dev), addr, size,
> -						 dir);
> +	dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir);
>  }
>  
>  static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>  				int nents, enum dma_data_direction dir)
>  {
> -	vmd_dma_ops(dev)->sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
> +	dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
>  }
>  
>  static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>  				   int nents, enum dma_data_direction dir)
>  {
> -	vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
> +	dma_sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
>  }
>  
>  static int vmd_dma_supported(struct device *dev, u64 mask)
>  {
> -	return vmd_dma_ops(dev)->dma_supported(to_vmd_dev(dev), mask);
> +	return dma_supported(to_vmd_dev(dev), mask);
>  }
>  
>  static u64 vmd_get_required_mask(struct device *dev)
>  {
> -	return vmd_dma_ops(dev)->get_required_mask(to_vmd_dev(dev));
> +	return dma_get_required_mask(to_vmd_dev(dev));
>  }
>  
>  static void vmd_teardown_dma_ops(struct vmd_dev *vmd)
> -- 
> 2.19.1
> 
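
A small model of the double dma-debug accounting mentioned in the commit
message, under the assumption that the public mapping call records one
debug entry after dispatching.  The toy_* names and the counter are
hypothetical and only stand in for the dma_* API and dma-debug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_device;

struct toy_dma_ops {
	uint64_t (*map_page)(struct toy_device *dev, uint64_t phys);
};

struct toy_device {
	const struct toy_dma_ops *ops;	/* NULL selects the direct mapping */
	struct toy_device *parent;	/* the device that really performs DMA */
};

static int toy_debug_entries;	/* counts simulated dma-debug records */

/* Public entry point: dispatches, then records one debug entry per call. */
static uint64_t toy_dma_map_page(struct toy_device *dev, uint64_t phys)
{
	uint64_t addr;

	assert(dev != NULL);
	if (dev->ops == NULL)
		addr = phys;			/* direct mapping */
	else
		addr = dev->ops->map_page(dev, phys);
	toy_debug_entries++;
	return addr;
}

/*
 * Forwarding method for the child device.  It goes through the public
 * entry point, so it keeps working when parent->ops is NULL.
 */
static uint64_t toy_child_map_page(struct toy_device *dev, uint64_t phys)
{
	return toy_dma_map_page(dev->parent, phys);
}

static const struct toy_dma_ops toy_child_ops = {
	.map_page = toy_child_map_page,
};
```

Mapping through the child first enters toy_dma_map_page() for the child and
then again for the parent, so one mapping leaves two debug records.
Dereferencing parent->ops directly would avoid the second record, but it
would fault once the parent uses the NULL (direct) ops.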


* Re: [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls
  2018-12-14 21:17   ` Bjorn Helgaas
@ 2018-12-14 21:34     ` Derrick, Jonathan
  0 siblings, 0 replies; 41+ messages in thread
From: Derrick, Jonathan @ 2018-12-14 21:34 UTC (permalink / raw)
  To: hch, helgaas
  Cc: linux-kernel, torvalds, brouer, Yu, Fenghua, tariqt, toke,
	konrad.wilk, m.szyprowski, iommu, linux-ia64, robin.murphy,
	ilias.apalodimas, x86, Luck, Tony, linux-pci, Busch, Keith



Looks good to me
Thanks Christoph

Acked-by: Jon Derrick <jonathan.derrick@intel.com>

On Fri, 2018-12-14 at 15:17 -0600, Bjorn Helgaas wrote:
> Conventional spelling in subject is
> 
>   PCI: vmd: Use dma_* APIs instead of direct method calls
> 
> On Fri, Dec 07, 2018 at 11:07:19AM -0800, Christoph Hellwig wrote:
> > With the bypass support for the direct mapping we might not always have
> > methods to call, so use the proper APIs instead.  The only downside is
> > that we will create two dma-debug entries for each mapping if
> > CONFIG_DMA_DEBUG is enabled.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> You cc'd the VMD maintainers already, and I have no objection to this
> from a PCI core point of view, so:
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> > ---
> >  drivers/pci/controller/vmd.c | 42 +++++++++++++++---------------------
> >  1 file changed, 17 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index 98ce79eac128..3890812cdf87 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
> >  	return &vmd->dev->dev;
> >  }
> >  
> > -static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
> > -{
> > -	return get_dma_ops(to_vmd_dev(dev));
> > -}
> > -
> >  static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
> >  		       gfp_t flag, unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
> > -				       attrs);
> > +	return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
> >  }
> >  
> >  static void vmd_free(struct device *dev, size_t size, void *vaddr,
> >  		     dma_addr_t addr, unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
> > -				      attrs);
> > +	return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
> >  }
> >  
> >  static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
> >  		    void *cpu_addr, dma_addr_t addr, size_t size,
> >  		    unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
> > -				      size, attrs);
> > +	return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
> > +			attrs);
> >  }
> >  
> >  static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
> >  			   void *cpu_addr, dma_addr_t addr, size_t size,
> >  			   unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
> > -					     addr, size, attrs);
> > +	return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
> > +			attrs);
> >  }
> >  
> >  static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
> > @@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
> >  			       enum dma_data_direction dir,
> >  			       unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
> > -					  dir, attrs);
> > +	return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
> > +			attrs);
> >  }
> >  
> >  static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
> >  			   enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -	vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
> > +	dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
> >  }
> >  
> >  static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
> >  		      enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -	return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> > +	return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
> >  }
> >  
> >  static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
> >  			 enum dma_data_direction dir, unsigned long attrs)
> >  {
> > -	vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
> > +	dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
> >  }
> >  
> >  static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
> >  				    size_t size, enum dma_data_direction dir)
> >  {
> > -	vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
> > +	dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
> >  }
> >  
> >  static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
> >  				       size_t size, enum dma_data_direction dir)
> >  {
> > -	vmd_dma_ops(dev)->sync_single_for_device(to_vmd_dev(dev), addr, size,
> > -						 dir);
> > +	dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir);
> >  }
> >  
> >  static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
> >  				int nents, enum dma_data_direction dir)
> >  {
> > -	vmd_dma_ops(dev)->sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
> > +	dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
> >  }
> >  
> >  static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
> >  				   int nents, enum dma_data_direction dir)
> >  {
> > -	vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
> > +	dma_sync_sg_for_device(to_vmd_dev(dev), sg, nents, dir);
> >  }
> >  
> >  static int vmd_dma_supported(struct device *dev, u64 mask)
> >  {
> > -	return vmd_dma_ops(dev)->dma_supported(to_vmd_dev(dev), mask);
> > +	return dma_supported(to_vmd_dev(dev), mask);
> >  }
> >  
> >  static u64 vmd_get_required_mask(struct device *dev)
> >  {
> > -	return vmd_dma_ops(dev)->get_required_mask(to_vmd_dev(dev));
> > +	return dma_get_required_mask(to_vmd_dev(dev));
> >  }
> >  
> >  static void vmd_teardown_dma_ops(struct vmd_dev *vmd)
> > -- 
> > 2.19.1
> > 

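For readers following the thread, the effect of this change can be modelled
in userspace.  The sketch below is a toy under stated assumptions, not the
vmd.c code: every mini_* name is hypothetical, a bare counter stands in for
dma-debug, and the direct mapping is reduced to an identity function.  It
shows the two points made in the commit message: a wrapper that goes through
the public entry point keeps working when the parent has NULL ops, and each
mapping is then recorded twice.

```c
#include <assert.h>
#include <stddef.h>

struct mini_dev;

/* Hypothetical, cut-down stand-in for struct dma_map_ops. */
struct mini_ops {
	unsigned long (*map)(struct mini_dev *dev, unsigned long phys);
};

struct mini_dev {
	const struct mini_ops *ops;	/* NULL selects the direct mapping */
	struct mini_dev *parent;	/* the VMD endpoint, for child devices */
};

static int mini_debug_entries;	/* stand-in for dma-debug bookkeeping */

/* Direct mapping: in this toy the bus address equals the physical one. */
static unsigned long mini_direct_map(struct mini_dev *dev, unsigned long phys)
{
	(void)dev;
	return phys;
}

/* Public entry point: copes with NULL ops and records one debug entry. */
static unsigned long mini_dma_map(struct mini_dev *dev, unsigned long phys)
{
	unsigned long addr;

	if (!dev->ops)
		addr = mini_direct_map(dev, phys);
	else
		addr = dev->ops->map(dev, phys);
	mini_debug_entries++;
	return addr;
}

/*
 * New-style wrapper: forward to the parent through the public API.
 * The old style, dev->parent->ops->map(...), would dereference NULL
 * here because the parent uses the direct mapping.
 */
static unsigned long mini_vmd_map(struct mini_dev *dev, unsigned long phys)
{
	return mini_dma_map(dev->parent, phys);
}

static const struct mini_ops mini_vmd_ops = { .map = mini_vmd_map };

/* Map once through a child device and count the debug entries. */
static void mini_vmd_selftest(void)
{
	struct mini_dev vmd = { .ops = NULL, .parent = NULL };
	struct mini_dev child = { .ops = &mini_vmd_ops, .parent = &vmd };

	mini_debug_entries = 0;
	assert(mini_dma_map(&child, 0x1000) == 0x1000);
	assert(mini_debug_entries == 2);	/* child call plus parent call */
}
```

Calling mini_vmd_selftest() from a main() exercises both properties.  The
second counter increment is the toy equivalent of the extra dma-debug entry
mentioned as the only downside.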


* Re: [15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
  2018-12-14 14:11   ` Marek Szyprowski
@ 2018-12-15 17:46   ` Guenter Roeck
  2018-12-16  9:02     ` Christoph Hellwig
  2018-12-18 20:34   ` Guillaume Tucker
  2018-12-20 16:44   ` [PATCH 15/15] " Thierry Reding
  3 siblings, 1 reply; 41+ messages in thread
From: Guenter Roeck @ 2018-12-15 17:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel

Hi,

On Fri, Dec 07, 2018 at 11:07:20AM -0800, Christoph Hellwig wrote:
> Avoid expensive indirect calls in the fast path DMA mapping
> operations by directly calling the dma_direct_* ops if we are using
> the directly mapped DMA operations.
> 

This patch results in arm64 boot failures. Reverting the patch fixes
the problem. Bisect results are attached. Per logs, the system fails
to instantiate the root device. Examples from two logs:

[   22.843080] nvme nvme0: pci function 0000:00:02.0
[   22.853820] nvme 0000:00:02.0: enabling device (0000 -> 0002)
[   22.884178] nvme nvme0: Removing after probe failure status: -12

[   15.451963] xhci_hcd 0000:00:02.0: xHCI Host Controller
[   15.453294] xhci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 1
[   15.456042] xhci_hcd 0000:00:02.0: can't setup: -12
[   15.457003] xhci_hcd 0000:00:02.0: USB bus 1 deregistered
[   15.458340] xhci_hcd 0000:00:02.0: init 0000:00:02.0 fail, -12
[   15.458825] xhci_hcd: probe of 0000:00:02.0 failed with error -12

Guenter

---
# bad: [d14b746c6c1ca310f679ef13f661587454e2c588] Add linux-next specific files for 20181214
# good: [40e020c129cfc991e8ab4736d2665351ffd1468d] Linux 4.20-rc6
git bisect start 'HEAD' 'v4.20-rc6'
# bad: [ddfdda7f7d1ebdca0851f30a814e76749f08be99] Merge remote-tracking branch 'spi-nor/spi-nor/next'
git bisect bad ddfdda7f7d1ebdca0851f30a814e76749f08be99
# bad: [466d2f8b964745cc8db7f126607e19526385f2d5] Merge remote-tracking branch 'file-locks/locks-next'
git bisect bad 466d2f8b964745cc8db7f126607e19526385f2d5
# bad: [c43abf670f074a3eba2eebf9568ba95b2fe57f00] Merge remote-tracking branch 'arm-soc/for-next'
git bisect bad c43abf670f074a3eba2eebf9568ba95b2fe57f00
# good: [e4337d9d50eb940a25d3808ef76bb0eaa61a0146] Merge branch 'next/dt' into for-next
git bisect good e4337d9d50eb940a25d3808ef76bb0eaa61a0146
# bad: [32d851d8e81b1152d3e663b6c0b318474d649098] Merge remote-tracking branch 'dma-mapping/for-next'
git bisect bad 32d851d8e81b1152d3e663b6c0b318474d649098
# good: [32550839013d8e72d35c1cc0a756c818d7f9ae32] Merge remote-tracking branch 'scsi-fixes/fixes'
git bisect good 32550839013d8e72d35c1cc0a756c818d7f9ae32
# good: [8ea3ac17b6557f30697c624d1cd4ff2b30af82e1] Merge remote-tracking branch 'kbuild/for-next'
git bisect good 8ea3ac17b6557f30697c624d1cd4ff2b30af82e1
# good: [ad78dee0b630527bdfed809d1f5ed95c601886ae] dma-debug: Batch dma_debug_entry allocation
git bisect good ad78dee0b630527bdfed809d1f5ed95c601886ae
# good: [55897af63091ebc2c3f239c6a6666f748113ac50] dma-direct: merge swiotlb_dma_ops into the dma_direct code
git bisect good 55897af63091ebc2c3f239c6a6666f748113ac50
# good: [7d32be2e5abb2d88cf321357178d05c461b1cc83] leaking_addresses: do not parse binary files
git bisect good 7d32be2e5abb2d88cf321357178d05c461b1cc83
# good: [9db33987ee2e5abb32a40dca44a2953391786833] leaking_addresses: remove version number
git bisect good 9db33987ee2e5abb32a40dca44a2953391786833
# good: [7fd0d1346c1f96371a9a4996a590b86d570098f9] Merge remote-tracking branch 'leaks/leaks-next'
git bisect good 7fd0d1346c1f96371a9a4996a590b86d570098f9
# bad: [356da6d0cde3323236977fce54c1f9612a742036] dma-mapping: bypass indirect calls for dma-direct
git bisect bad 356da6d0cde3323236977fce54c1f9612a742036
# good: [190d4e5916a2d70a11009022b968fca948fb5dc7] vmd: use the proper dma_* APIs instead of direct methods calls
git bisect good 190d4e5916a2d70a11009022b968fca948fb5dc7
# first bad commit: [356da6d0cde3323236977fce54c1f9612a742036] dma-mapping: bypass indirect calls for dma-direct


* Re: [15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-15 17:46   ` [15/15] " Guenter Roeck
@ 2018-12-16  9:02     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-16  9:02 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

On Sat, Dec 15, 2018 at 09:46:54AM -0800, Guenter Roeck wrote:
> Hi,
> 
> On Fri, Dec 07, 2018 at 11:07:20AM -0800, Christoph Hellwig wrote:
> > Avoid expensive indirect calls in the fast path DMA mapping
> > operations by directly calling the dma_direct_* ops if we are using
> > the directly mapped DMA operations.
> > 
> 
> This patch results in arm64 boot failures. Reverting the patch fixes
> the problem. Bisect results are attached. Per logs, the system fails
> to instantiate the root device. Examples from two logs:

This patch should fix it.  It was already sent out on Friday, and I'm
just waiting for an ACK from the arm64 maintainers before applying it:

--
From e6dc770875bc551ff77fe5673f5dd3c8ac242536 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 14 Dec 2018 15:18:08 +0100
Subject: arm64: default to the direct mapping in get_arch_dma_ops

Otherwise the direct mapping won't work at all given that a NULL
dev->dma_ops causes a fallback.  Note that we already explicitly set
dev->dma_ops to dma_dummy_ops for dma-incapable devices, so this
fallback should not be needed anyway.

Fixes: 356da6d0cd ("dma-mapping: bypass indirect calls for dma-direct")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm64/include/asm/dma-mapping.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 273e778f7de2..95dbf3ef735a 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -26,11 +26,7 @@
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	/*
-	 * We expect no ISA devices, and all other DMA masters are expected to
-	 * have someone call arch_setup_dma_ops at device creation time.
-	 */
-	return &dma_dummy_ops;
+	return NULL;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-- 
2.19.2

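The failure mode and the fix can be modelled in userspace.  This is a toy
sketch under stated assumptions, not kernel code: all toy_* and arch_ops_*
names are hypothetical, the dummy ops are modelled as having no alloc method
so that allocations always fail, and a static buffer stands in for the direct
mapping.  With the device's own ops left NULL, the lookup falls back to the
arch default.  Returning the dummy ops there makes every allocation fail,
which is consistent with the -12 (-ENOMEM) probe errors in the report, while
returning NULL selects the direct mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical, cut-down stand-in for struct dma_map_ops. */
struct toy_dma_ops {
	void *(*alloc)(struct toy_device *dev, size_t size);
};

struct toy_device {
	const struct toy_dma_ops *dma_ops;	/* NULL means "not set" */
};

/* Assumption: the dummy ops never hand out memory (no alloc method). */
static const struct toy_dma_ops toy_dummy_ops = { .alloc = NULL };

/* Direct mapping: a static buffer stands in for real DMA memory. */
static char toy_pool[4096];

static void *toy_direct_alloc(struct toy_device *dev, size_t size)
{
	(void)dev;
	return size <= sizeof(toy_pool) ? toy_pool : NULL;
}

/* The arch default before and after the fix above. */
static const struct toy_dma_ops *arch_ops_before_fix(void)
{
	return &toy_dummy_ops;
}

static const struct toy_dma_ops *arch_ops_after_fix(void)
{
	return NULL;
}

typedef const struct toy_dma_ops *(*arch_ops_fn)(void);

/* Per-device ops win; otherwise fall back to the arch default. */
static const struct toy_dma_ops *toy_get_dma_ops(struct toy_device *dev,
						 arch_ops_fn arch)
{
	return dev->dma_ops ? dev->dma_ops : arch();
}

/* NULL ops select the direct mapping; anything else goes through ops. */
static int toy_dma_alloc(struct toy_device *dev, size_t size, arch_ops_fn arch)
{
	const struct toy_dma_ops *ops = toy_get_dma_ops(dev, arch);
	void *buf;

	if (!ops)
		buf = toy_direct_alloc(dev, size);
	else if (ops->alloc)
		buf = ops->alloc(dev, size);
	else
		buf = NULL;
	return buf ? 0 : -ENOMEM;
}

/* Same device, two arch defaults: only the second one can allocate. */
static void toy_fallback_selftest(void)
{
	struct toy_device dev = { .dma_ops = NULL };

	assert(toy_dma_alloc(&dev, 512, arch_ops_before_fix) == -ENOMEM);
	assert(toy_dma_alloc(&dev, 512, arch_ops_after_fix) == 0);
}
```

Calling toy_fallback_selftest() from a main() checks both cases.  The arch
default is passed in as a function pointer only so that the two variants can
be compared side by side in one program.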


* Re: [15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
  2018-12-14 14:11   ` Marek Szyprowski
  2018-12-15 17:46   ` [15/15] " Guenter Roeck
@ 2018-12-18 20:34   ` Guillaume Tucker
  2018-12-18 20:42     ` Robin Murphy
  2018-12-20 16:44   ` [PATCH 15/15] " Thierry Reding
  3 siblings, 1 reply; 41+ messages in thread
From: Guillaume Tucker @ 2018-12-18 20:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel, Ezequiel Garcia, linux-arm-kernel

On 07/12/2018 19:07, Christoph Hellwig wrote:
> Avoid expensive indirect calls in the fast path DMA mapping
> operations by directly calling the dma_direct_* ops if we are using
> the directly mapped DMA operations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

I've run a semi-automated bisection on kernelci.org and found that this
patch appeared to cause some regressions in linux-next on the
rk3399-gru-kevin arm64 platform.  The bisection was run between
next-20181128 and its merge base in mainline master (6531e115b7ab) with
a plain defconfig.


The problems seem to start with this message:

[    3.242163] mmc1: Unable to allocate ADMA buffers - falling back to standard DMA

followed by warnings like these:

[    3.424261] mmc1: asked for transfer of 512 bytes exceeds bounce buffer 0 bytes
[    3.432488] WARNING: CPU: 3 PID: 1596 at ../drivers/mmc/host/sdhci.c:1050 sdhci_send_command+0x8f0/0xfe8

see also:

[   16.046084] rk_iommu ff8f3f00.iommu: DMA map error for DT


The full kernel log is available here:

  https://lava.collabora.co.uk/scheduler/job/1395093


Reverting this patch makes the errors go away, but I haven't done any
further investigation so the actual problem may well lie somewhere else.

Hope this helps!

Best wishes,
Guillaume

> ---
>  arch/alpha/include/asm/dma-mapping.h |   2 +-
>  arch/arc/mm/cache.c                  |   2 +-
>  arch/arm/include/asm/dma-mapping.h   |   2 +-
>  arch/arm/mm/dma-mapping-nommu.c      |  14 +---
>  arch/arm64/mm/dma-mapping.c          |   3 -
>  arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
>  arch/ia64/hp/common/sba_iommu.c      |   4 +-
>  arch/ia64/kernel/dma-mapping.c       |   1 -
>  arch/mips/include/asm/dma-mapping.h  |   2 +-
>  arch/parisc/kernel/setup.c           |   4 -
>  arch/sparc/include/asm/dma-mapping.h |   4 +-
>  arch/x86/kernel/pci-dma.c            |   2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
>  drivers/iommu/amd_iommu.c            |  13 +---
>  include/asm-generic/dma-mapping.h    |   2 +-
>  include/linux/dma-direct.h           |  17 ----
>  include/linux/dma-mapping.h          | 111 +++++++++++++++++++++++----
>  include/linux/dma-noncoherent.h      |   5 +-
>  kernel/dma/direct.c                  |  37 ++-------
>  kernel/dma/mapping.c                 |  40 ++++++----
>  20 files changed, 150 insertions(+), 119 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
> index 8beeafd4f68e..0ee6a5c99b16 100644
> --- a/arch/alpha/include/asm/dma-mapping.h
> +++ b/arch/alpha/include/asm/dma-mapping.h
> @@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
>  #ifdef CONFIG_ALPHA_JENSEN
> -	return &dma_direct_ops;
> +	return NULL;
>  #else
>  	return &alpha_pci_ops;
>  #endif
> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
> index f2701c13a66b..e188bb3ede53 100644
> --- a/arch/arc/mm/cache.c
> +++ b/arch/arc/mm/cache.c
> @@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
>  	/*
>  	 * In case of IOC (say IOC+SLC case), pointers above could still be set
>  	 * but end up not being relevant as the first function in chain is not
> -	 * called at all for @dma_direct_ops
> +	 * called at all for devices using coherent DMA.
>  	 *     arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
>  	 */
>  }
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 965b7c846ecb..31d3b96f0f4b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> -	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
> +	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
>  }
>  
>  #ifdef __arch_page_to_dma
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 712416ecd8e6..f304b10e23a4 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -22,7 +22,7 @@
>  #include "dma.h"
>  
>  /*
> - *  dma_direct_ops is used if
> + *  The generic direct mapping code is used if
>   *   - MMU/MPU is off
>   *   - cpu is v7m w/o cache support
>   *   - device is coherent
> @@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
>  };
>  EXPORT_SYMBOL(arm_nommu_dma_ops);
>  
> -static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
> -{
> -	return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
> -}
> -
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> -	const struct dma_map_ops *dma_ops;
> -
>  	if (IS_ENABLED(CONFIG_CPU_V7M)) {
>  		/*
>  		 * Cache support for v7m is optional, so can be treated as
> @@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  		dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
>  	}
>  
> -	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
> -
> -	set_dma_ops(dev, dma_ops);
> +	if (!dev->archdata.dma_coherent)
> +		set_dma_ops(dev, &arm_nommu_dma_ops);
>  }
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index ab1e417204d0..95eda81e3f2d 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
> -	if (!dev->dma_ops)
> -		dev->dma_ops = &dma_direct_ops;
> -
>  	dev->dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  
> diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
> index f40ca499b246..8840ed97712f 100644
> --- a/arch/ia64/hp/common/hwsw_iommu.c
> +++ b/arch/ia64/hp/common/hwsw_iommu.c
> @@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
>  const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
>  {
>  	if (use_swiotlb(dev))
> -		return &dma_direct_ops;
> +		return NULL;
>  	return &sba_dma_ops;
>  }
>  EXPORT_SYMBOL(hwsw_dma_get_ops);
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 5ee74820a0f6..5a361e51cb1e 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -2078,7 +2078,7 @@ sba_init(void)
>  	 * a successful kdump kernel boot is to use the swiotlb.
>  	 */
>  	if (is_kdump_kernel()) {
> -		dma_ops = &dma_direct_ops;
> +		dma_ops = NULL;
>  		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>  			panic("Unable to initialize software I/O TLB:"
>  				  " Try machvec=dig boot option");
> @@ -2100,7 +2100,7 @@ sba_init(void)
>  		 * If we didn't find something sba_iommu can claim, we
>  		 * need to setup the swiotlb and switch to the dig machvec.
>  		 */
> -		dma_ops = &dma_direct_ops;
> +		dma_ops = NULL;
>  		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>  			panic("Unable to find SBA IOMMU or initialize "
>  			      "software I/O TLB: Try machvec=dig boot option");
> diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
> index 80cd3e1ea95a..ad7d9963de34 100644
> --- a/arch/ia64/kernel/dma-mapping.c
> +++ b/arch/ia64/kernel/dma-mapping.c
> @@ -36,7 +36,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
>  
>  void __init swiotlb_dma_init(void)
>  {
> -	dma_ops = &dma_direct_ops;
>  	swiotlb_init(1);
>  }
>  #endif
> diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
> index 69f914667f3e..20dfaad3a55d 100644
> --- a/arch/mips/include/asm/dma-mapping.h
> +++ b/arch/mips/include/asm/dma-mapping.h
> @@ -11,7 +11,7 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  #if defined(CONFIG_MACH_JAZZ)
>  	return &jazz_dma_ops;
>  #else
> -	return &dma_direct_ops;
> +	return NULL;
>  #endif
>  }
>  
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index cd227f1cf629..54818cd78bd0 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -99,10 +99,6 @@ void __init dma_ops_init(void)
>  
>  	case pcxl2:
>  		pa7300lc_init();
> -	case pcxl: /* falls through */
> -	case pcxs:
> -	case pcxt:
> -		hppa_dma_ops = &dma_direct_ops;
>  		break;
>  	default:
>  		break;
> diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
> index b0bb2fcaf1c9..59f5a0f17316 100644
> --- a/arch/sparc/include/asm/dma-mapping.h
> +++ b/arch/sparc/include/asm/dma-mapping.h
> @@ -14,11 +14,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
>  #ifdef CONFIG_SPARC_LEON
>  	if (sparc_cpu_model == sparc_leon)
> -		return &dma_direct_ops;
> +		return NULL;
>  #endif
>  #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
>  	if (bus == &pci_bus_type)
> -		return &dma_direct_ops;
> +		return NULL;
>  #endif
>  	return dma_ops;
>  }
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index f4562fcec681..d460998ae828 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -17,7 +17,7 @@
>  
>  static bool disable_dac_quirk __read_mostly;
>  
> -const struct dma_map_ops *dma_ops = &dma_direct_ops;
> +const struct dma_map_ops *dma_ops;
>  EXPORT_SYMBOL(dma_ops);
>  
>  #ifdef CONFIG_IOMMU_DEBUG
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 61a84b958d67..50637f372e9f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -581,7 +581,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
>  
>  	dev_priv->map_mode = vmw_dma_map_populate;
>  
> -	if (dma_ops->sync_single_for_cpu)
> +	if (dma_ops && dma_ops->sync_single_for_cpu)
>  		dev_priv->map_mode = vmw_dma_alloc_coherent;
>  #ifdef CONFIG_SWIOTLB
>  	if (swiotlb_nr_tbl() == 0)
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index c5d6c7c42b0a..567221cca13c 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2184,7 +2184,7 @@ static int amd_iommu_add_device(struct device *dev)
>  				dev_name(dev));
>  
>  		iommu_ignore_device(dev);
> -		dev->dma_ops = &dma_direct_ops;
> +		dev->dma_ops = NULL;
>  		goto out;
>  	}
>  	init_iommu_group(dev);
> @@ -2770,17 +2770,6 @@ int __init amd_iommu_init_dma_ops(void)
>  	swiotlb        = (iommu_pass_through || sme_me_mask) ? 1 : 0;
>  	iommu_detected = 1;
>  
> -	/*
> -	 * In case we don't initialize SWIOTLB (actually the common case
> -	 * when AMD IOMMU is enabled and SME is not active), make sure there
> -	 * are global dma_ops set as a fall-back for devices not handled by
> -	 * this driver (for example non-PCI devices). When SME is active,
> -	 * make sure that swiotlb variable remains set so the global dma_ops
> -	 * continue to be SWIOTLB.
> -	 */
> -	if (!swiotlb)
> -		dma_ops = &dma_direct_ops;
> -
>  	if (amd_iommu_unmap_flush)
>  		pr_info("AMD-Vi: IO/TLB flush on unmap enabled\n");
>  	else
> diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
> index 880a292d792f..c13f46109e88 100644
> --- a/include/asm-generic/dma-mapping.h
> +++ b/include/asm-generic/dma-mapping.h
> @@ -4,7 +4,7 @@
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>  {
> -	return &dma_direct_ops;
> +	return NULL;
>  }
>  
>  #endif /* _ASM_GENERIC_DMA_MAPPING_H */
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 3b0a3ea3876d..b7338702592a 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
>  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>  		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
>  void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
> -dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
> -		unsigned long offset, size_t size, enum dma_data_direction dir,
> -		unsigned long attrs);
> -void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> -		size_t size, enum dma_data_direction dir, unsigned long attrs);
> -int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> -		enum dma_data_direction dir, unsigned long attrs);
> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> -		int nents, enum dma_data_direction dir, unsigned long attrs);
> -void dma_direct_sync_single_for_device(struct device *dev,
> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> -void dma_direct_sync_sg_for_device(struct device *dev,
> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> -void dma_direct_sync_single_for_cpu(struct device *dev,
> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> -void dma_direct_sync_sg_for_cpu(struct device *dev,
> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>  int dma_direct_supported(struct device *dev, u64 mask);
>  #endif /* _LINUX_DMA_DIRECT_H */
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 269ee27fc3d9..f422aec0f53c 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -134,7 +134,6 @@ struct dma_map_ops {
>  
>  #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
>  
> -extern const struct dma_map_ops dma_direct_ops;
>  extern const struct dma_map_ops dma_virt_ops;
>  extern const struct dma_map_ops dma_dummy_ops;
>  
> @@ -222,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
>  }
>  #endif
>  
> +static inline bool dma_is_direct(const struct dma_map_ops *ops)
> +{
> +	return likely(!ops);
> +}
> +
> +/*
> + * All the dma_direct_* declarations are here just for the indirect call bypass,
> + * and must not be used directly by drivers!
> + */
> +dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
> +		unsigned long offset, size_t size, enum dma_data_direction dir,
> +		unsigned long attrs);
> +int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> +		enum dma_data_direction dir, unsigned long attrs);
> +
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> +    defined(CONFIG_SWIOTLB)
> +void dma_direct_sync_single_for_device(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> +void dma_direct_sync_sg_for_device(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> +#else
> +static inline void dma_direct_sync_single_for_device(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +}
> +static inline void dma_direct_sync_sg_for_device(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> +{
> +}
> +#endif
> +
> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> +    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
> +    defined(CONFIG_SWIOTLB)
> +void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> +		size_t size, enum dma_data_direction dir, unsigned long attrs);
> +void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> +		int nents, enum dma_data_direction dir, unsigned long attrs);
> +void dma_direct_sync_single_for_cpu(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
> +void dma_direct_sync_sg_for_cpu(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
> +#else
> +static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
> +		size_t size, enum dma_data_direction dir, unsigned long attrs)
> +{
> +}
> +static inline void dma_direct_unmap_sg(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir,
> +		unsigned long attrs)
> +{
> +}
> +static inline void dma_direct_sync_single_for_cpu(struct device *dev,
> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
> +{
> +}
> +static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> +{
> +}
> +#endif
> +
>  static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>  					      size_t size,
>  					      enum dma_data_direction dir,
> @@ -232,9 +294,12 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  	debug_dma_map_single(dev, ptr, size);
> -	addr = ops->map_page(dev, virt_to_page(ptr),
> -			     offset_in_page(ptr), size,
> -			     dir, attrs);
> +	if (dma_is_direct(ops))
> +		addr = dma_direct_map_page(dev, virt_to_page(ptr),
> +				offset_in_page(ptr), size, dir, attrs);
> +	else
> +		addr = ops->map_page(dev, virt_to_page(ptr),
> +				offset_in_page(ptr), size, dir, attrs);
>  	debug_dma_map_page(dev, virt_to_page(ptr),
>  			   offset_in_page(ptr), size,
>  			   dir, addr, true);
> @@ -249,7 +314,9 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->unmap_page)
> +	if (dma_is_direct(ops))
> +		dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +	else if (ops->unmap_page)
>  		ops->unmap_page(dev, addr, size, dir, attrs);
>  	debug_dma_unmap_page(dev, addr, size, dir, true);
>  }
> @@ -272,7 +339,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>  	int ents;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	ents = ops->map_sg(dev, sg, nents, dir, attrs);
> +	if (dma_is_direct(ops))
> +		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> +	else
> +		ents = ops->map_sg(dev, sg, nents, dir, attrs);
>  	BUG_ON(ents < 0);
>  	debug_dma_map_sg(dev, sg, nents, ents, dir);
>  
> @@ -287,7 +357,9 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  	debug_dma_unmap_sg(dev, sg, nents, dir);
> -	if (ops->unmap_sg)
> +	if (dma_is_direct(ops))
> +		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
> +	else if (ops->unmap_sg)
>  		ops->unmap_sg(dev, sg, nents, dir, attrs);
>  }
>  
> @@ -301,7 +373,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>  	dma_addr_t addr;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	addr = ops->map_page(dev, page, offset, size, dir, attrs);
> +	if (dma_is_direct(ops))
> +		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> +	else
> +		addr = ops->map_page(dev, page, offset, size, dir, attrs);
>  	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
>  
>  	return addr;
> @@ -322,7 +397,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
>  	BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
>  
>  	addr = phys_addr;
> -	if (ops->map_resource)
> +	if (ops && ops->map_resource)
>  		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
>  
>  	debug_dma_map_resource(dev, phys_addr, size, dir, addr);
> @@ -337,7 +412,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->unmap_resource)
> +	if (ops && ops->unmap_resource)
>  		ops->unmap_resource(dev, addr, size, dir, attrs);
>  	debug_dma_unmap_resource(dev, addr, size, dir);
>  }
> @@ -349,7 +424,9 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_single_for_cpu)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
> +	else if (ops->sync_single_for_cpu)
>  		ops->sync_single_for_cpu(dev, addr, size, dir);
>  	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
>  }
> @@ -368,7 +445,9 @@ static inline void dma_sync_single_for_device(struct device *dev,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_single_for_device)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_single_for_device(dev, addr, size, dir);
> +	else if (ops->sync_single_for_device)
>  		ops->sync_single_for_device(dev, addr, size, dir);
>  	debug_dma_sync_single_for_device(dev, addr, size, dir);
>  }
> @@ -387,7 +466,9 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_sg_for_cpu)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
> +	else if (ops->sync_sg_for_cpu)
>  		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>  	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>  }
> @@ -399,7 +480,9 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->sync_sg_for_device)
> +	if (dma_is_direct(ops))
> +		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
> +	else if (ops->sync_sg_for_device)
>  		ops->sync_sg_for_device(dev, sg, nelems, dir);
>  	debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
>  
> diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
> index 306557331d7d..69b36ed31a99 100644
> --- a/include/linux/dma-noncoherent.h
> +++ b/include/linux/dma-noncoherent.h
> @@ -38,7 +38,10 @@ pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
>  void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  		enum dma_data_direction direction);
>  #else
> -#define arch_dma_cache_sync NULL
> +static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
> +		size_t size, enum dma_data_direction direction)
> +{
> +}
>  #endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
>  
>  #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 85d8286a0ba2..79da61b49fa4 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -223,6 +223,7 @@ void dma_direct_sync_single_for_device(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_device(dev, paddr, size, dir);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_single_for_device);
>  
>  void dma_direct_sync_sg_for_device(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> @@ -240,6 +241,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>  					dir);
>  	}
>  }
> +EXPORT_SYMBOL(dma_direct_sync_sg_for_device);
>  #endif
>  
>  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> @@ -258,6 +260,7 @@ void dma_direct_sync_single_for_cpu(struct device *dev,
>  	if (unlikely(is_swiotlb_buffer(paddr)))
>  		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_single_for_cpu);
>  
>  void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
> @@ -277,6 +280,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu_all(dev);
>  }
> +EXPORT_SYMBOL(dma_direct_sync_sg_for_cpu);
>  
>  void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
> @@ -289,6 +293,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  	if (unlikely(is_swiotlb_buffer(phys)))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
> +EXPORT_SYMBOL(dma_direct_unmap_page);
>  
>  void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  		int nents, enum dma_data_direction dir, unsigned long attrs)
> @@ -300,11 +305,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>  		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>  			     attrs);
>  }
> -#else
> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> -		int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> -}
> +EXPORT_SYMBOL(dma_direct_unmap_sg);
>  #endif
>  
>  static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
> @@ -331,6 +332,7 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>  		arch_sync_dma_for_device(dev, phys, size, dir);
>  	return dma_addr;
>  }
> +EXPORT_SYMBOL(dma_direct_map_page);
>  
>  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  		enum dma_data_direction dir, unsigned long attrs)
> @@ -352,6 +354,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>  	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
>  	return 0;
>  }
> +EXPORT_SYMBOL(dma_direct_map_sg);
>  
>  /*
>   * Because 32-bit DMA masks are so common we expect every architecture to be
> @@ -372,27 +375,3 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  
>  	return mask >= phys_to_dma(dev, min_mask);
>  }
> -
> -const struct dma_map_ops dma_direct_ops = {
> -	.alloc			= dma_direct_alloc,
> -	.free			= dma_direct_free,
> -	.map_page		= dma_direct_map_page,
> -	.map_sg			= dma_direct_map_sg,
> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
> -    defined(CONFIG_SWIOTLB)
> -	.sync_single_for_device	= dma_direct_sync_single_for_device,
> -	.sync_sg_for_device	= dma_direct_sync_sg_for_device,
> -#endif
> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
> -    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
> -    defined(CONFIG_SWIOTLB)
> -	.sync_single_for_cpu	= dma_direct_sync_single_for_cpu,
> -	.sync_sg_for_cpu	= dma_direct_sync_sg_for_cpu,
> -	.unmap_page		= dma_direct_unmap_page,
> -	.unmap_sg		= dma_direct_unmap_sg,
> -#endif
> -	.get_required_mask	= dma_direct_get_required_mask,
> -	.dma_supported		= dma_direct_supported,
> -	.cache_sync		= arch_dma_cache_sync,
> -};
> -EXPORT_SYMBOL(dma_direct_ops);
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 0b18cfbdde95..fc84c81029d9 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -7,6 +7,7 @@
>   */
>  #include <linux/memblock.h> /* for max_pfn */
>  #include <linux/acpi.h>
> +#include <linux/dma-direct.h>
>  #include <linux/dma-noncoherent.h>
>  #include <linux/export.h>
>  #include <linux/gfp.h>
> @@ -229,8 +230,8 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
>  		unsigned long attrs)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
> -	BUG_ON(!ops);
> -	if (ops->get_sgtable)
> +
> +	if (!dma_is_direct(ops) && ops->get_sgtable)
>  		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
>  					attrs);
>  	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
> @@ -293,8 +294,8 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>  		unsigned long attrs)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
> -	BUG_ON(!ops);
> -	if (ops->mmap)
> +
> +	if (!dma_is_direct(ops) && ops->mmap)
>  		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>  	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>  }
> @@ -324,6 +325,8 @@ u64 dma_get_required_mask(struct device *dev)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> +	if (dma_is_direct(ops))
> +		return dma_direct_get_required_mask(dev);
>  	if (ops->get_required_mask)
>  		return ops->get_required_mask(dev);
>  	return dma_default_get_required_mask(dev);
> @@ -341,7 +344,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  	void *cpu_addr;
>  
> -	BUG_ON(!ops);
>  	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
>  
>  	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
> @@ -352,10 +354,14 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  
>  	if (!arch_dma_alloc_attrs(&dev))
>  		return NULL;
> -	if (!ops->alloc)
> +
> +	if (dma_is_direct(ops))
> +		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
> +	else if (ops->alloc)
> +		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
> +	else
>  		return NULL;
>  
> -	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
>  	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
>  	return cpu_addr;
>  }
> @@ -366,8 +372,6 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	BUG_ON(!ops);
> -
>  	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
>  		return;
>  	/*
> @@ -379,11 +383,14 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  	 */
>  	WARN_ON(irqs_disabled());
>  
> -	if (!ops->free || !cpu_addr)
> +	if (!cpu_addr)
>  		return;
>  
>  	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
> -	ops->free(dev, size, cpu_addr, dma_handle, attrs);
> +	if (dma_is_direct(ops))
> +		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
> +	else if (ops->free)
> +		ops->free(dev, size, cpu_addr, dma_handle, attrs);
>  }
>  EXPORT_SYMBOL(dma_free_attrs);
>  
> @@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (!ops)
> -		return 0;
> -	if (!ops->dma_supported)
> +	if (dma_is_direct(ops))
> +		return dma_direct_supported(dev, mask);
> +	if (ops->dma_supported)
>  		return 1;
>  	return ops->dma_supported(dev, mask);
>  }
> @@ -437,7 +444,10 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (ops->cache_sync)
> +
> +	if (dma_is_direct(ops))
> +		arch_dma_cache_sync(dev, vaddr, size, dir);
> +	else if (ops->cache_sync)
>  		ops->cache_sync(dev, vaddr, size, dir);
>  }
>  EXPORT_SYMBOL(dma_cache_sync);
> 
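For readers skimming the hunks above, the dispatch pattern they apply can be
sketched in plain user-space C. This is only an illustration under stated
assumptions: every demo_* name below is hypothetical and none of it is kernel
code. A NULL ops pointer selects the direct implementation, which is called by
name, while any other ops table is still reached through its function
pointers. The optional-callback guard is written in the negated form
(!ops->supported); the dma_supported() hunk as quoted tests the callback
without the negation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops (not the kernel type). */
struct demo_ops {
	int (*map)(int addr);
	int (*supported)(unsigned long mask);	/* optional, may be NULL */
};

/* Mirrors the idea of dma_is_direct(): a NULL table means "direct". */
static int demo_is_direct(const struct demo_ops *ops)
{
	return ops == NULL;
}

/* Direct path: reached by a plain call, so no indirect branch is needed. */
static int demo_direct_map(int addr)
{
	return addr;	/* identity mapping, for illustration only */
}

static int demo_direct_supported(unsigned long mask)
{
	return mask >= 0xffffffffUL;
}

/* Example non-direct backend, standing in for an IOMMU implementation. */
static int demo_iommu_map(int addr)
{
	return addr + 0x1000;
}

static const struct demo_ops demo_iommu_ops = {
	.map = demo_iommu_map,
	/* .supported intentionally left NULL */
};

static int demo_map(const struct demo_ops *ops, int addr)
{
	if (demo_is_direct(ops))
		return demo_direct_map(addr);
	return ops->map(addr);	/* indirect call only on the slow path */
}

static int demo_supported(const struct demo_ops *ops, unsigned long mask)
{
	if (demo_is_direct(ops))
		return demo_direct_supported(mask);
	if (!ops->supported)
		return 1;	/* no callback: assume any mask is fine */
	return ops->supported(mask);
}
```

With these definitions, demo_map(NULL, 0x42) takes the direct path and
demo_map(&demo_iommu_ops, 0x42) goes through the function pointer. Testing the
pointer before dereferencing it is what lets "no ops installed" double as the
fast-path marker.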



* Re: [15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-18 20:34   ` Guillaume Tucker
@ 2018-12-18 20:42     ` Robin Murphy
  2018-12-19  6:42       ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Robin Murphy @ 2018-12-18 20:42 UTC
  To: Guillaume Tucker, Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel, Ezequiel Garcia, linux-arm-kernel

On 2018-12-18 8:34 pm, Guillaume Tucker wrote:
> On 07/12/2018 19:07, Christoph Hellwig wrote:
>> Avoid expensive indirect calls in the fast path DMA mapping
>> operations by directly calling the dma_direct_* ops if we are using
>> the directly mapped DMA operations.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> 
> I've run a semi-automated bisection on kernelci.org and found that this
> patch appeared to cause some regressions in linux-next on the
> rk3399-gru-kevin arm64 platform.  The bisection was run between
> next-20181128 and its merge base in mainline master (6531e115b7ab) with
> a plain defconfig.
> 
> 
> The problems seem to start with this message:
> 
> [    3.242163] mmc1: Unable to allocate ADMA buffers - falling back to standard DMA
> 
> then we can see this kind of warning:
> 
> [    3.424261] mmc1: asked for transfer of 512 bytes exceeds bounce buffer 0 bytes
> [    3.432488] WARNING: CPU: 3 PID: 1596 at ../drivers/mmc/host/sdhci.c:1050 sdhci_send_command+0x8f0/0xfe8
> 
> see also:
> 
> [   16.046084] rk_iommu ff8f3f00.iommu: DMA map error for DT

Yup, with this patch as-is, anything which isn't behind an IOMMU will be 
erroneously banned from DMA entirely - see here:

https://lore.kernel.org/lkml/20181214142435.GA18448@lst.de/

Robin.

> 
> 
> The full kernel log is available here:
> 
>    https://lava.collabora.co.uk/scheduler/job/1395093
> 
> 
> Reverting this patch makes the errors go away, but I haven't done any
> further investigation so the actual problem may well lie somewhere else.
> 
> Hope this helps!
> 
> Best wishes,
> Guillaume
> 
>> ---
>>   arch/alpha/include/asm/dma-mapping.h |   2 +-
>>   arch/arc/mm/cache.c                  |   2 +-
>>   arch/arm/include/asm/dma-mapping.h   |   2 +-
>>   arch/arm/mm/dma-mapping-nommu.c      |  14 +---
>>   arch/arm64/mm/dma-mapping.c          |   3 -
>>   arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
>>   arch/ia64/hp/common/sba_iommu.c      |   4 +-
>>   arch/ia64/kernel/dma-mapping.c       |   1 -
>>   arch/mips/include/asm/dma-mapping.h  |   2 +-
>>   arch/parisc/kernel/setup.c           |   4 -
>>   arch/sparc/include/asm/dma-mapping.h |   4 +-
>>   arch/x86/kernel/pci-dma.c            |   2 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
>>   drivers/iommu/amd_iommu.c            |  13 +---
>>   include/asm-generic/dma-mapping.h    |   2 +-
>>   include/linux/dma-direct.h           |  17 ----
>>   include/linux/dma-mapping.h          | 111 +++++++++++++++++++++++----
>>   include/linux/dma-noncoherent.h      |   5 +-
>>   kernel/dma/direct.c                  |  37 ++-------
>>   kernel/dma/mapping.c                 |  40 ++++++----
>>   20 files changed, 150 insertions(+), 119 deletions(-)
>>
>> diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
>> index 8beeafd4f68e..0ee6a5c99b16 100644
>> --- a/arch/alpha/include/asm/dma-mapping.h
>> +++ b/arch/alpha/include/asm/dma-mapping.h
>> @@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
>>   static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>>   {
>>   #ifdef CONFIG_ALPHA_JENSEN
>> -	return &dma_direct_ops;
>> +	return NULL;
>>   #else
>>   	return &alpha_pci_ops;
>>   #endif
>> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
>> index f2701c13a66b..e188bb3ede53 100644
>> --- a/arch/arc/mm/cache.c
>> +++ b/arch/arc/mm/cache.c
>> @@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
>>   	/*
>>   	 * In case of IOC (say IOC+SLC case), pointers above could still be set
>>   	 * but end up not being relevant as the first function in chain is not
>> -	 * called at all for @dma_direct_ops
>> +	 * called at all for devices using coherent DMA.
>>   	 *     arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
>>   	 */
>>   }
>> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
>> index 965b7c846ecb..31d3b96f0f4b 100644
>> --- a/arch/arm/include/asm/dma-mapping.h
>> +++ b/arch/arm/include/asm/dma-mapping.h
>> @@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
>>   
>>   static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>>   {
>> -	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
>> +	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
>>   }
>>   
>>   #ifdef __arch_page_to_dma
>> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
>> index 712416ecd8e6..f304b10e23a4 100644
>> --- a/arch/arm/mm/dma-mapping-nommu.c
>> +++ b/arch/arm/mm/dma-mapping-nommu.c
>> @@ -22,7 +22,7 @@
>>   #include "dma.h"
>>   
>>   /*
>> - *  dma_direct_ops is used if
>> + *  The generic direct mapping code is used if
>>    *   - MMU/MPU is off
>>    *   - cpu is v7m w/o cache support
>>    *   - device is coherent
>> @@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
>>   };
>>   EXPORT_SYMBOL(arm_nommu_dma_ops);
>>   
>> -static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
>> -{
>> -	return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
>> -}
>> -
>>   void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>   			const struct iommu_ops *iommu, bool coherent)
>>   {
>> -	const struct dma_map_ops *dma_ops;
>> -
>>   	if (IS_ENABLED(CONFIG_CPU_V7M)) {
>>   		/*
>>   		 * Cache support for v7m is optional, so can be treated as
>> @@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>   		dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
>>   	}
>>   
>> -	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
>> -
>> -	set_dma_ops(dev, dma_ops);
>> +	if (!dev->archdata.dma_coherent)
>> +		set_dma_ops(dev, &arm_nommu_dma_ops);
>>   }
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index ab1e417204d0..95eda81e3f2d 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>   void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>>   			const struct iommu_ops *iommu, bool coherent)
>>   {
>> -	if (!dev->dma_ops)
>> -		dev->dma_ops = &dma_direct_ops;
>> -
>>   	dev->dma_coherent = coherent;
>>   	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>>   
>> diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
>> index f40ca499b246..8840ed97712f 100644
>> --- a/arch/ia64/hp/common/hwsw_iommu.c
>> +++ b/arch/ia64/hp/common/hwsw_iommu.c
>> @@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
>>   const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
>>   {
>>   	if (use_swiotlb(dev))
>> -		return &dma_direct_ops;
>> +		return NULL;
>>   	return &sba_dma_ops;
>>   }
>>   EXPORT_SYMBOL(hwsw_dma_get_ops);
>> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
>> index 5ee74820a0f6..5a361e51cb1e 100644
>> --- a/arch/ia64/hp/common/sba_iommu.c
>> +++ b/arch/ia64/hp/common/sba_iommu.c
>> @@ -2078,7 +2078,7 @@ sba_init(void)
>>   	 * a successful kdump kernel boot is to use the swiotlb.
>>   	 */
>>   	if (is_kdump_kernel()) {
>> -		dma_ops = &dma_direct_ops;
>> +		dma_ops = NULL;
>>   		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>>   			panic("Unable to initialize software I/O TLB:"
>>   				  " Try machvec=dig boot option");
>> @@ -2100,7 +2100,7 @@ sba_init(void)
>>   		 * If we didn't find something sba_iommu can claim, we
>>   		 * need to setup the swiotlb and switch to the dig machvec.
>>   		 */
>> -		dma_ops = &dma_direct_ops;
>> +		dma_ops = NULL;
>>   		if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
>>   			panic("Unable to find SBA IOMMU or initialize "
>>   			      "software I/O TLB: Try machvec=dig boot option");
>> diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
>> index 80cd3e1ea95a..ad7d9963de34 100644
>> --- a/arch/ia64/kernel/dma-mapping.c
>> +++ b/arch/ia64/kernel/dma-mapping.c
>> @@ -36,7 +36,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
>>   
>>   void __init swiotlb_dma_init(void)
>>   {
>> -	dma_ops = &dma_direct_ops;
>>   	swiotlb_init(1);
>>   }
>>   #endif
>> diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
>> index 69f914667f3e..20dfaad3a55d 100644
>> --- a/arch/mips/include/asm/dma-mapping.h
>> +++ b/arch/mips/include/asm/dma-mapping.h
>> @@ -11,7 +11,7 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>>   #if defined(CONFIG_MACH_JAZZ)
>>   	return &jazz_dma_ops;
>>   #else
>> -	return &dma_direct_ops;
>> +	return NULL;
>>   #endif
>>   }
>>   
>> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
>> index cd227f1cf629..54818cd78bd0 100644
>> --- a/arch/parisc/kernel/setup.c
>> +++ b/arch/parisc/kernel/setup.c
>> @@ -99,10 +99,6 @@ void __init dma_ops_init(void)
>>   
>>   	case pcxl2:
>>   		pa7300lc_init();
>> -	case pcxl: /* falls through */
>> -	case pcxs:
>> -	case pcxt:
>> -		hppa_dma_ops = &dma_direct_ops;
>>   		break;
>>   	default:
>>   		break;
>> diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
>> index b0bb2fcaf1c9..59f5a0f17316 100644
>> --- a/arch/sparc/include/asm/dma-mapping.h
>> +++ b/arch/sparc/include/asm/dma-mapping.h
>> @@ -14,11 +14,11 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>>   {
>>   #ifdef CONFIG_SPARC_LEON
>>   	if (sparc_cpu_model == sparc_leon)
>> -		return &dma_direct_ops;
>> +		return NULL;
>>   #endif
>>   #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
>>   	if (bus == &pci_bus_type)
>> -		return &dma_direct_ops;
>> +		return NULL;
>>   #endif
>>   	return dma_ops;
>>   }
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index f4562fcec681..d460998ae828 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -17,7 +17,7 @@
>>   
>>   static bool disable_dac_quirk __read_mostly;
>>   
>> -const struct dma_map_ops *dma_ops = &dma_direct_ops;
>> +const struct dma_map_ops *dma_ops;
>>   EXPORT_SYMBOL(dma_ops);
>>   
>>   #ifdef CONFIG_IOMMU_DEBUG
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>> index 61a84b958d67..50637f372e9f 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>> @@ -581,7 +581,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
>>   
>>   	dev_priv->map_mode = vmw_dma_map_populate;
>>   
>> -	if (dma_ops->sync_single_for_cpu)
>> +	if (dma_ops && dma_ops->sync_single_for_cpu)
>>   		dev_priv->map_mode = vmw_dma_alloc_coherent;
>>   #ifdef CONFIG_SWIOTLB
>>   	if (swiotlb_nr_tbl() == 0)
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index c5d6c7c42b0a..567221cca13c 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2184,7 +2184,7 @@ static int amd_iommu_add_device(struct device *dev)
>>   				dev_name(dev));
>>   
>>   		iommu_ignore_device(dev);
>> -		dev->dma_ops = &dma_direct_ops;
>> +		dev->dma_ops = NULL;
>>   		goto out;
>>   	}
>>   	init_iommu_group(dev);
>> @@ -2770,17 +2770,6 @@ int __init amd_iommu_init_dma_ops(void)
>>   	swiotlb        = (iommu_pass_through || sme_me_mask) ? 1 : 0;
>>   	iommu_detected = 1;
>>   
>> -	/*
>> -	 * In case we don't initialize SWIOTLB (actually the common case
>> -	 * when AMD IOMMU is enabled and SME is not active), make sure there
>> -	 * are global dma_ops set as a fall-back for devices not handled by
>> -	 * this driver (for example non-PCI devices). When SME is active,
>> -	 * make sure that swiotlb variable remains set so the global dma_ops
>> -	 * continue to be SWIOTLB.
>> -	 */
>> -	if (!swiotlb)
>> -		dma_ops = &dma_direct_ops;
>> -
>>   	if (amd_iommu_unmap_flush)
>>   		pr_info("AMD-Vi: IO/TLB flush on unmap enabled\n");
>>   	else
>> diff --git a/include/asm-generic/dma-mapping.h b/include/asm-generic/dma-mapping.h
>> index 880a292d792f..c13f46109e88 100644
>> --- a/include/asm-generic/dma-mapping.h
>> +++ b/include/asm-generic/dma-mapping.h
>> @@ -4,7 +4,7 @@
>>   
>>   static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
>>   {
>> -	return &dma_direct_ops;
>> +	return NULL;
>>   }
>>   
>>   #endif /* _ASM_GENERIC_DMA_MAPPING_H */
>> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> index 3b0a3ea3876d..b7338702592a 100644
>> --- a/include/linux/dma-direct.h
>> +++ b/include/linux/dma-direct.h
>> @@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
>>   struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>>   		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
>>   void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
>> -dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>> -		unsigned long offset, size_t size, enum dma_data_direction dir,
>> -		unsigned long attrs);
>> -void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>> -		size_t size, enum dma_data_direction dir, unsigned long attrs);
>> -int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>> -		enum dma_data_direction dir, unsigned long attrs);
>> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>> -		int nents, enum dma_data_direction dir, unsigned long attrs);
>> -void dma_direct_sync_single_for_device(struct device *dev,
>> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
>> -void dma_direct_sync_sg_for_device(struct device *dev,
>> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>> -void dma_direct_sync_single_for_cpu(struct device *dev,
>> -		dma_addr_t addr, size_t size, enum dma_data_direction dir);
>> -void dma_direct_sync_sg_for_cpu(struct device *dev,
>> -		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>>   int dma_direct_supported(struct device *dev, u64 mask);
>>   #endif /* _LINUX_DMA_DIRECT_H */
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 269ee27fc3d9..f422aec0f53c 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -134,7 +134,6 @@ struct dma_map_ops {
>>   
>>   #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
>>   
>> -extern const struct dma_map_ops dma_direct_ops;
>>   extern const struct dma_map_ops dma_virt_ops;
>>   extern const struct dma_map_ops dma_dummy_ops;
>>   
>> @@ -222,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
>>   }
>>   #endif
>>   
>> +static inline bool dma_is_direct(const struct dma_map_ops *ops)
>> +{
>> +	return likely(!ops);
>> +}
>> +
>> +/*
>> + * All the dma_direct_* declarations are here just for the indirect call bypass,
>> + * and must not be used directly by drivers!
>> + */
>> +dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>> +		unsigned long offset, size_t size, enum dma_data_direction dir,
>> +		unsigned long attrs);
>> +int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>> +		enum dma_data_direction dir, unsigned long attrs);
>> +
>> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
>> +    defined(CONFIG_SWIOTLB)
>> +void dma_direct_sync_single_for_device(struct device *dev,
>> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
>> +void dma_direct_sync_sg_for_device(struct device *dev,
>> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>> +#else
>> +static inline void dma_direct_sync_single_for_device(struct device *dev,
>> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
>> +{
>> +}
>> +static inline void dma_direct_sync_sg_for_device(struct device *dev,
>> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>> +{
>> +}
>> +#endif
>> +
>> +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>> +    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
>> +    defined(CONFIG_SWIOTLB)
>> +void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>> +		size_t size, enum dma_data_direction dir, unsigned long attrs);
>> +void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>> +		int nents, enum dma_data_direction dir, unsigned long attrs);
>> +void dma_direct_sync_single_for_cpu(struct device *dev,
>> +		dma_addr_t addr, size_t size, enum dma_data_direction dir);
>> +void dma_direct_sync_sg_for_cpu(struct device *dev,
>> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir);
>> +#else
>> +static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>> +		size_t size, enum dma_data_direction dir, unsigned long attrs)
>> +{
>> +}
>> +static inline void dma_direct_unmap_sg(struct device *dev,
>> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir,
>> +		unsigned long attrs)
>> +{
>> +}
>> +static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>> +		dma_addr_t addr, size_t size, enum dma_data_direction dir)
>> +{
>> +}
>> +static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
>> +		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>> +{
>> +}
>> +#endif
>> +
>>   static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>>   					      size_t size,
>>   					      enum dma_data_direction dir,
>> @@ -232,9 +294,12 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>>   	debug_dma_map_single(dev, ptr, size);
>> -	addr = ops->map_page(dev, virt_to_page(ptr),
>> -			     offset_in_page(ptr), size,
>> -			     dir, attrs);
>> +	if (dma_is_direct(ops))
>> +		addr = dma_direct_map_page(dev, virt_to_page(ptr),
>> +				offset_in_page(ptr), size, dir, attrs);
>> +	else
>> +		addr = ops->map_page(dev, virt_to_page(ptr),
>> +				offset_in_page(ptr), size, dir, attrs);
>>   	debug_dma_map_page(dev, virt_to_page(ptr),
>>   			   offset_in_page(ptr), size,
>>   			   dir, addr, true);
>> @@ -249,7 +314,9 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->unmap_page)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>> +	else if (ops->unmap_page)
>>   		ops->unmap_page(dev, addr, size, dir, attrs);
>>   	debug_dma_unmap_page(dev, addr, size, dir, true);
>>   }
>> @@ -272,7 +339,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>>   	int ents;
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	ents = ops->map_sg(dev, sg, nents, dir, attrs);
>> +	if (dma_is_direct(ops))
>> +		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>> +	else
>> +		ents = ops->map_sg(dev, sg, nents, dir, attrs);
>>   	BUG_ON(ents < 0);
>>   	debug_dma_map_sg(dev, sg, nents, ents, dir);
>>   
>> @@ -287,7 +357,9 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>>   	debug_dma_unmap_sg(dev, sg, nents, dir);
>> -	if (ops->unmap_sg)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
>> +	else if (ops->unmap_sg)
>>   		ops->unmap_sg(dev, sg, nents, dir, attrs);
>>   }
>>   
>> @@ -301,7 +373,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>>   	dma_addr_t addr;
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	addr = ops->map_page(dev, page, offset, size, dir, attrs);
>> +	if (dma_is_direct(ops))
>> +		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>> +	else
>> +		addr = ops->map_page(dev, page, offset, size, dir, attrs);
>>   	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
>>   
>>   	return addr;
>> @@ -322,7 +397,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
>>   	BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
>>   
>>   	addr = phys_addr;
>> -	if (ops->map_resource)
>> +	if (ops && ops->map_resource)
>>   		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
>>   
>>   	debug_dma_map_resource(dev, phys_addr, size, dir, addr);
>> @@ -337,7 +412,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->unmap_resource)
>> +	if (ops && ops->unmap_resource)
>>   		ops->unmap_resource(dev, addr, size, dir, attrs);
>>   	debug_dma_unmap_resource(dev, addr, size, dir);
>>   }
>> @@ -349,7 +424,9 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->sync_single_for_cpu)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>> +	else if (ops->sync_single_for_cpu)
>>   		ops->sync_single_for_cpu(dev, addr, size, dir);
>>   	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
>>   }
>> @@ -368,7 +445,9 @@ static inline void dma_sync_single_for_device(struct device *dev,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->sync_single_for_device)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_sync_single_for_device(dev, addr, size, dir);
>> +	else if (ops->sync_single_for_device)
>>   		ops->sync_single_for_device(dev, addr, size, dir);
>>   	debug_dma_sync_single_for_device(dev, addr, size, dir);
>>   }
>> @@ -387,7 +466,9 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->sync_sg_for_cpu)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>> +	else if (ops->sync_sg_for_cpu)
>>   		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>>   	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>>   }
>> @@ -399,7 +480,9 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->sync_sg_for_device)
>> +	if (dma_is_direct(ops))
>> +		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
>> +	else if (ops->sync_sg_for_device)
>>   		ops->sync_sg_for_device(dev, sg, nelems, dir);
>>   	debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
>>   
>> diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
>> index 306557331d7d..69b36ed31a99 100644
>> --- a/include/linux/dma-noncoherent.h
>> +++ b/include/linux/dma-noncoherent.h
>> @@ -38,7 +38,10 @@ pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
>>   void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>>   		enum dma_data_direction direction);
>>   #else
>> -#define arch_dma_cache_sync NULL
>> +static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
>> +		size_t size, enum dma_data_direction direction)
>> +{
>> +}
>>   #endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
>>   
>>   #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 85d8286a0ba2..79da61b49fa4 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -223,6 +223,7 @@ void dma_direct_sync_single_for_device(struct device *dev,
>>   	if (!dev_is_dma_coherent(dev))
>>   		arch_sync_dma_for_device(dev, paddr, size, dir);
>>   }
>> +EXPORT_SYMBOL(dma_direct_sync_single_for_device);
>>   
>>   void dma_direct_sync_sg_for_device(struct device *dev,
>>   		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>> @@ -240,6 +241,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>>   					dir);
>>   	}
>>   }
>> +EXPORT_SYMBOL(dma_direct_sync_sg_for_device);
>>   #endif
>>   
>>   #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>> @@ -258,6 +260,7 @@ void dma_direct_sync_single_for_cpu(struct device *dev,
>>   	if (unlikely(is_swiotlb_buffer(paddr)))
>>   		swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
>>   }
>> +EXPORT_SYMBOL(dma_direct_sync_single_for_cpu);
>>   
>>   void dma_direct_sync_sg_for_cpu(struct device *dev,
>>   		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>> @@ -277,6 +280,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>>   	if (!dev_is_dma_coherent(dev))
>>   		arch_sync_dma_for_cpu_all(dev);
>>   }
>> +EXPORT_SYMBOL(dma_direct_sync_sg_for_cpu);
>>   
>>   void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>>   		size_t size, enum dma_data_direction dir, unsigned long attrs)
>> @@ -289,6 +293,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>>   	if (unlikely(is_swiotlb_buffer(phys)))
>>   		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>>   }
>> +EXPORT_SYMBOL(dma_direct_unmap_page);
>>   
>>   void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>> @@ -300,11 +305,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>>   		dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
>>   			     attrs);
>>   }
>> -#else
>> -void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
>> -		int nents, enum dma_data_direction dir, unsigned long attrs)
>> -{
>> -}
>> +EXPORT_SYMBOL(dma_direct_unmap_sg);
>>   #endif
>>   
>>   static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
>> @@ -331,6 +332,7 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
>>   		arch_sync_dma_for_device(dev, phys, size, dir);
>>   	return dma_addr;
>>   }
>> +EXPORT_SYMBOL(dma_direct_map_page);
>>   
>>   int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>   		enum dma_data_direction dir, unsigned long attrs)
>> @@ -352,6 +354,7 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>   	dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
>>   	return 0;
>>   }
>> +EXPORT_SYMBOL(dma_direct_map_sg);
>>   
>>   /*
>>    * Because 32-bit DMA masks are so common we expect every architecture to be
>> @@ -372,27 +375,3 @@ int dma_direct_supported(struct device *dev, u64 mask)
>>   
>>   	return mask >= phys_to_dma(dev, min_mask);
>>   }
>> -
>> -const struct dma_map_ops dma_direct_ops = {
>> -	.alloc			= dma_direct_alloc,
>> -	.free			= dma_direct_free,
>> -	.map_page		= dma_direct_map_page,
>> -	.map_sg			= dma_direct_map_sg,
>> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
>> -    defined(CONFIG_SWIOTLB)
>> -	.sync_single_for_device	= dma_direct_sync_single_for_device,
>> -	.sync_sg_for_device	= dma_direct_sync_sg_for_device,
>> -#endif
>> -#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>> -    defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
>> -    defined(CONFIG_SWIOTLB)
>> -	.sync_single_for_cpu	= dma_direct_sync_single_for_cpu,
>> -	.sync_sg_for_cpu	= dma_direct_sync_sg_for_cpu,
>> -	.unmap_page		= dma_direct_unmap_page,
>> -	.unmap_sg		= dma_direct_unmap_sg,
>> -#endif
>> -	.get_required_mask	= dma_direct_get_required_mask,
>> -	.dma_supported		= dma_direct_supported,
>> -	.cache_sync		= arch_dma_cache_sync,
>> -};
>> -EXPORT_SYMBOL(dma_direct_ops);
>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>> index 0b18cfbdde95..fc84c81029d9 100644
>> --- a/kernel/dma/mapping.c
>> +++ b/kernel/dma/mapping.c
>> @@ -7,6 +7,7 @@
>>    */
>>   #include <linux/memblock.h> /* for max_pfn */
>>   #include <linux/acpi.h>
>> +#include <linux/dma-direct.h>
>>   #include <linux/dma-noncoherent.h>
>>   #include <linux/export.h>
>>   #include <linux/gfp.h>
>> @@ -229,8 +230,8 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
>>   		unsigned long attrs)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>> -	BUG_ON(!ops);
>> -	if (ops->get_sgtable)
>> +
>> +	if (!dma_is_direct(ops) && ops->get_sgtable)
>>   		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
>>   					attrs);
>>   	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
>> @@ -293,8 +294,8 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>>   		unsigned long attrs)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>> -	BUG_ON(!ops);
>> -	if (ops->mmap)
>> +
>> +	if (!dma_is_direct(ops) && ops->mmap)
>>   		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>>   	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
>>   }
>> @@ -324,6 +325,8 @@ u64 dma_get_required_mask(struct device *dev)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>> +	if (dma_is_direct(ops))
>> +		return dma_direct_get_required_mask(dev);
>>   	if (ops->get_required_mask)
>>   		return ops->get_required_mask(dev);
>>   	return dma_default_get_required_mask(dev);
>> @@ -341,7 +344,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   	void *cpu_addr;
>>   
>> -	BUG_ON(!ops);
>>   	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
>>   
>>   	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
>> @@ -352,10 +354,14 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>>   
>>   	if (!arch_dma_alloc_attrs(&dev))
>>   		return NULL;
>> -	if (!ops->alloc)
>> +
>> +	if (dma_is_direct(ops))
>> +		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
>> +	else if (ops->alloc)
>> +		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
>> +	else
>>   		return NULL;
>>   
>> -	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
>>   	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
>>   	return cpu_addr;
>>   }
>> @@ -366,8 +372,6 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>> -	BUG_ON(!ops);
>> -
>>   	if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
>>   		return;
>>   	/*
>> @@ -379,11 +383,14 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>>   	 */
>>   	WARN_ON(irqs_disabled());
>>   
>> -	if (!ops->free || !cpu_addr)
>> +	if (!cpu_addr)
>>   		return;
>>   
>>   	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
>> -	ops->free(dev, size, cpu_addr, dma_handle, attrs);
>> +	if (dma_is_direct(ops))
>> +		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
>> +	else if (ops->free)
>> +		ops->free(dev, size, cpu_addr, dma_handle, attrs);
>>   }
>>   EXPORT_SYMBOL(dma_free_attrs);
>>   
>> @@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
>>   {
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>> -	if (!ops)
>> -		return 0;
>> -	if (!ops->dma_supported)
>> +	if (dma_is_direct(ops))
>> +		return dma_direct_supported(dev, mask);
>> +	if (ops->dma_supported)
>>   		return 1;
>>   	return ops->dma_supported(dev, mask);
>>   }
>> @@ -437,7 +444,10 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>>   
>>   	BUG_ON(!valid_dma_direction(dir));
>> -	if (ops->cache_sync)
>> +
>> +	if (dma_is_direct(ops))
>> +		arch_dma_cache_sync(dev, vaddr, size, dir);
>> +	else if (ops->cache_sync)
>>   		ops->cache_sync(dev, vaddr, size, dir);
>>   }
>>   EXPORT_SYMBOL(dma_cache_sync);
>>
> 

* Re: [15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-18 20:42     ` Robin Murphy
@ 2018-12-19  6:42       ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-19  6:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Guillaume Tucker, Christoph Hellwig, iommu, Linus Torvalds,
	Jesper Dangaard Brouer, Tariq Toukan, Ilias Apalodimas,
	Toke Høiland-Jørgensen, Konrad Rzeszutek Wilk,
	Tony Luck, Fenghua Yu, Marek Szyprowski, Keith Busch,
	Jonathan Derrick, linux-pci, linux-ia64, x86, linux-kernel,
	Ezequiel Garcia, linux-arm-kernel

On Tue, Dec 18, 2018 at 08:42:46PM +0000, Robin Murphy wrote:
>> [   16.046084] rk_iommu ff8f3f00.iommu: DMA map error for DT
>
> Yup, with this patch as-is, anything which isn't behind an IOMMU will be 
> erroneously banned from DMA entirely - see here:
>
> https://lore.kernel.org/lkml/20181214142435.GA18448@lst.de/

FYI, I'm still waiting for a review on that..

* Re: [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
                     ` (2 preceding siblings ...)
  2018-12-18 20:34   ` Guillaume Tucker
@ 2018-12-20 16:44   ` Thierry Reding
  2018-12-20 16:46     ` Christoph Hellwig
  3 siblings, 1 reply; 41+ messages in thread
From: Thierry Reding @ 2018-12-20 16:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu, Marek Szyprowski,
	Keith Busch, Jonathan Derrick, linux-pci, linux-ia64, x86,
	linux-kernel


On Fri, Dec 07, 2018 at 11:07:20AM -0800, Christoph Hellwig wrote:
[...]
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 0b18cfbdde95..fc84c81029d9 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
[...]
> @@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (!ops)
> -		return 0;
> -	if (!ops->dma_supported)
> +	if (dma_is_direct(ops))
> +		return dma_direct_supported(dev, mask);
> +	if (ops->dma_supported)
>  		return 1;
>  	return ops->dma_supported(dev, mask);
>  }

Hi Christoph,

This hunk causes a crash on boot for me. It looks like a ! got lost in
the rework here. The following patch fixes the crash for me and restores
the logic of the ops->dma_supported check.
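
A minimal userspace sketch of why the missing ! matters, assuming
simplified stand-ins for struct device and struct dma_map_ops (these are
not the kernel definitions, and model_dma_supported(),
inverted_check_calls_null() and needs_32bit_mask() are hypothetical
names used only for illustration).  With the inverted test, the only
path that reaches the indirect call is the one where the pointer is
NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct device { int id; };

struct dma_map_ops {
	int (*dma_supported)(struct device *dev, uint64_t mask);
};

/* Restored check: a missing callback means "supported", otherwise ask it. */
static int model_dma_supported(const struct dma_map_ops *ops,
			       struct device *dev, uint64_t mask)
{
	if (!ops->dma_supported)
		return 1;
	return ops->dma_supported(dev, mask);
}

/*
 * Models the inverted test `if (ops->dma_supported) return 1;` without
 * executing the bad call: returns 1 when that version would fall through
 * and call a NULL function pointer, 0 when it would return early.
 */
static int inverted_check_calls_null(const struct dma_map_ops *ops)
{
	if (ops->dma_supported)
		return 0;	/* early "return 1", callback never consulted */
	return 1;		/* falls through to ops->dma_supported == NULL */
}

/* Hypothetical callback that only accepts masks of at least 32 bits. */
static int needs_32bit_mask(struct device *dev, uint64_t mask)
{
	(void)dev;
	return mask >= 0xffffffffULL;
}

void check_examples(void)
{
	struct device dev = { 0 };
	struct dma_map_ops no_cb = { NULL };
	struct dma_map_ops with_cb = { needs_32bit_mask };

	assert(model_dma_supported(&no_cb, &dev, 0xffffULL) == 1);
	assert(model_dma_supported(&with_cb, &dev, 0xffffULL) == 0);
	assert(model_dma_supported(&with_cb, &dev, 0xffffffffULL) == 1);
	assert(inverted_check_calls_null(&no_cb) == 1);
	assert(inverted_check_calls_null(&with_cb) == 0);
}
```

In this model, ops that provide no callback hit the NULL call under the
inverted test, while ops that do provide one are never consulted.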

Feel free to squash this patch into the above if you prefer that.

Thierry

--- >8 ---
From c502b29ab01fa857e81c78cd574d4d22d7d20e09 Mon Sep 17 00:00:00 2001
From: Thierry Reding <treding@nvidia.com>
Date: Thu, 20 Dec 2018 17:35:47 +0100
Subject: [PATCH] dma-mapping: Fix inverted logic in dma_supported()

The cleanup in commit 356da6d0cde3 ("dma-mapping: bypass indirect calls
for dma-direct") accidentally inverted the logic in the check for the
presence of a ->dma_supported() callback. Switch this back to the way it
was to prevent a crash on boot.

Fixes: 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 kernel/dma/mapping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fc84c81029d9..d7c34d2d1ba5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -406,7 +406,7 @@ int dma_supported(struct device *dev, u64 mask)
 
 	if (dma_is_direct(ops))
 		return dma_direct_supported(dev, mask);
-	if (ops->dma_supported)
+	if (!ops->dma_supported)
 		return 1;
 	return ops->dma_supported(dev, mask);
 }
-- 
2.19.1

* Re: [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
  2018-12-20 16:44   ` [PATCH 15/15] " Thierry Reding
@ 2018-12-20 16:46     ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2018-12-20 16:46 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Tony Luck, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, x86, linux-kernel

On Thu, Dec 20, 2018 at 05:44:18PM +0100, Thierry Reding wrote:
> On Fri, Dec 07, 2018 at 11:07:20AM -0800, Christoph Hellwig wrote:
> [...]
> > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > index 0b18cfbdde95..fc84c81029d9 100644
> > --- a/kernel/dma/mapping.c
> > +++ b/kernel/dma/mapping.c
> [...]
> > @@ -397,9 +404,9 @@ int dma_supported(struct device *dev, u64 mask)
> >  {
> >  	const struct dma_map_ops *ops = get_dma_ops(dev);
> >  
> > -	if (!ops)
> > -		return 0;
> > -	if (!ops->dma_supported)
> > +	if (dma_is_direct(ops))
> > +		return dma_direct_supported(dev, mask);
> > +	if (ops->dma_supported)
> >  		return 1;
> >  	return ops->dma_supported(dev, mask);
> >  }
> 
> Hi Christoph,
> 
> This hunk causes a crash on boot for me. It looks like a ! got lost in
> the rework here. The following patch fixes the crash for me and restores
> the logic of the ops->dma_supported check.
> 
> Feel free to squash this patch into the above if you prefer that.

I don't want to rebase, so I'll pick this up ASAP.

* Re: [PATCH 02/15] swiotlb: remove dma_mark_clean
  2018-12-07 19:07 ` [PATCH 02/15] swiotlb: remove dma_mark_clean Christoph Hellwig
@ 2019-01-02 21:53   ` Tony Luck
  2019-01-03  7:23     ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Tony Luck @ 2019-01-02 21:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Fenghua Yu, Marek Szyprowski, Keith Busch,
	Jonathan Derrick, linux-pci, linux-ia64, X86-ML,
	Linux Kernel Mailing List

On Fri, Dec 7, 2018 at 11:08 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Instead of providing a special dma_mark_clean hook just for ia64, switch
> ia64 to use the normal arch_sync_dma_for_cpu hooks instead.
>
> This means that we now also set the PG_arch_1 bit for pages in the
> swiotlb buffer, which isn't strictly needed as we will never execute code
> out of the swiotlb buffer, but otherwise harmless.

ia64 build based on arch/ia64/configs/zx1_defconfig now fails with undefined
symbols arch_dma_alloc and arch_dma_free (used by kernel/dma/direct.c).

This config doesn't define CONFIG_SWIOTLB, so we don't get the
benefit of the routines in arch/ia64/kernel/dma-mapping.c

-Tony

* Re: [PATCH 02/15] swiotlb: remove dma_mark_clean
  2019-01-02 21:53   ` Tony Luck
@ 2019-01-03  7:23     ` Christoph Hellwig
  2019-01-03 17:35       ` Tony Luck
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2019-01-03  7:23 UTC (permalink / raw)
  To: Tony Luck
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, X86-ML, Linux Kernel Mailing List

On Wed, Jan 02, 2019 at 01:53:33PM -0800, Tony Luck wrote:
> On Fri, Dec 7, 2018 at 11:08 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Instead of providing a special dma_mark_clean hook just for ia64, switch
> > ia64 to use the normal arch_sync_dma_for_cpu hooks instead.
> >
> > This means that we now also set the PG_arch_1 bit for pages in the
> > swiotlb buffer, which isn't strictly needed as we will never execute code
> > out of the swiotlb buffer, but otherwise harmless.
> 
> ia64 build based on arch/ia64/configs/zx1_defconfig now fails with undefined
> symbols arch_dma_alloc and arch_dma_free (used by kernel/dma/direct.c).
> 
> This config doesn't define CONFIG_SWIOTLB, so we don't get the
> benefit of the routines in arch/ia64/kernel/dma-mapping.c

I think something like the patch below should fix it:

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index ccd56f5df8cd..8d7396bd1790 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -31,7 +31,7 @@ config IA64
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB
 	select VIRT_TO_BUS
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE

* Re: [PATCH 02/15] swiotlb: remove dma_mark_clean
  2019-01-03  7:23     ` Christoph Hellwig
@ 2019-01-03 17:35       ` Tony Luck
  2019-01-04  8:09         ` Christoph Hellwig
  0 siblings, 1 reply; 41+ messages in thread
From: Tony Luck @ 2019-01-03 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Linus Torvalds, Jesper Dangaard Brouer, Tariq Toukan,
	Ilias Apalodimas, Toke Høiland-Jørgensen, Robin Murphy,
	Konrad Rzeszutek Wilk, Fenghua Yu, Marek Szyprowski, Keith Busch,
	Jonathan Derrick, linux-pci, linux-ia64, X86-ML,
	Linux Kernel Mailing List

On Wed, Jan 2, 2019 at 11:23 PM Christoph Hellwig <hch@lst.de> wrote:
> I think something like the patch below should fix it:
>
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index ccd56f5df8cd..8d7396bd1790 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -31,7 +31,7 @@ config IA64
>         select HAVE_MEMBLOCK_NODE_MAP
>         select HAVE_VIRT_CPU_ACCOUNTING
>         select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB
> -       select ARCH_HAS_SYNC_DMA_FOR_CPU
> +       select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB

Close. But no cigar. Now I get:

  CC      arch/ia64/mm/init.o
arch/ia64/mm/init.c:75:6: error: redefinition of ‘arch_sync_dma_for_cpu’
./include/linux/dma-noncoherent.h:61:20: note: previous definition of
‘arch_sync_dma_for_cpu’ was here
make[1]: *** [arch/ia64/mm/init.o] Error 1
make: *** [arch/ia64/mm/init.o] Error 2


-Tony

* Re: [PATCH 02/15] swiotlb: remove dma_mark_clean
  2019-01-03 17:35       ` Tony Luck
@ 2019-01-04  8:09         ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2019-01-04  8:09 UTC (permalink / raw)
  To: Tony Luck
  Cc: Christoph Hellwig, iommu, Linus Torvalds, Jesper Dangaard Brouer,
	Tariq Toukan, Ilias Apalodimas, Toke Høiland-Jørgensen,
	Robin Murphy, Konrad Rzeszutek Wilk, Fenghua Yu,
	Marek Szyprowski, Keith Busch, Jonathan Derrick, linux-pci,
	linux-ia64, X86-ML, Linux Kernel Mailing List

One more ifdef to the rescue..
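
A minimal sketch of why the extra ifdef is needed, assuming a header
that supplies an empty inline stub whenever the arch does not opt in
(the pattern the error message points at in dma-noncoherent.h).  All
MODEL_/model_ names are hypothetical stand-ins, and a single macro
stands in for both the Kconfig symbol and CONFIG_SWIOTLB:

```c
#include <stddef.h>

/* Stand-in for the Kconfig symbol; remove it to model a non-swiotlb config. */
#define MODEL_ARCH_HAS_SYNC_DMA_FOR_CPU 1

static int pages_marked_clean;

/* "Header" side: real prototype if the arch opts in, empty stub otherwise. */
#ifdef MODEL_ARCH_HAS_SYNC_DMA_FOR_CPU
void model_arch_sync_dma_for_cpu(size_t npages);
#else
static inline void model_arch_sync_dma_for_cpu(size_t npages)
{
	(void)npages;
}
#endif

/*
 * "Arch" side: the definition must be compiled out under the same
 * condition that enables the stub.  Without this guard, a config that
 * leaves the symbol unset sees both the stub and this body, which is
 * the redefinition error reported above.
 */
#ifdef MODEL_ARCH_HAS_SYNC_DMA_FOR_CPU
void model_arch_sync_dma_for_cpu(size_t npages)
{
	pages_marked_clean += (int)npages;
}
#endif

/* Caller that builds in either configuration. */
int model_sync_and_count(size_t npages)
{
	model_arch_sync_dma_for_cpu(npages);
	return pages_marked_clean;
}
```

With the macro defined, the counter advances on every call; with it
removed, the stub is used and the guarded definition drops out, so the
file still builds.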

Btw, do you know why we only play these mark clean bits for swiotlb
and not for the various iommus?

Also do you have any good recipe to build an ia64 cross compiler on
a recent Debian system?  Unlike most architectures Debian doesn't have
a pre-built one, and the script from the kernel buildbot doesn't work
either unfortunately.

---
From 3f5f3297aa989cf27b3fe10e2d010422332574b3 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Fri, 4 Jan 2019 09:06:05 +0100
Subject: ia64: fix compile without swiotlb

Some non-generic ia64 configs don't build swiotlb, and thus should not
pull in the generic non-coherent DMA infrastructure.

Fixes: 68c608345c ("swiotlb: remove dma_mark_clean")
Reported-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/ia64/Kconfig   | 2 +-
 arch/ia64/mm/init.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index ccd56f5df8cd..8d7396bd1790 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -31,7 +31,7 @@ config IA64
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_VIRT_CPU_ACCOUNTING
 	select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB
-	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB
 	select VIRT_TO_BUS
 	select ARCH_DISCARD_MEMBLOCK
 	select GENERIC_IRQ_PROBE
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 055382622f07..29d841525ca1 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -67,6 +67,7 @@ __ia64_sync_icache_dcache (pte_t pte)
 	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
 }
 
+#ifdef CONFIG_SWIOTLB
 /*
  * Since DMA is i-cache coherent, any (complete) pages that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
@@ -81,6 +82,7 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
 		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
 	} while (++pfn <= PHYS_PFN(paddr + size - 1));
 }
+#endif
 
 inline void
 ia64_set_rbs_bot (void)
-- 
2.20.1


Thread overview: 41+ messages
2018-12-07 19:07 [RFC] avoid indirect calls for DMA direct mappings v2 Christoph Hellwig
2018-12-07 19:07 ` [PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR Christoph Hellwig
2018-12-07 19:07 ` [PATCH 02/15] swiotlb: remove dma_mark_clean Christoph Hellwig
2019-01-02 21:53   ` Tony Luck
2019-01-03  7:23     ` Christoph Hellwig
2019-01-03 17:35       ` Tony Luck
2019-01-04  8:09         ` Christoph Hellwig
2018-12-07 19:07 ` [PATCH 03/15] dma-direct: improve addressability error reporting Christoph Hellwig
2018-12-07 19:07 ` [PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg Christoph Hellwig
2018-12-07 19:07 ` [PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code Christoph Hellwig
2018-12-07 19:07 ` [PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu,device} implementation Christoph Hellwig
2018-12-07 19:07 ` [PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs Christoph Hellwig
2018-12-07 19:07 ` [PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma Christoph Hellwig
2018-12-07 19:07 ` [PATCH 09/15] dma-mapping: move various slow path functions out of line Christoph Hellwig
2018-12-07 19:07 ` [PATCH 10/15] dma-mapping: move dma_cache_sync " Christoph Hellwig
2018-12-07 19:07 ` [PATCH 11/15] dma-mapping: always build the direct mapping code Christoph Hellwig
2018-12-07 19:07 ` [PATCH 12/15] dma-mapping: factor out dummy DMA ops Christoph Hellwig
2018-12-07 19:07 ` [PATCH 13/15] ACPI / scan: Refactor _CCA enforcement Christoph Hellwig
2018-12-14 21:15   ` Bjorn Helgaas
2018-12-07 19:07 ` [PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls Christoph Hellwig
2018-12-14 21:17   ` Bjorn Helgaas
2018-12-14 21:34     ` Derrick, Jonathan
2018-12-07 19:07 ` [PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct Christoph Hellwig
2018-12-14 14:11   ` Marek Szyprowski
2018-12-14 14:24     ` Christoph Hellwig
2018-12-14 14:32       ` Marek Szyprowski
2018-12-15 17:46   ` [15/15] " Guenter Roeck
2018-12-16  9:02     ` Christoph Hellwig
2018-12-18 20:34   ` Guillaume Tucker
2018-12-18 20:42     ` Robin Murphy
2018-12-19  6:42       ` Christoph Hellwig
2018-12-20 16:44   ` [PATCH 15/15] " Thierry Reding
2018-12-20 16:46     ` Christoph Hellwig
2018-12-08 16:06 ` [RFC] avoid indirect calls for DMA direct mappings v2 Jesper Dangaard Brouer
2018-12-08 16:50   ` Christoph Hellwig
2018-12-10 21:51 ` Luck, Tony
2018-12-11  6:51   ` Christoph Hellwig
2018-12-11 16:42     ` Luck, Tony
2018-12-11 17:13     ` Luck, Tony
2018-12-11 17:15       ` Christoph Hellwig
2018-12-13 20:08 ` Christoph Hellwig
