* generic DMA bypass flag v2
@ 2020-03-20 14:16 ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-20 14:16 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel

Hi all,

I've recently been chatting with Lu about using dma-iommu and
per-device DMA ops in the Intel IOMMU driver, and one feature still
missing from dma-iommu is a bypass mode that uses the direct mapping
even when an IOMMU is attached, to improve performance.  The powerpc
code already has a similar mode, so I'd like to move it to the core
DMA mapping code.  While doing that I noticed that the current
powerpc code has a small bug: it uses the wrong check in the
dma_sync_* routines to decide whether the direct mapping code is used.
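
As a rough illustration of the idea (not part of the patches
themselves), an IOMMU driver would opt into the bypass from its
->dma_supported method roughly like this; "foo" is a made-up driver
name, and the sketch assumes the dma_ops_bypass bit added by patch 1:

/*
 * Hypothetical sketch: use the direct mapping whenever the device can
 * address all of system memory, otherwise keep using dynamic IOMMU
 * mappings through the dma_map_ops.
 */
static int foo_dma_supported(struct device *dev, u64 mask)
{
	dev->dma_ops_bypass = mask >= dma_direct_get_required_mask(dev);
	return 1;
}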

These two patches just add the generic code and move powerpc over;
the Intel IOMMU bits will require a separate discussion.

The x86 AMD GART code also has a bypass mode, but it is quite a bit
stranger, so I'm not going to touch it for now.

Changes since v1:
 - rebased to the current dma-mapping-for-next tree

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-20 14:16 ` Christoph Hellwig
  (?)
@ 2020-03-20 14:16   ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-20 14:16 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
for this to the core dma-mapping code so that those drivers can be
switched to a common solution.
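
In short, the patch distinguishes two checks (a condensed sketch of the
dma_map_direct()/dma_alloc_direct() helpers added below; the function
names here are illustrative, not the literal patch code): streaming
mappings bypass the ops based on the new flag alone, while allocations
additionally require the coherent mask to cover all of memory:

static bool bypass_for_streaming(struct device *dev,
		const struct dma_map_ops *ops)
{
	/* no ops at all, or the driver asked for bypass */
	return !ops || dev->dma_ops_bypass;
}

static bool bypass_for_alloc(struct device *dev,
		const struct dma_map_ops *ops)
{
	if (!ops)
		return true;
	/* the coherent mask may be smaller than the streaming mask */
	return dev->dma_ops_bypass &&
	       min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
			dma_direct_get_required_mask(dev);
}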

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  6 ++++++
 include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
 kernel/dma/mapping.c        | 36 +++++++++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 0cd7c647c16c..09be8bb2c4a6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -525,6 +525,11 @@ struct dev_links_info {
  *		  sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ *		and optionally (if the coherent mask is large enough) also
+ *		for dma allocations.  This flag is managed by the dma ops
+ *		instance from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -625,6 +630,7 @@ struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			dma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..c3af0cf5e435 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+		const struct dma_map_ops *ops)
 {
-	return likely(!ops);
+	return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -352,7 +358,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
 	else if (ops->map_resource)
 		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -368,7 +374,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (!dma_is_direct(ops) && ops->unmap_resource)
+	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
 		ops->unmap_resource(dev, addr, size, dir, attrs);
 	debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -380,7 +386,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 	else if (ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
@@ -394,7 +400,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_device(dev, addr, size, dir);
 	else if (ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
@@ -408,7 +414,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
@@ -422,7 +428,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 12ff766ec1fa..fdea45574345 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,6 +105,24 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
+static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
+{
+	if (!ops)
+		return true;
+
+	/*
+	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
+	 * is large enough.
+	 */
+	if (dev->dma_ops_bypass) {
+		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
+				dma_direct_get_required_mask(dev))
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Create scatter-list for the already allocated DMA buffer.
  */
@@ -138,7 +156,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
 				size, attrs);
 	if (!ops->get_sgtable)
@@ -206,7 +224,7 @@ bool dma_can_mmap(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_can_mmap(dev);
 	return ops->mmap != NULL;
 }
@@ -231,7 +249,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
 				attrs);
 	if (!ops->mmap)
@@ -244,7 +262,7 @@ u64 dma_get_required_mask(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		return dma_direct_get_required_mask(dev);
 	if (ops->get_required_mask)
 		return ops->get_required_mask(dev);
@@ -275,7 +293,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	else if (ops->alloc)
 		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
@@ -307,7 +325,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 		return;
 
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
 	else if (ops->free)
 		ops->free(dev, size, cpu_addr, dma_handle, attrs);
@@ -318,7 +336,7 @@ int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (!ops)
 		return dma_direct_supported(dev, mask);
 	if (!ops->dma_supported)
 		return 1;
@@ -374,7 +392,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		arch_dma_cache_sync(dev, vaddr, size, dir);
 	else if (ops->cache_sync)
 		ops->cache_sync(dev, vaddr, size, dir);
@@ -386,7 +404,7 @@ size_t dma_max_mapping_size(struct device *dev)
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	size_t size = SIZE_MAX;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		size = dma_direct_max_mapping_size(dev);
 	else if (ops && ops->max_mapping_size)
 		size = ops->max_mapping_size(dev);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 2/2] powerpc: use the generic dma_ops_bypass mode
  2020-03-20 14:16 ` Christoph Hellwig
  (?)
@ 2020-03-20 14:16   ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-20 14:16 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel

Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speeds up the direct mapping case by avoiding indirect
calls when not using dma ops at all.  It also fixes a problem where the
sync_* methods were using the bypass check meant for DMA allocations,
even though they are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined and is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.
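
To make the sync_* issue above concrete, here is the relevant old
helper, quoted (with an added comment) from the code this patch
removes: the streaming sync path was gated on the allocation-time
bypass check, which looks at the coherent mask:

static void dma_iommu_sync_for_cpu(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir)
{
	if (dma_iommu_alloc_bypass(dev))	/* coherent-mask based check */
		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
}

With the generic flag these helpers go away entirely and the core
dma_sync_single_for_cpu() calls dma_direct_sync_single_for_cpu() itself
whenever dev->dma_ops_bypass is set.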

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ++++---------------------------
 2 files changed, 9 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 266542769e4b..452402215e12 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-	/*
-	 * Set to %true if the dma_iommu_ops are requested to use a direct
-	 * window instead of dynamically mapping memory.
-	 */
-	bool			iommu_bypass : 1;
 	/*
 	 * These two used to be a union. However, with the hybrid ops we need
 	 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..569fecd7b5b2 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-	return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-		dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-		unsigned long attrs)
-{
-	return dev->archdata.iommu_bypass &&
-		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
 				      dma_addr_t *dma_handle, gfp_t flag,
 				      unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
 				    dma_handle, dev->coherent_dma_mask, flag,
 				    dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
 				    void *vaddr, dma_addr_t dma_handle,
 				    unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-	else
-		iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-				dma_handle);
+	iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_page(dev, page, offset, size, direction,
-				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
 			      size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
-				direction,  attrs);
-	else
-		dma_direct_unmap_page(dev, dma_handle, size, direction, attrs);
+	iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
+			 attrs);
 }
 
 
@@ -91,8 +62,6 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
 }
@@ -101,11 +70,8 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
+	ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
-	else
-		dma_direct_unmap_sg(dev, sglist, nelems, direction, attrs);
 }
 
 static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
@@ -113,8 +79,9 @@ static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
 
-	return phb->controller_ops.iommu_bypass_supported &&
-		phb->controller_ops.iommu_bypass_supported(pdev, mask);
+	if (iommu_fixed_is_weak || !phb->controller_ops.iommu_bypass_supported)
+		return false;
+	return phb->controller_ops.iommu_bypass_supported(pdev, mask);
 }
 
 /* We support DMA to/from any memory page via the iommu */
@@ -123,7 +90,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-		dev->archdata.iommu_bypass = true;
+		dev->dma_ops_bypass = true;
 		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
@@ -141,7 +108,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	}
 
 	dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
-	dev->archdata.iommu_bypass = false;
+	dev->dma_ops_bypass = false;
 	return 1;
 }
 
@@ -153,47 +120,12 @@ u64 dma_iommu_get_required_mask(struct device *dev)
 	if (!tbl)
 		return 0;
 
-	if (dev_is_pci(dev)) {
-		u64 bypass_mask = dma_direct_get_required_mask(dev);
-
-		if (dma_iommu_bypass_supported(dev, bypass_mask))
-			return bypass_mask;
-	}
-
 	mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
 	mask += mask - 1;
 
 	return mask;
 }
 
-static void dma_iommu_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		size_t size, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
-}
-
-static void dma_iommu_sync_for_device(struct device *dev, dma_addr_t addr,
-		size_t sz, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_device(dev, addr, sz, dir);
-}
-
-extern void dma_iommu_sync_sg_for_cpu(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_cpu(dev, sgl, nents, dir);
-}
-
-extern void dma_iommu_sync_sg_for_device(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
-}
-
 const struct dma_map_ops dma_iommu_ops = {
 	.alloc			= dma_iommu_alloc_coherent,
 	.free			= dma_iommu_free_coherent,
@@ -203,10 +135,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.map_page		= dma_iommu_map_page,
 	.unmap_page		= dma_iommu_unmap_page,
 	.get_required_mask	= dma_iommu_get_required_mask,
-	.sync_single_for_cpu	= dma_iommu_sync_for_cpu,
-	.sync_single_for_device	= dma_iommu_sync_for_device,
-	.sync_sg_for_cpu	= dma_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= dma_iommu_sync_sg_for_device,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 2/2] powerpc: use the generic dma_ops_bypass mode
@ 2020-03-20 14:16   ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-20 14:16 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	linuxppc-dev, Lu Baolu

Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ++++---------------------------
 2 files changed, 9 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 266542769e4b..452402215e12 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-	/*
-	 * Set to %true if the dma_iommu_ops are requested to use a direct
-	 * window instead of dynamically mapping memory.
-	 */
-	bool			iommu_bypass : 1;
 	/*
 	 * These two used to be a union. However, with the hybrid ops we need
 	 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..569fecd7b5b2 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-	return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-		dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-		unsigned long attrs)
-{
-	return dev->archdata.iommu_bypass &&
-		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
 				      dma_addr_t *dma_handle, gfp_t flag,
 				      unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
 				    dma_handle, dev->coherent_dma_mask, flag,
 				    dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
 				    void *vaddr, dma_addr_t dma_handle,
 				    unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-	else
-		iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-				dma_handle);
+	iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_page(dev, page, offset, size, direction,
-				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
 			      size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
-				direction,  attrs);
-	else
-		dma_direct_unmap_page(dev, dma_handle, size, direction, attrs);
+	iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
+			 attrs);
 }
 
 
@@ -91,8 +62,6 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
 }
@@ -101,11 +70,8 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
+	ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
-	else
-		dma_direct_unmap_sg(dev, sglist, nelems, direction, attrs);
 }
 
 static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
@@ -113,8 +79,9 @@ static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
 
-	return phb->controller_ops.iommu_bypass_supported &&
-		phb->controller_ops.iommu_bypass_supported(pdev, mask);
+	if (iommu_fixed_is_weak || !phb->controller_ops.iommu_bypass_supported)
+		return false;
+	return phb->controller_ops.iommu_bypass_supported(pdev, mask);
 }
 
 /* We support DMA to/from any memory page via the iommu */
@@ -123,7 +90,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-		dev->archdata.iommu_bypass = true;
+		dev->dma_ops_bypass = true;
 		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
@@ -141,7 +108,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	}
 
 	dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
-	dev->archdata.iommu_bypass = false;
+	dev->dma_ops_bypass = false;
 	return 1;
 }
 
@@ -153,47 +120,12 @@ u64 dma_iommu_get_required_mask(struct device *dev)
 	if (!tbl)
 		return 0;
 
-	if (dev_is_pci(dev)) {
-		u64 bypass_mask = dma_direct_get_required_mask(dev);
-
-		if (dma_iommu_bypass_supported(dev, bypass_mask))
-			return bypass_mask;
-	}
-
 	mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
 	mask += mask - 1;
 
 	return mask;
 }
 
-static void dma_iommu_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		size_t size, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
-}
-
-static void dma_iommu_sync_for_device(struct device *dev, dma_addr_t addr,
-		size_t sz, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_device(dev, addr, sz, dir);
-}
-
-extern void dma_iommu_sync_sg_for_cpu(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_cpu(dev, sgl, nents, dir);
-}
-
-extern void dma_iommu_sync_sg_for_device(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
-}
-
 const struct dma_map_ops dma_iommu_ops = {
 	.alloc			= dma_iommu_alloc_coherent,
 	.free			= dma_iommu_free_coherent,
@@ -203,10 +135,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.map_page		= dma_iommu_map_page,
 	.unmap_page		= dma_iommu_unmap_page,
 	.get_required_mask	= dma_iommu_get_required_mask,
-	.sync_single_for_cpu	= dma_iommu_sync_for_cpu,
-	.sync_single_for_device	= dma_iommu_sync_for_device,
-	.sync_sg_for_cpu	= dma_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= dma_iommu_sync_sg_for_device,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 2/2] powerpc: use the generic dma_ops_bypass mode
@ 2020-03-20 14:16   ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-20 14:16 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, Robin Murphy, linux-kernel, linuxppc-dev

Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ++++---------------------------
 2 files changed, 9 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 266542769e4b..452402215e12 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-	/*
-	 * Set to %true if the dma_iommu_ops are requested to use a direct
-	 * window instead of dynamically mapping memory.
-	 */
-	bool			iommu_bypass : 1;
 	/*
 	 * These two used to be a union. However, with the hybrid ops we need
 	 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..569fecd7b5b2 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-	return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-		dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-		unsigned long attrs)
-{
-	return dev->archdata.iommu_bypass &&
-		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, size_t size,
 				      dma_addr_t *dma_handle, gfp_t flag,
 				      unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
 				    dma_handle, dev->coherent_dma_mask, flag,
 				    dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, size_t size,
 				    void *vaddr, dma_addr_t dma_handle,
 				    unsigned long attrs)
 {
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-	else
-		iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-				dma_handle);
+	iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_page(dev, page, offset, size, direction,
-				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
 			      size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
-				direction,  attrs);
-	else
-		dma_direct_unmap_page(dev, dma_handle, size, direction, attrs);
+	iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size, direction,
+			 attrs);
 }
 
 
@@ -91,8 +62,6 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
-		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
 }
@@ -101,11 +70,8 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
-		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
+	ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
-	else
-		dma_direct_unmap_sg(dev, sglist, nelems, direction, attrs);
 }
 
 static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
@@ -113,8 +79,9 @@ static bool dma_iommu_bypass_supported(struct device *dev, u64 mask)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct pci_controller *phb = pci_bus_to_host(pdev->bus);
 
-	return phb->controller_ops.iommu_bypass_supported &&
-		phb->controller_ops.iommu_bypass_supported(pdev, mask);
+	if (iommu_fixed_is_weak || !phb->controller_ops.iommu_bypass_supported)
+		return false;
+	return phb->controller_ops.iommu_bypass_supported(pdev, mask);
 }
 
 /* We support DMA to/from any memory page via the iommu */
@@ -123,7 +90,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-		dev->archdata.iommu_bypass = true;
+		dev->dma_ops_bypass = true;
 		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
@@ -141,7 +108,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	}
 
 	dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
-	dev->archdata.iommu_bypass = false;
+	dev->dma_ops_bypass = false;
 	return 1;
 }
 
@@ -153,47 +120,12 @@ u64 dma_iommu_get_required_mask(struct device *dev)
 	if (!tbl)
 		return 0;
 
-	if (dev_is_pci(dev)) {
-		u64 bypass_mask = dma_direct_get_required_mask(dev);
-
-		if (dma_iommu_bypass_supported(dev, bypass_mask))
-			return bypass_mask;
-	}
-
 	mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) - 1);
 	mask += mask - 1;
 
 	return mask;
 }
 
-static void dma_iommu_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		size_t size, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
-}
-
-static void dma_iommu_sync_for_device(struct device *dev, dma_addr_t addr,
-		size_t sz, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_single_for_device(dev, addr, sz, dir);
-}
-
-extern void dma_iommu_sync_sg_for_cpu(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_cpu(dev, sgl, nents, dir);
-}
-
-extern void dma_iommu_sync_sg_for_device(struct device *dev,
-		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
-{
-	if (dma_iommu_alloc_bypass(dev))
-		dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
-}
-
 const struct dma_map_ops dma_iommu_ops = {
 	.alloc			= dma_iommu_alloc_coherent,
 	.free			= dma_iommu_free_coherent,
@@ -203,10 +135,6 @@ const struct dma_map_ops dma_iommu_ops = {
 	.map_page		= dma_iommu_map_page,
 	.unmap_page		= dma_iommu_unmap_page,
 	.get_required_mask	= dma_iommu_get_required_mask,
-	.sync_single_for_cpu	= dma_iommu_sync_for_cpu,
-	.sync_single_for_device	= dma_iommu_sync_for_device,
-	.sync_sg_for_cpu	= dma_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= dma_iommu_sync_sg_for_device,
 	.mmap			= dma_common_mmap,
 	.get_sgtable		= dma_common_get_sgtable,
 };
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-20 14:16   ` Christoph Hellwig
  (?)
@ 2020-03-20 15:02     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 94+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-20 15:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Alexey Kardashevskiy, linuxppc-dev, Lu Baolu,
	Joerg Roedel, Robin Murphy, linux-kernel

On Fri, Mar 20, 2020 at 03:16:39PM +0100, Christoph Hellwig wrote:
> Several IOMMU drivers have a bypass mode where they can use a direct
> mapping if the device's DMA mask is large enough.  Add generic support
> to the core dma-mapping code to do that to switch those drivers to
> a common solution.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/device.h      |  6 ++++++
>  include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
>  kernel/dma/mapping.c        | 36 +++++++++++++++++++++++++++---------
>  3 files changed, 51 insertions(+), 21 deletions(-)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-20 14:16   ` Christoph Hellwig
  (?)
@ 2020-03-23  1:28     ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-23  1:28 UTC (permalink / raw)
  To: Christoph Hellwig, iommu
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel, Aneesh Kumar K.V



On 21/03/2020 01:16, Christoph Hellwig wrote:
> Several IOMMU drivers have a bypass mode where they can use a direct
> mapping if the device's DMA mask is large enough.  Add generic support
> to the core dma-mapping code to do that to switch those drivers to
> a common solution.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/device.h      |  6 ++++++
>  include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
>  kernel/dma/mapping.c        | 36 +++++++++++++++++++++++++++---------
>  3 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 0cd7c647c16c..09be8bb2c4a6 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -525,6 +525,11 @@ struct dev_links_info {
>   *		  sync_state() callback.
>   * @dma_coherent: this particular device is dma coherent, even if the
>   *		architecture supports non-coherent devices.
> + * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> + *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
> + *		and optionally (if the coherent mask is large enough) also
> + *		for dma allocations.  This flag is managed by the dma ops
> + *		instance from ->dma_supported.
>   *
>   * At the lowest level, every device in a Linux system is represented by an
>   * instance of struct device. The device structure contains the information
> @@ -625,6 +630,7 @@ struct device {
>      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>  	bool			dma_coherent:1;
>  #endif
> +	bool			dma_ops_bypass : 1;
>  };
>  
>  static inline struct device *kobj_to_dev(struct kobject *kobj)
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 330ad58fbf4d..c3af0cf5e435 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
>  }
>  #endif /* CONFIG_DMA_DECLARE_COHERENT */
>  
> -static inline bool dma_is_direct(const struct dma_map_ops *ops)
> +/*
> + * Check if the device uses a direct mapping for streaming DMA operations.
> + * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
> + * enough.
> + */
> +static inline bool dma_map_direct(struct device *dev,
> +		const struct dma_map_ops *ops)
>  {
> -	return likely(!ops);
> +	return likely(!ops) || dev->dma_ops_bypass;
>  }
>  
>  /*
> @@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>  	dma_addr_t addr;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>  	else
>  		addr = ops->map_page(dev, page, offset, size, dir, attrs);
> @@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>  	else if (ops->unmap_page)
>  		ops->unmap_page(dev, addr, size, dir, attrs);
> @@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>  	int ents;
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>  	else
>  		ents = ops->map_sg(dev, sg, nents, dir, attrs);
> @@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  	debug_dma_unmap_sg(dev, sg, nents, dir);
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
>  	else if (ops->unmap_sg)
>  		ops->unmap_sg(dev, sg, nents, dir, attrs);
> @@ -352,7 +358,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
>  	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>  		return DMA_MAPPING_ERROR;
>  
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
>  	else if (ops->map_resource)
>  		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
> @@ -368,7 +374,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (!dma_is_direct(ops) && ops->unmap_resource)
> +	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
>  		ops->unmap_resource(dev, addr, size, dir, attrs);
>  	debug_dma_unmap_resource(dev, addr, size, dir);
>  }
> @@ -380,7 +386,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>  	else if (ops->sync_single_for_cpu)
>  		ops->sync_single_for_cpu(dev, addr, size, dir);
> @@ -394,7 +400,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_sync_single_for_device(dev, addr, size, dir);
>  	else if (ops->sync_single_for_device)
>  		ops->sync_single_for_device(dev, addr, size, dir);
> @@ -408,7 +414,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>  	else if (ops->sync_sg_for_cpu)
>  		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
> @@ -422,7 +428,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
>  	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
>  	else if (ops->sync_sg_for_device)
>  		ops->sync_sg_for_device(dev, sg, nelems, dir);
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 12ff766ec1fa..fdea45574345 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -105,6 +105,24 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  }
>  EXPORT_SYMBOL(dmam_alloc_attrs);
>  
> +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
> +{
> +	if (!ops)
> +		return true;
> +
> +	/*
> +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> +	 * is large enough.
> +	 */
> +	if (dev->dma_ops_bypass) {
> +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> +				dma_direct_get_required_mask(dev))
> +			return true;
> +	}


Why not do this in dma_map_direct() as well?
Or simply have just one dma_map_direct()?

And one more general question - we need a way to use a non-direct IOMMU
for RAM above a certain limit.

Let's say we have a system with:
0 .. 0x1.0000.0000
0x100.0000.0000 .. 0x101.0000.0000

2x4G, each is 1TB aligned. And we can map directly only the first 4GB
(because of the maximum IOMMU table size) but not the other. And 1:1 on
that "pseries" is done with offset=0x0800.0000.0000.0000.

So we want to check every bus address against dev->bus_dma_limit, not
dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
tried. Does this sound reasonable? Thanks,
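
(A rough sketch of the per-mapping check being proposed here - the helper
name is made up for illustration and is not part of this series; it is
essentially the same test that dma_capable() already performs against
bus_dma_limit:)

	static bool dma_this_mapping_may_bypass(struct device *dev,
						phys_addr_t phys, size_t size)
	{
		dma_addr_t end = phys_to_dma(dev, phys) + size - 1;

		/*
		 * Bypass only if this particular mapping fits under both the
		 * device's DMA mask and the bus limit; otherwise fall back
		 * to the dynamic IOMMU path for this mapping.
		 */
		return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
	}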


> +
> +	return false;
> +}
> +
>  /*
>   * Create scatter-list for the already allocated DMA buffer.
>   */
> @@ -138,7 +156,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
>  				size, attrs);
>  	if (!ops->get_sgtable)
> @@ -206,7 +224,7 @@ bool dma_can_mmap(struct device *dev)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		return dma_direct_can_mmap(dev);
>  	return ops->mmap != NULL;
>  }
> @@ -231,7 +249,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
>  				attrs);
>  	if (!ops->mmap)
> @@ -244,7 +262,7 @@ u64 dma_get_required_mask(struct device *dev)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		return dma_direct_get_required_mask(dev);
>  	if (ops->get_required_mask)
>  		return ops->get_required_mask(dev);
> @@ -275,7 +293,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  	/* let the implementation decide on the zone to allocate from: */
>  	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
>  
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
>  	else if (ops->alloc)
>  		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
> @@ -307,7 +325,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  		return;
>  
>  	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
>  	else if (ops->free)
>  		ops->free(dev, size, cpu_addr, dma_handle, attrs);
> @@ -318,7 +336,7 @@ int dma_supported(struct device *dev, u64 mask)
>  {
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  
> -	if (dma_is_direct(ops))
> +	if (!ops)
>  		return dma_direct_supported(dev, mask);
>  	if (!ops->dma_supported)
>  		return 1;
> @@ -374,7 +392,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  
>  	BUG_ON(!valid_dma_direction(dir));
>  
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>  		arch_dma_cache_sync(dev, vaddr, size, dir);
>  	else if (ops->cache_sync)
>  		ops->cache_sync(dev, vaddr, size, dir);
> @@ -386,7 +404,7 @@ size_t dma_max_mapping_size(struct device *dev)
>  	const struct dma_map_ops *ops = get_dma_ops(dev);
>  	size_t size = SIZE_MAX;
>  
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>  		size = dma_direct_max_mapping_size(dev);
>  	else if (ops && ops->max_mapping_size)
>  		size = ops->max_mapping_size(dev);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23  1:28     ` Alexey Kardashevskiy
  (?)
@ 2020-03-23  8:37       ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-23  8:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Aneesh Kumar K.V

On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:

[full quote deleted, please follow proper quoting rules]

> > +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
> > +{
> > +	if (!ops)
> > +		return true;
> > +
> > +	/*
> > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > +	 * is large enough.
> > +	 */
> > +	if (dev->dma_ops_bypass) {
> > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > +				dma_direct_get_required_mask(dev))
> > +			return true;
> > +	}
> 
> 
> Why not do this in dma_map_direct() as well?

Mostly because it is a relatively expensive operation, including a
fls64.
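
(For context: the cost here comes from dma_direct_get_required_mask(), which
at the time boiled down to roughly the following - paraphrased, not quoted
from the tree:)

	u64 dma_direct_get_required_mask(struct device *dev)
	{
		/* highest physical address that may be handed to the device */
		phys_addr_t phys = (phys_addr_t)(max_pfn - 1) << PAGE_SHIFT;
		u64 max_dma = phys_to_dma_direct(dev, phys);

		/* smallest all-ones mask that covers it - hence the fls64 */
		return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
	}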

> Or simply have just one dma_map_direct()?

What do you mean with that?

> And one more general question - we need a way to use non-direct IOMMU
> for RAM above certain limit.
> 
> Let's say we have a system with:
> 0 .. 0x1.0000.0000
> 0x100.0000.0000 .. 0x101.0000.0000
> 
> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> that "pseries" is done with offset=0x0800.0000.0000.0000.
> 
> So we want to check every bus address against dev->bus_dma_limit, not
> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> 0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
> tried. Does this sound reasonable? Thanks,

bus_dma_limit is just another limiting factor applied on top of
coherent_dma_mask or dma_mask respectively.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23  8:37       ` Christoph Hellwig
  (?)
@ 2020-03-23  8:50         ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-23  8:50 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Aneesh Kumar K.V

On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
> > > +	/*
> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > > +	 * is large enough.
> > > +	 */
> > > +	if (dev->dma_ops_bypass) {
> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > > +				dma_direct_get_required_mask(dev))
> > > +			return true;
> > > +	}
> > 
> > 
> > Why not do this in dma_map_direct() as well?
> 
> Mostly beacuse it is a relatively expensive operation, including a
> fls64.

Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
you just send a draft patch for what you'd like to see for ppc?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23  8:37       ` Christoph Hellwig
  (?)
@ 2020-03-23  8:58         ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-23  8:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel, Aneesh Kumar K.V



On 23/03/2020 19:37, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:
> 
> [full quote deleted, please follow proper quoting rules]
> 
>>> +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
>>> +{
>>> +	if (!ops)
>>> +		return true;
>>> +
>>> +	/*
>>> +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>>> +	 * is large enough.
>>> +	 */
>>> +	if (dev->dma_ops_bypass) {
>>> +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>>> +				dma_direct_get_required_mask(dev))
>>> +			return true;
>>> +	}
>>
>>
>> Why not do this in dma_map_direct() as well?
> 
> Mostly because it is a relatively expensive operation, including a
> fls64.

Ah, ok.

>> Or simply have just one dma_map_direct()?
> 
> What do you mean with that?

I mean using dma_alloc_direct() instead of dma_map_direct() everywhere -
which you explained just above.

> 
>> And one more general question - we need a way to use non-direct IOMMU
>> for RAM above certain limit.
>>
>> Let's say we have a system with:
>> 0 .. 0x1.0000.0000
>> 0x100.0000.0000 .. 0x101.0000.0000
>>
>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>> that "pseries" is done with offset=0x0800.0000.0000.0000.
>>
>> So we want to check every bus address against dev->bus_dma_limit, not
>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>> 0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
>> tried. Does this sound reasonable? Thanks,
> 
> bus_dma_limit is just another limiting factor applied on top of
> coherent_dma_mask or dma_mask respectively.

This is not enough for the task: in my example, I'd set the bus limit to
0x0800.0001.0000.0000, but this would disable bypass for all RAM
addresses - both the first and the second 4GB blocks.


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-20 14:16   ` Christoph Hellwig
  (?)
@ 2020-03-23 12:14     ` Robin Murphy
  -1 siblings, 0 replies; 94+ messages in thread
From: Robin Murphy @ 2020-03-23 12:14 UTC (permalink / raw)
  To: Christoph Hellwig, iommu, Alexey Kardashevskiy
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, linux-kernel

On 2020-03-20 2:16 pm, Christoph Hellwig wrote:
> Several IOMMU drivers have a bypass mode where they can use a direct
> mapping if the device's DMA mask is large enough.  Add generic support
> for that to the core dma-mapping code so that those drivers can switch
> to a common solution.

Hmm, this is _almost_, but not quite the same as the case where drivers 
are managing their own IOMMU mappings, but still need to use streaming 
DMA for cache maintenance on the underlying pages. For that we need the 
ops bypass to be a "true" bypass and also avoid SWIOTLB regardless of 
the device's DMA mask. That behaviour should in fact be fine for the 
opportunistic bypass case here as well, since the mask being "big 
enough" implies by definition that this should never need to bounce either.

For the (hopefully less common) third case where, due to groups or user 
overrides, we end up giving an identity DMA domain to a device with 
limited DMA masks which _does_ need SWIOTLB, I'd like to think we can 
solve that by not giving the device IOMMU DMA ops in the first place, 
such that it never needs to engage the bypass mechanism at all.
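
(A sketch of that suggestion, not real code - the function name and the
ops argument are stand-ins for whatever the IOMMU driver normally
installs: with NULL ops the device falls back to dma-direct, which
already knows how to bounce through swiotlb when the DMA mask is too
small.)

static void install_iommu_dma_ops(struct device *dev, bool identity_domain,
				  const struct dma_map_ops *iommu_ops)
{
	if (identity_domain)
		set_dma_ops(dev, NULL);		/* plain dma-direct + swiotlb */
	else
		set_dma_ops(dev, iommu_ops);	/* translated DMA domain */
}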

Thoughts?

Robin.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/device.h      |  6 ++++++
>   include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
>   kernel/dma/mapping.c        | 36 +++++++++++++++++++++++++++---------
>   3 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 0cd7c647c16c..09be8bb2c4a6 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -525,6 +525,11 @@ struct dev_links_info {
>    *		  sync_state() callback.
>    * @dma_coherent: this particular device is dma coherent, even if the
>    *		architecture supports non-coherent devices.
> + * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
> + *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
> + *		and optionally (if the coherent mask is large enough) also
> + *		for dma allocations.  This flag is managed by the dma ops
> + *		instance from ->dma_supported.
>    *
>    * At the lowest level, every device in a Linux system is represented by an
>    * instance of struct device. The device structure contains the information
> @@ -625,6 +630,7 @@ struct device {
>       defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>   	bool			dma_coherent:1;
>   #endif
> +	bool			dma_ops_bypass : 1;
>   };
>   
>   static inline struct device *kobj_to_dev(struct kobject *kobj)
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 330ad58fbf4d..c3af0cf5e435 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
>   }
>   #endif /* CONFIG_DMA_DECLARE_COHERENT */
>   
> -static inline bool dma_is_direct(const struct dma_map_ops *ops)
> +/*
> + * Check if the device uses a direct mapping for streaming DMA operations.
> + * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
> + * enough.
> + */
> +static inline bool dma_map_direct(struct device *dev,
> +		const struct dma_map_ops *ops)
>   {
> -	return likely(!ops);
> +	return likely(!ops) || dev->dma_ops_bypass;
>   }
>   
>   /*
> @@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
>   	dma_addr_t addr;
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>   	else
>   		addr = ops->map_page(dev, page, offset, size, dir, attrs);
> @@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>   	else if (ops->unmap_page)
>   		ops->unmap_page(dev, addr, size, dir, attrs);
> @@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   	int ents;
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>   	else
>   		ents = ops->map_sg(dev, sg, nents, dir, attrs);
> @@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
>   
>   	BUG_ON(!valid_dma_direction(dir));
>   	debug_dma_unmap_sg(dev, sg, nents, dir);
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
>   	else if (ops->unmap_sg)
>   		ops->unmap_sg(dev, sg, nents, dir, attrs);
> @@ -352,7 +358,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
>   	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>   		return DMA_MAPPING_ERROR;
>   
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
>   	else if (ops->map_resource)
>   		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
> @@ -368,7 +374,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (!dma_is_direct(ops) && ops->unmap_resource)
> +	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
>   		ops->unmap_resource(dev, addr, size, dir, attrs);
>   	debug_dma_unmap_resource(dev, addr, size, dir);
>   }
> @@ -380,7 +386,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
>   	else if (ops->sync_single_for_cpu)
>   		ops->sync_single_for_cpu(dev, addr, size, dir);
> @@ -394,7 +400,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_sync_single_for_device(dev, addr, size, dir);
>   	else if (ops->sync_single_for_device)
>   		ops->sync_single_for_device(dev, addr, size, dir);
> @@ -408,7 +414,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>   	else if (ops->sync_sg_for_cpu)
>   		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
> @@ -422,7 +428,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
>   	BUG_ON(!valid_dma_direction(dir));
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
>   	else if (ops->sync_sg_for_device)
>   		ops->sync_sg_for_device(dev, sg, nelems, dir);
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 12ff766ec1fa..fdea45574345 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -105,6 +105,24 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>   }
>   EXPORT_SYMBOL(dmam_alloc_attrs);
>   
> +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
> +{
> +	if (!ops)
> +		return true;
> +
> +	/*
> +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> +	 * is large enough.
> +	 */
> +	if (dev->dma_ops_bypass) {
> +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> +				dma_direct_get_required_mask(dev))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>   /*
>    * Create scatter-list for the already allocated DMA buffer.
>    */
> @@ -138,7 +156,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
>   				size, attrs);
>   	if (!ops->get_sgtable)
> @@ -206,7 +224,7 @@ bool dma_can_mmap(struct device *dev)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		return dma_direct_can_mmap(dev);
>   	return ops->mmap != NULL;
>   }
> @@ -231,7 +249,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
>   				attrs);
>   	if (!ops->mmap)
> @@ -244,7 +262,7 @@ u64 dma_get_required_mask(struct device *dev)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		return dma_direct_get_required_mask(dev);
>   	if (ops->get_required_mask)
>   		return ops->get_required_mask(dev);
> @@ -275,7 +293,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>   	/* let the implementation decide on the zone to allocate from: */
>   	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
>   
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
>   	else if (ops->alloc)
>   		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
> @@ -307,7 +325,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>   		return;
>   
>   	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
>   	else if (ops->free)
>   		ops->free(dev, size, cpu_addr, dma_handle, attrs);
> @@ -318,7 +336,7 @@ int dma_supported(struct device *dev, u64 mask)
>   {
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   
> -	if (dma_is_direct(ops))
> +	if (!ops)
>   		return dma_direct_supported(dev, mask);
>   	if (!ops->dma_supported)
>   		return 1;
> @@ -374,7 +392,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>   
>   	BUG_ON(!valid_dma_direction(dir));
>   
> -	if (dma_is_direct(ops))
> +	if (dma_alloc_direct(dev, ops))
>   		arch_dma_cache_sync(dev, vaddr, size, dir);
>   	else if (ops->cache_sync)
>   		ops->cache_sync(dev, vaddr, size, dir);
> @@ -386,7 +404,7 @@ size_t dma_max_mapping_size(struct device *dev)
>   	const struct dma_map_ops *ops = get_dma_ops(dev);
>   	size_t size = SIZE_MAX;
>   
> -	if (dma_is_direct(ops))
> +	if (dma_map_direct(dev, ops))
>   		size = dma_direct_max_mapping_size(dev);
>   	else if (ops && ops->max_mapping_size)
>   		size = ops->max_mapping_size(dev);
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23 12:14     ` Robin Murphy
  (?)
@ 2020-03-23 12:55       ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-23 12:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, iommu, Alexey Kardashevskiy, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, linux-kernel

On Mon, Mar 23, 2020 at 12:14:08PM +0000, Robin Murphy wrote:
> On 2020-03-20 2:16 pm, Christoph Hellwig wrote:
>> Several IOMMU drivers have a bypass mode where they can use a direct
>> mapping if the device's DMA mask is large enough.  Add generic support
>> for that to the core dma-mapping code so that those drivers can switch
>> to a common solution.
>
> Hmm, this is _almost_, but not quite the same as the case where drivers are 
> managing their own IOMMU mappings, but still need to use streaming DMA for 
> cache maintenance on the underlying pages.

In that case they should simply not call the DMA API at all.  We'll just
need versions of the cache maintenance APIs that tie in with the raw
IOMMU API.
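
(Purely speculative sketch of what such a helper could look like - the
name is made up and no such API exists at this point; a driver mapping
pages through the raw IOMMU API itself would call something like this
for cache maintenance instead of dma_sync_single_for_device():)

static void iommu_cache_sync_for_device(struct device *dev, phys_addr_t phys,
					size_t size, enum dma_data_direction dir)
{
	if (!dev_is_dma_coherent(dev))
		arch_sync_dma_for_device(phys, size, dir);
}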

> For that we need the ops bypass 
> to be a "true" bypass and also avoid SWIOTLB regardless of the device's DMA 
> mask. That behaviour should in fact be fine for the opportunistic bypass 
> case here as well, since the mask being "big enough" implies by definition 
> that this should never need to bounce either.

In practice it does.  But that means adding yet another code path
besides the simple direct call to dma_direct_* and the call through the
DMA ops, which I'd rather avoid.
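
(Condensed restatement of the dispatch in question, paraphrasing the
helpers from the patch just to show where a third branch would have to
go - a sketch, not new code:)

static dma_addr_t map_page_dispatch(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* a "true" bypass would become a third branch here, repeated in
	 * every dma_map_* and dma_sync_* helper */
	if (dma_map_direct(dev, ops))		/* branch 1: dma-direct */
		return dma_direct_map_page(dev, page, offset, size, dir, attrs);

	return ops->map_page(dev, page, offset, size, dir, attrs);	/* branch 2 */
}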

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23  8:50         ` Christoph Hellwig
  (?)
@ 2020-03-23 15:37           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 94+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-23 15:37 UTC (permalink / raw)
  To: Christoph Hellwig, Alexey Kardashevskiy
  Cc: Christoph Hellwig, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel

Christoph Hellwig <hch@lst.de> writes:

> On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
>> > > +	/*
>> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>> > > +	 * is large enough.
>> > > +	 */
>> > > +	if (dev->dma_ops_bypass) {
>> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>> > > +				dma_direct_get_required_mask(dev))
>> > > +			return true;
>> > > +	}
>> > 
>> > 
>> > Why not do this in dma_map_direct() as well?
>> 
>> Mostly because it is a relatively expensive operation, including a
>> fls64.
>
> Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
> you just send a draft patch for what you'd like to see for ppc?

This is what I was trying, but since I am new to the DMA subsystem I am
not sure I got all the details correct. The idea is to look at the cpu
addr and see if it can be used in a direct-map fashion (is
bus_dma_limit the right restriction here?); if not, fall back to a
dynamic IOMMU mapping.

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..bc7e6a8b2caa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -31,6 +31,87 @@ static inline bool dma_iommu_map_bypass(struct device *dev,
 		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
 
+static inline bool __dma_direct_map_capable(struct device *dev, struct page *page,
+					    unsigned long offset, size_t size)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	dma_addr_t end = dma_addr + size - 1;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_map_capable(struct device *dev, struct page *page,
+					  unsigned long offset, size_t size,
+					  unsigned long attrs)
+{
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return __dma_direct_map_capable(dev, page, offset, size);
+}
+
+
+static inline bool dma_direct_unmap_capable(struct device *dev, dma_addr_t addr, size_t size,
+					    unsigned long attrs)
+{
+	dma_addr_t end = addr + size - 1;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_sg_map_capable(struct device *dev, struct scatterlist *sglist,
+					     int nelems, unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		if (!__dma_direct_map_capable(dev, sg_page(sg),
+					      sg->offset, sg->length))
+			return false;
+	}
+	return true;
+}
+
+static inline bool dma_direct_sg_unmap_capable(struct device *dev, struct scatterlist *sglist,
+					       int nelems, unsigned long attrs)
+{
+	int i;
+	dma_addr_t end;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		end = sg->dma_address + sg_dma_len(sg);
+
+		if (end > min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
+			return false;
+	}
+	return true;
+}
+
+
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -67,7 +148,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_map_capable(dev, page, offset, size, attrs))
 		return dma_direct_map_page(dev, page, offset, size, direction,
 				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
@@ -79,7 +160,7 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_unmap_capable(dev, dma_handle, size, attrs))
 		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
 				direction,  attrs);
 	else
@@ -91,7 +172,7 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_sg_map_capable(dev, sglist, nelems, attrs))
 		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
@@ -101,7 +182,7 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_sg_unmap_capable(dev, sglist, nelems, attrs))
 		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
 	else
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 99f72162dd85..702a680f5766 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1119,6 +1119,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	spin_unlock(&direct_window_list_lock);
 
 	dma_addr = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.bus_dma_limit = dma_addr + query.largest_available_block;
 	goto out_unlock;
 
 out_free_window:

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
@ 2020-03-23 15:37           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 94+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-23 15:37 UTC (permalink / raw)
  To: Christoph Hellwig, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	iommu, linuxppc-dev, Christoph Hellwig, Lu Baolu

Christoph Hellwig <hch@lst.de> writes:

> On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
>> > > +	/*
>> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>> > > +	 * is large enough.
>> > > +	 */
>> > > +	if (dev->dma_ops_bypass) {
>> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>> > > +				dma_direct_get_required_mask(dev))
>> > > +			return true;
>> > > +	}
>> > 
>> > 
>> > Why not do this in dma_map_direct() as well?
>> 
>> Mostly beacuse it is a relatively expensive operation, including a
>> fls64.
>
> Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
> you just send a draft patch for what you'd like to see for ppc?

This is what I was trying, but considering I am new to DMA subsystem, I
am not sure I got all the details correct. The idea is to look at the
cpu addr and see if that can be used in direct map fashion(is
bus_dma_limit the right restriction here?) if not fallback to dynamic
IOMMU mapping.

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..bc7e6a8b2caa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -31,6 +31,87 @@ static inline bool dma_iommu_map_bypass(struct device *dev,
 		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
 
+static inline bool __dma_direct_map_capable(struct device *dev, struct page *page,
+					    unsigned long offset, size_t size)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	dma_addr_t end = dma_addr + size - 1;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_map_capable(struct device *dev, struct page *page,
+					  unsigned long offset, size_t size,
+					  unsigned long attrs)
+{
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return __dma_direct_map_capable(dev, page, offset, size);
+}
+
+
+static inline bool dma_direct_unmap_capable(struct device *dev, dma_addr_t addr, size_t size,
+					    unsigned long attrs)
+{
+	dma_addr_t end = addr + size - 1;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_sg_map_capable(struct device *dev, struct scatterlist *sglist,
+					     int nelems, unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		if (!__dma_direct_map_capable(dev, sg_page(sg),
+					      sg->offset, sg->length))
+			return false;
+	}
+	return true;
+}
+
+static inline bool dma_direct_sg_unmap_capable(struct device *dev, struct scatterlist *sglist,
+					       int nelems, unsigned long attrs)
+{
+	int i;
+	dma_addr_t end;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		end = sg->dma_address + sg_dma_len(sg);
+
+		if (end > min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
+			return false;
+	}
+	return true;
+}
+
+
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -67,7 +148,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_map_capable(dev, page, offset, size, attrs))
 		return dma_direct_map_page(dev, page, offset, size, direction,
 				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
@@ -79,7 +160,7 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_unmap_capable(dev, dma_handle, size, attrs))
 		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
 				direction,  attrs);
 	else
@@ -91,7 +172,7 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_sg_map_capable(dev, sglist, nelems, attrs))
 		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
@@ -101,7 +182,7 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_sg_unmap_capable(dev, sglist, nelems, attrs))
 		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
 	else
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 99f72162dd85..702a680f5766 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1119,6 +1119,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	spin_unlock(&direct_window_list_lock);
 
 	dma_addr = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.bus_dma_limit = dma_addr + query.largest_available_block;
 	goto out_unlock;
 
 out_free_window:

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
@ 2020-03-23 15:37           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 94+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-23 15:37 UTC (permalink / raw)
  To: Christoph Hellwig, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, Robin Murphy, linux-kernel, iommu,
	linuxppc-dev, Christoph Hellwig

Christoph Hellwig <hch@lst.de> writes:

> On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
>> > > +	/*
>> > > +	 * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>> > > +	 * is large enough.
>> > > +	 */
>> > > +	if (dev->dma_ops_bypass) {
>> > > +		if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>> > > +				dma_direct_get_required_mask(dev))
>> > > +			return true;
>> > > +	}
>> > 
>> > 
>> > Why not do this in dma_map_direct() as well?
>> 
>> Mostly beacuse it is a relatively expensive operation, including a
>> fls64.
>
> Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
> you just send a draft patch for what you'd like to see for ppc?

This is what I was trying, but considering I am new to DMA subsystem, I
am not sure I got all the details correct. The idea is to look at the
cpu addr and see if that can be used in direct map fashion(is
bus_dma_limit the right restriction here?) if not fallback to dynamic
IOMMU mapping.

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..bc7e6a8b2caa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -31,6 +31,87 @@ static inline bool dma_iommu_map_bypass(struct device *dev,
 		(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
 
+static inline bool __dma_direct_map_capable(struct device *dev, struct page *page,
+					    unsigned long offset, size_t size)
+{
+	phys_addr_t phys = page_to_phys(page) + offset;
+	dma_addr_t dma_addr = phys_to_dma(dev, phys);
+	dma_addr_t end = dma_addr + size - 1;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_map_capable(struct device *dev, struct page *page,
+					  unsigned long offset, size_t size,
+					  unsigned long attrs)
+{
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return __dma_direct_map_capable(dev, page, offset, size);
+}
+
+
+static inline bool dma_direct_unmap_capable(struct device *dev, dma_addr_t addr, size_t size,
+					    unsigned long attrs)
+{
+	dma_addr_t end = addr + size - 1;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_sg_map_capable(struct device *dev, struct scatterlist *sglist,
+					     int nelems, unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		if (!__dma_direct_map_capable(dev, sg_page(sg),
+					      sg->offset, sg->length))
+			return false;
+	}
+	return true;
+}
+
+static inline bool dma_direct_sg_unmap_capable(struct device *dev, struct scatterlist *sglist,
+					       int nelems, unsigned long attrs)
+{
+	int i;
+	dma_addr_t end;
+	struct scatterlist *sg;
+
+	if (!dma_iommu_map_bypass(dev, attrs))
+		return false;
+
+	if (!dev->dma_mask)
+		return false;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		end = sg->dma_address + sg_dma_len(sg);
+
+		if (end > min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
+			return false;
+	}
+	return true;
+}
+
+
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -67,7 +148,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
 				     enum dma_data_direction direction,
 				     unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_map_capable(dev, page, offset, size, attrs))
 		return dma_direct_map_page(dev, page, offset, size, direction,
 				attrs);
 	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
@@ -79,7 +160,7 @@ static void dma_iommu_unmap_page(struct device *dev, dma_addr_t dma_handle,
 				 size_t size, enum dma_data_direction direction,
 				 unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_unmap_capable(dev, dma_handle, size, attrs))
 		iommu_unmap_page(get_iommu_table_base(dev), dma_handle, size,
 				direction,  attrs);
 	else
@@ -91,7 +172,7 @@ static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
 			    int nelems, enum dma_data_direction direction,
 			    unsigned long attrs)
 {
-	if (dma_iommu_map_bypass(dev, attrs))
+	if (dma_direct_sg_map_capable(dev, sglist, nelems, attrs))
 		return dma_direct_map_sg(dev, sglist, nelems, direction, attrs);
 	return ppc_iommu_map_sg(dev, get_iommu_table_base(dev), sglist, nelems,
 				dma_get_mask(dev), direction, attrs);
@@ -101,7 +182,7 @@ static void dma_iommu_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction,
 		unsigned long attrs)
 {
-	if (!dma_iommu_map_bypass(dev, attrs))
+	if (!dma_direct_sg_unmap_capable(dev, sglist, nelems, attrs))
 		ppc_iommu_unmap_sg(get_iommu_table_base(dev), sglist, nelems,
 			   direction, attrs);
 	else
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 99f72162dd85..702a680f5766 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1119,6 +1119,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	spin_unlock(&direct_window_list_lock);
 
 	dma_addr = be64_to_cpu(ddwprop->dma_base);
+	dev->dev.bus_dma_limit = dma_addr + query.largest_available_block;
 	goto out_unlock;
 
 out_free_window:
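
In short, the per-mapping decision the diff above introduces for
dma_iommu_map_page() boils down to (a condensed restatement of the
helpers above, not additional code from the patch):

	dma_addr_t dma_addr = phys_to_dma(dev, page_to_phys(page) + offset);

	if (dma_iommu_map_bypass(dev, attrs) && dev->dma_mask &&
	    dma_addr + size - 1 <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
		return dma_direct_map_page(dev, page, offset, size, direction, attrs);

	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
			      size, dma_get_mask(dev), direction, attrs);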

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23  8:58         ` Alexey Kardashevskiy
  (?)
@ 2020-03-23 17:20           ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-23 17:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Aneesh Kumar K.V

On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
> >> 0x100.0000.0000 .. 0x101.0000.0000
> >>
> >> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> >> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> >> that "pseries" is done with offset=0x0800.0000.0000.0000.
> >>
> >> So we want to check every bus address against dev->bus_dma_limit, not
> >> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> >> 0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
> >> tried. Does this sound reasonable? Thanks,
> > 
> > bus_dma_limit is just another limiting factor applied on top of
> > coherent_dma_mask or dma_mask respectively.
> 
> This is not enough for the task: in my example, I'd set bus limit to
> 0x0800.0001.0000.0000 but this would disable bypass for all RAM
> addresses - the first and the second 4GB blocks.

So what about something like the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3
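
(For reference, the dma_ops_bypass check in that version, as quoted
later in this thread, boils down to:

static inline bool dma_map_direct(struct device *dev,
               const struct dma_map_ops *ops)
{
       if (likely(!ops))
               return true;
       if (!dev->dma_ops_bypass)
               return false;

       return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
                           dma_direct_get_required_mask(dev);
}

i.e. bypass remains an all-or-nothing, per-device decision based on the
masks, not a per-address one.)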

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23 15:37           ` Aneesh Kumar K.V
  (?)
@ 2020-03-23 17:22             ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-23 17:22 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Christoph Hellwig, Alexey Kardashevskiy, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel

On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
> 
> This is what I was trying, but considering I am new to DMA subsystem, I
> am not sure I got all the details correct. The idea is to look at the
> cpu addr and see if that can be used in direct map fashion(is
> bus_dma_limit the right restriction here?) if not fallback to dynamic
> IOMMU mapping.

I don't think we can throw all these complications into the dma
mapping code.  At some point I also wonder what the point is,
especially for scatterlist mappings, where the iommu can coalesce.
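
(As a rough illustration of the coalescing point -- a sketch only, not
the actual dma-iommu code, with error handling and segment merging
omitted: the IOMMU can lay physically scattered pages out at
consecutive IOVAs, so adjacent entries can afterwards be merged into
far fewer DMA segments, ideally one:

	unsigned long iova = base_iova;	/* hypothetical pre-allocated IOVA range */
	struct scatterlist *sg;
	int i;

	for_each_sg(sglist, sg, nents, i) {
		size_t len = ALIGN(sg->offset + sg->length, PAGE_SIZE);

		iommu_map(domain, iova, page_to_phys(sg_page(sg)), len,
			  IOMMU_READ | IOMMU_WRITE);
		sg->dma_address = iova + sg->offset;
		sg_dma_len(sg) = sg->length;
		iova += len;
	}

A direct mapping, by contrast, necessarily produces one DMA segment per
physically contiguous chunk.)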

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23 17:22             ` Christoph Hellwig
  (?)
@ 2020-03-24  3:05               ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-24  3:05 UTC (permalink / raw)
  To: Christoph Hellwig, Aneesh Kumar K.V
  Cc: iommu, linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel



On 24/03/2020 04:22, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>>
>> This is what I was trying, but considering I am new to DMA subsystem, I
>> am not sure I got all the details correct. The idea is to look at the
>> cpu addr and see if that can be used in direct map fashion(is
>> bus_dma_limit the right restriction here?) if not fallback to dynamic
>> IOMMU mapping.
> 
> I don't think we can throw all these complications into the dma
> mapping code.  At some point I also wonder what the point is,
> especially for scatterlist mappings, where the iommu can coalesce.

This is for persistent memory which you can DMA to/from, yet it does
not appear in the system as normal memory and therefore requires
special handling anyway (O_DIRECT or DAX, I do not know the exact
mechanics). All other devices in the system should just run as usual,
i.e. use a 1:1 mapping if possible.


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-23 17:20           ` Christoph Hellwig
  (?)
@ 2020-03-24  3:37             ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-24  3:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel, Aneesh Kumar K.V



On 24/03/2020 04:20, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
>>>> 0x100.0000.0000 .. 0x101.0000.0000
>>>>
>>>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>>>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>>>> that "pseries" is done with offset=0x0800.0000.0000.0000.
>>>>
>>>> So we want to check every bus address against dev->bus_dma_limit, not
>>>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>>>> 0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
>>>> tried. Does this sound reasonable? Thanks,
>>>
>>> bus_dma_limit is just another limiting factor applied on top of
>>> coherent_dma_mask or dma_mask respectively.
>>
>> This is not enough for the task: in my example, I'd set bus limit to
>> 0x0800.0001.0000.0000 but this would disable bypass for all RAM
>> addresses - the first and the second 4GB blocks.
> 
> So what about something like the version here:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3


dma_alloc_direct() and dma_map_direct() do the same thing now, which is
good; did I miss anything else?

This lets us disable bypass automatically if this weird memory appears
in the system, but it does not let us have 1:1 after that, even for
normal RAM. Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  3:37             ` Alexey Kardashevskiy
  (?)
@ 2020-03-24  4:55               ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-24  4:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel, Aneesh Kumar K.V



On 24/03/2020 14:37, Alexey Kardashevskiy wrote:
> 
> 
> On 24/03/2020 04:20, Christoph Hellwig wrote:
>> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
>>>>> 0x100.0000.0000 .. 0x101.0000.0000
>>>>>
>>>>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>>>>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>>>>> that "pseries" is done with offset=0x0800.0000.0000.0000.
>>>>>
>>>>> So we want to check every bus address against dev->bus_dma_limit, not
>>>>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>>>>> 0x0800.0001.0000.0000 and 1:1 mapping for the second 4GB would not be
>>>>> tried. Does this sound reasonable? Thanks,
>>>>
>>>> bus_dma_limit is just another limiting factor applied on top of
>>>> coherent_dma_mask or dma_mask respectively.
>>>
>>> This is not enough for the task: in my example, I'd set bus limit to
>>> 0x0800.0001.0000.0000 but this would disable bypass for all RAM
>>> addresses - the first and the second 4GB blocks.
>>
>> So what about something like the version here:
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3
> 
> 
> dma_alloc_direct() and dma_map_direct() do the same thing now which is
> good, did I miss anything else?
> 
> This lets us disable bypass automatically if this weird memory appears
> in the system but does not let us have 1:1 after that even for normal
> RAM. Thanks,

Ah no, it does not help much; simply setting dma_ops_bypass will, though.


But eventually, in this function:

static inline bool dma_map_direct(struct device *dev,
               const struct dma_map_ops *ops)
{
       if (likely(!ops))
               return true;
       if (!dev->dma_ops_bypass)
               return false;

       return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
                           dma_direct_get_required_mask(dev);
}


we would rather want it to take a dma handle and a size, and add

if (dev->bus_dma_limit)
	return dev->bus_dma_limit > dma_handle + size;


where dma_handle=phys_to_dma(dev, phys) (I am not doing it here as unmap
needs the same test and it does not receive phys as a parameter).
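
A minimal sketch of what that combined check could look like (assuming
the bus address and size get plumbed through from the map/unmap
callers; the inclusive end calculation is my own choice here, not taken
from the branch):

static inline bool dma_map_direct(struct device *dev,
               const struct dma_map_ops *ops,
               dma_addr_t dma_handle, size_t size)
{
       if (likely(!ops))
               return true;
       if (!dev->dma_ops_bypass)
               return false;

       /* per-mapping check against the end of the window, as above */
       if (dev->bus_dma_limit)
               return dma_handle + size - 1 <= dev->bus_dma_limit;

       return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
                           dma_direct_get_required_mask(dev);
}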




-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  3:05               ` Alexey Kardashevskiy
  (?)
@ 2020-03-24  6:30                 ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 94+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-24  6:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Christoph Hellwig
  Cc: iommu, linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Joerg Roedel,
	Robin Murphy, linux-kernel

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 24/03/2020 04:22, Christoph Hellwig wrote:
>> On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>>>
>>> This is what I was trying, but considering I am new to DMA subsystem, I
>>> am not sure I got all the details correct. The idea is to look at the
>>> cpu addr and see if that can be used in direct map fashion(is
>>> bus_dma_limit the right restriction here?) if not fallback to dynamic
>>> IOMMU mapping.
>> 
>> I don't think we can throw all these complications into the dma
>> mapping code.  At some point I also wonder what the point is,
>> especially for scatterlist mappings, where the iommu can coalesce.
>
> This is for persistent memory which you can DMA to/from but yet it does
> not appear in the system as a normal memory and therefore requires
> special handling anyway (O_DIRECT or DAX, I do not know the exact
> mechanics). All other devices in the system should just run as usual,
> i.e. use 1:1 mapping if possible.

This is O_DIRECT with a user buffer that is actually mmap'ed from a
DAX-mounted file system.

What we really need is something that will fall back to iommu_map_page
based on dma_addr, i.e. something equivalent to the current
dma_direct_map_page(), but instead of falling back to swiotlb_map_page
we should fall back to iommu_map_page().

Something like?

dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	/*
	 * Not addressable 1:1: fall back to a dynamic IOMMU mapping
	 * instead of swiotlb (or fail with DMA_MAPPING_ERROR).
	 */
	if (unlikely(!dma_capable(dev, dma_addr, size, true)))
		return iommu_map(dev, phys, size, dir, attrs);

....
...


-aneesh

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  3:37             ` Alexey Kardashevskiy
  (?)
@ 2020-03-24  7:52               ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-24  7:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Aneesh Kumar K.V

On Tue, Mar 24, 2020 at 02:37:59PM +1100, Alexey Kardashevskiy wrote:
> dma_alloc_direct() and dma_map_direct() do the same thing now which is
> good, did I miss anything else?

dma_alloc_direct looks at coherent_dma_mask, dma_map_direct looks
at dma_mask.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  3:05               ` Alexey Kardashevskiy
  (?)
@ 2020-03-24  7:54                 ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-24  7:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, Aneesh Kumar K.V, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel

On Tue, Mar 24, 2020 at 02:05:54PM +1100, Alexey Kardashevskiy wrote:
> This is for persistent memory which you can DMA to/from but yet it does
> not appear in the system as a normal memory and therefore requires
> special handling anyway (O_DIRECT or DAX, I do not know the exact
> mechanics). All other devices in the system should just run as usual,
> i.e. use 1:1 mapping if possible.

On other systems (x86 and arm) pmem as long as it is page backed does
not require any special handling.  This must be some weird way powerpc
fucked up again, and I suspect you'll have to suffer from it.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  6:30                 ` Aneesh Kumar K.V
  (?)
@ 2020-03-24  7:55                   ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-24  7:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Alexey Kardashevskiy, Christoph Hellwig, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel

On Tue, Mar 24, 2020 at 12:00:09PM +0530, Aneesh Kumar K.V wrote:
> dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
> 		unsigned long offset, size_t size, enum dma_data_direction dir,
> 		unsigned long attrs)
> {
> 	phys_addr_t phys = page_to_phys(page) + offset;
> 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
> 
> 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> 			return iommu_map(dev, phys, size, dir, attrs);
> 
> 		return DMA_MAPPING_ERROR;

If powerpc hardware / firmware people really come up with crap that
stupid, you'll have to handle it yourself and will always pay the
indirect call penalty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-24  7:54                 ` Christoph Hellwig
  (?)
@ 2020-03-25  4:51                   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-25  4:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel



On 24/03/2020 18:54, Christoph Hellwig wrote:
> On Tue, Mar 24, 2020 at 02:05:54PM +1100, Alexey Kardashevskiy wrote:
>> This is for persistent memory which you can DMA to/from but yet it does
>> not appear in the system as a normal memory and therefore requires
>> special handling anyway (O_DIRECT or DAX, I do not know the exact
>> mechanics). All other devices in the system should just run as usual,
>> i.e. use 1:1 mapping if possible.
> 
> On other systems (x86 and arm) pmem as long as it is page backed does
> not require any special handling.  This must be some weird way powerpc
> fucked up again, and I suspect you'll have to suffer from it.


It does not matter whether it is backed by pages or not; the problem
can also appear if we want, for example, p2p PCI via the IOMMU (between
PHBs), where MMIO may be mapped so high in the system address space
that a 1:1 mapping is impossible.


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-25  4:51                   ` Alexey Kardashevskiy
  (?)
@ 2020-03-25  8:37                     ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-03-25  8:37 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, Aneesh Kumar K.V, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel

On Wed, Mar 25, 2020 at 03:51:36PM +1100, Alexey Kardashevskiy wrote:
> >> This is for persistent memory which you can DMA to/from but yet it does
> >> not appear in the system as a normal memory and therefore requires
> >> special handling anyway (O_DIRECT or DAX, I do not know the exact
> >> mechanics). All other devices in the system should just run as usual,
> >> i.e. use 1:1 mapping if possible.
> > 
> > On other systems (x86 and arm) pmem as long as it is page backed does
> > not require any special handling.  This must be some weird way powerpc
> > fucked up again, and I suspect you'll have to suffer from it.
> 
> 
> It does not matter if it is backed by pages or not, the problem may also
> appear if we wanted for example p2p PCI via IOMMU (between PHBs) and
> MMIO might be mapped way too high in the system address space and make
> 1:1 impossible.

How can it be mapped too high for a direct mapping with a 64-bit DMA
mask?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-25  8:37                     ` Christoph Hellwig
  (?)
@ 2020-03-26  1:26                       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-03-26  1:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel



On 25/03/2020 19:37, Christoph Hellwig wrote:
> On Wed, Mar 25, 2020 at 03:51:36PM +1100, Alexey Kardashevskiy wrote:
>>>> This is for persistent memory which you can DMA to/from but yet it does
>>>> not appear in the system as a normal memory and therefore requires
>>>> special handling anyway (O_DIRECT or DAX, I do not know the exact
>>>> mechanics). All other devices in the system should just run as usual,
>>>> i.e. use 1:1 mapping if possible.
>>>
>>> On other systems (x86 and arm) pmem as long as it is page backed does
>>> not require any special handling.  This must be some weird way powerpc
>>> fucked up again, and I suspect you'll have to suffer from it.
>>
>>
>> It does not matter if it is backed by pages or not, the problem may also
>> appear if we wanted for example p2p PCI via IOMMU (between PHBs) and
>> MMIO might be mapped way too high in the system address space and make
>> 1:1 impossible.
> 
> How can it be mapped too high for a direct mapping with a 64-bit DMA
> mask?

The window size is limited and often it is not even sparse. It requires
an 8 byte entry per IOMMU page (which is most commonly 64K at maximum),
so a 1TB limit (the guest RAM size) is a very real thing. MMIO is
mapped into the guest physical address space outside of this 1TB (on PPC).
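
(For scale, assuming a flat, non-sparse table with 8-byte entries and
64K IOMMU pages: a 1TB window needs 1TB / 64K = 16M entries, i.e.
16M * 8 bytes = 128MB of table memory, which is why the window cannot
simply be grown to also cover MMIO placed far above the 1TB mark.)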


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-03-26  1:26                       ` Alexey Kardashevskiy
  (?)
@ 2020-04-03  8:38                         ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-03  8:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Michael Ellerman



On 26/03/2020 12:26, Alexey Kardashevskiy wrote:
> 
> 
> On 25/03/2020 19:37, Christoph Hellwig wrote:
>> On Wed, Mar 25, 2020 at 03:51:36PM +1100, Alexey Kardashevskiy wrote:
>>>>> This is for persistent memory which you can DMA to/from but yet it does
>>>>> not appear in the system as a normal memory and therefore requires
>>>>> special handling anyway (O_DIRECT or DAX, I do not know the exact
>>>>> mechanics). All other devices in the system should just run as usual,
>>>>> i.e. use 1:1 mapping if possible.
>>>>
>>>> On other systems (x86 and arm) pmem as long as it is page backed does
>>>> not require any special handling.  This must be some weird way powerpc
>>>> fucked up again, and I suspect you'll have to suffer from it.
>>>
>>>
>>> It does not matter if it is backed by pages or not, the problem may also
>>> appear if we wanted for example p2p PCI via IOMMU (between PHBs) and
>>> MMIO might be mapped way too high in the system address space and make
>>> 1:1 impossible.
>>
>> How can it be mapped too high for a direct mapping with a 64-bit DMA
>> mask?
> 
> The window size is limited and often it is not even sparse. It requires
> an 8 byte entry per IOMMU page (which is most commonly 64k at most), so
> a 1TB limit (the guest RAM size) is quite a real thing. MMIO is mapped
> to guest physical address space outside of this 1TB (on PPC).
> 
> 

I am now trying this approach on top of your "dma-bypass.3" (it is
"wip" and needs an upper boundary check):

https://github.com/aik/linux/commit/49d73c7771e3f6054804f6cfa80b4e320111662d

Do you see any serious problem with this approach? Thanks!



-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-03  8:38                         ` Alexey Kardashevskiy
  (?)
@ 2020-04-06 11:50                           ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-04-06 11:50 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, Aneesh Kumar K.V, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel, Michael Ellerman

On Fri, Apr 03, 2020 at 07:38:11PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 26/03/2020 12:26, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 25/03/2020 19:37, Christoph Hellwig wrote:
> >> On Wed, Mar 25, 2020 at 03:51:36PM +1100, Alexey Kardashevskiy wrote:
> >>>>> This is for persistent memory which you can DMA to/from but yet it does
> >>>>> not appear in the system as a normal memory and therefore requires
> >>>>> special handling anyway (O_DIRECT or DAX, I do not know the exact
> >>>>> mechanics). All other devices in the system should just run as usual,
> >>>>> i.e. use 1:1 mapping if possible.
> >>>>
> >>>> On other systems (x86 and arm) pmem as long as it is page backed does
> >>>> not require any special handling.  This must be some weird way powerpc
> >>>> fucked up again, and I suspect you'll have to suffer from it.
> >>>
> >>>
> >>> It does not matter if it is backed by pages or not, the problem may also
> >>> appear if we wanted for example p2p PCI via IOMMU (between PHBs) and
> >>> MMIO might be mapped way too high in the system address space and make
> >>> 1:1 impossible.
> >>
> >> How can it be mapped too high for a direct mapping with a 64-bit DMA
> >> mask?
> > 
> > The window size is limited and often it is not even sparse. It requires
> > an 8 byte entry per an IOMMU page (which is most commonly is 64k max) so
> > 1TB limit (a guest RAM size) is a quite real thing. MMIO is mapped to
> > guest physical address space outside of this 1TB (on PPC).
> > 
> > 
> 
> I am trying now this approach on top of yours "dma-bypass.3" (it is
> "wip", needs an upper boundary check):
> 
> https://github.com/aik/linux/commit/49d73c7771e3f6054804f6cfa80b4e320111662d
> 
> Do you see any serious problem with this approach? Thanks!

Do you have a link to the whole branch?  The github UI is unfortunately
unusable for that (or I'm missing something).

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-06 11:50                           ` Christoph Hellwig
  (?)
@ 2020-04-06 13:25                             ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-06 13:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Michael Ellerman



On 06/04/2020 21:50, Christoph Hellwig wrote:
> On Fri, Apr 03, 2020 at 07:38:11PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 26/03/2020 12:26, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 25/03/2020 19:37, Christoph Hellwig wrote:
>>>> On Wed, Mar 25, 2020 at 03:51:36PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> This is for persistent memory which you can DMA to/from but yet it does
>>>>>>> not appear in the system as a normal memory and therefore requires
>>>>>>> special handling anyway (O_DIRECT or DAX, I do not know the exact
>>>>>>> mechanics). All other devices in the system should just run as usual,
>>>>>>> i.e. use 1:1 mapping if possible.
>>>>>>
>>>>>> On other systems (x86 and arm) pmem as long as it is page backed does
>>>>>> not require any special handling.  This must be some weird way powerpc
>>>>>> fucked up again, and I suspect you'll have to suffer from it.
>>>>>
>>>>>
>>>>> It does not matter if it is backed by pages or not, the problem may also
>>>>> appear if we wanted for example p2p PCI via IOMMU (between PHBs) and
>>>>> MMIO might be mapped way too high in the system address space and make
>>>>> 1:1 impossible.
>>>>
>>>> How can it be mapped too high for a direct mapping with a 64-bit DMA
>>>> mask?
>>>
>>> The window size is limited and often it is not even sparse. It requires
>>> an 8 byte entry per an IOMMU page (which is most commonly is 64k max) so
>>> 1TB limit (a guest RAM size) is a quite real thing. MMIO is mapped to
>>> guest physical address space outside of this 1TB (on PPC).
>>>
>>>
>>
>> I am trying now this approach on top of yours "dma-bypass.3" (it is
>> "wip", needs an upper boundary check):
>>
>> https://github.com/aik/linux/commit/49d73c7771e3f6054804f6cfa80b4e320111662d
>>
>> Do you see any serious problem with this approach? Thanks!
> 
> Do you have a link to the whole branch?  The github UI is unfortunately
> unusable for that (or I'm missing something).

The UI showed the branch, but since I rebased and force-pushed it, it
no longer does. Here is the current one:

https://github.com/aik/linux/commits/dma-bypass.3


Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-06 13:25                             ` Alexey Kardashevskiy
  (?)
@ 2020-04-06 17:17                               ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-04-06 17:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, Aneesh Kumar K.V, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel, Michael Ellerman

On Mon, Apr 06, 2020 at 11:25:09PM +1000, Alexey Kardashevskiy wrote:
> >> Do you see any serious problem with this approach? Thanks!
> > 
> > Do you have a link to the whole branch?  The github UI is unfortunately
> > unusable for that (or I'm missing something).
> 
> The UI shows the branch but since I rebased and forcepushed it, it does
> not. Here is the current one with:
> 
> https://github.com/aik/linux/commits/dma-bypass.3

Ok, so we use the core bypass without persistent memory, and then
have another bypass mode on top.  Not great, but I can't think
of anything better.  Note that your checks for the map_sg case
aren't very efficient - for one it would make sense to calculate
the limit only once, but also it would make sense to reuse the
calculated direct mapping addresses instead of doing another pass
later on in the dma-direct code.
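
A minimal sketch of the "calculate the limit once" idea (the function
names and the boundary check below are made up for illustration and do
not come from the dma-bypass.3 branch):

	/*
	 * Illustrative only: decide bypass once per map_sg call instead of
	 * re-deriving the boundary for every scatterlist element.
	 */
	static int example_iommu_map_sg(struct device *dev, struct scatterlist *sgl,
			int nents, enum dma_data_direction dir, unsigned long attrs)
	{
		u64 limit = min_not_zero(dma_get_mask(dev), dev->bus_dma_limit);
		struct scatterlist *sg;
		int i;

		for_each_sg(sgl, sg, nents, i) {
			/* Any element above the boundary forces the IOMMU path. */
			if (sg_phys(sg) + sg->length - 1 > limit)
				return example_iommu_map_sg_via_window(dev, sgl,
						nents, dir, attrs);
		}

		/* Everything fits below the boundary, so take the direct path. */
		return dma_direct_map_sg(dev, sgl, nents, dir, attrs);
	}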

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-06 17:17                               ` Christoph Hellwig
  (?)
@ 2020-04-07 10:12                                 ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-07 10:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Michael Ellerman



On 07/04/2020 03:17, Christoph Hellwig wrote:
> On Mon, Apr 06, 2020 at 11:25:09PM +1000, Alexey Kardashevskiy wrote:
>>>> Do you see any serious problem with this approach? Thanks!
>>>
>>> Do you have a link to the whole branch?  The github UI is unfortunately
>>> unusable for that (or I'm missing something).
>>
>> The UI shows the branch but since I rebased and forcepushed it, it does
>> not. Here is the current one with:
>>
>> https://github.com/aik/linux/commits/dma-bypass.3
> 
> Ok, so we use the core bypass without persistent memory, and then
> have another bypass mode on top.  Not great, but I can't think
> of anything better.  Note that your checks for the map_sg case
> aren't very efficient - for one it would make sense to calculate
> the limit only once, 

Good points, I'll post a revised version when you post your v3 of this.

> but also it would make sense to reuse the
> calculated direct mapping addresses instead of doing another pass
> later on in the dma-direct code.

Probably, but I wonder what kind of hardware we would need to see the
difference. I might try; I just need to ride to the office and plug the
cable into my 100GBit eth machines :) Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-07 10:12                                 ` Alexey Kardashevskiy
  (?)
@ 2020-04-14  6:21                                   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 94+ messages in thread
From: Alexey Kardashevskiy @ 2020-04-14  6:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Aneesh Kumar K.V, iommu, linuxppc-dev, Lu Baolu,
	Greg Kroah-Hartman, Joerg Roedel, Robin Murphy, linux-kernel,
	Michael Ellerman



On 07/04/2020 20:12, Alexey Kardashevskiy wrote:
> 
> 
> On 07/04/2020 03:17, Christoph Hellwig wrote:
>> On Mon, Apr 06, 2020 at 11:25:09PM +1000, Alexey Kardashevskiy wrote:
>>>>> Do you see any serious problem with this approach? Thanks!
>>>>
>>>> Do you have a link to the whole branch?  The github UI is unfortunately
>>>> unusable for that (or I'm missing something).
>>>
>>> The UI shows the branch but since I rebased and forcepushed it, it does
>>> not. Here is the current one with:
>>>
>>> https://github.com/aik/linux/commits/dma-bypass.3
>>
>> Ok, so we use the core bypass without persistent memory, and then
>> have another bypass mode on top.  Not great, but I can't think
>> of anything better.  Note that your checks for the map_sg case
>> aren't very efficient - for one it would make sense to calculate
>> the limit only once, 
> 
> Good points, I'll post revised version when you post your v3 of this.



Any plans on posting v3 of this? Thanks,


> 
>> but also it would make sense to reuse the
>> calculated direct mapping addresses instead of doing another pass
>> later on in the dma-direct code.
> 
> Probably but I wonder what kind of hardware we need to see the
> difference. I might try, just need to ride to the office to plug the
> cable in my 100GBit eth machines :) Thanks,
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2020-04-14  6:21                                   ` Alexey Kardashevskiy
  (?)
@ 2020-04-14  6:30                                     ` Christoph Hellwig
  -1 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:30 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Christoph Hellwig, Aneesh Kumar K.V, iommu, linuxppc-dev,
	Lu Baolu, Greg Kroah-Hartman, Joerg Roedel, Robin Murphy,
	linux-kernel, Michael Ellerman

On Tue, Apr 14, 2020 at 04:21:27PM +1000, Alexey Kardashevskiy wrote:
> > Good points, I'll post revised version when you post your v3 of this.
> 
> 
> 
> Any plans on posting v3 of this? Thanks,

Just back from a long weekend.  I'll take a stab at it soon.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to, struct device
@ 2020-03-24  9:39 Christian Zigotzky
  0 siblings, 0 replies; 94+ messages in thread
From: Christian Zigotzky @ 2020-03-24  9:39 UTC (permalink / raw)
  To: linuxppc-dev

Hi All,

The DMA mapping works great on our PowerPC machines at the moment. It
was a long road to get the new DMA mapping code to work successfully on
our PowerPC machines.

P L E A S E  don't modify the DMA mapping code that is working well.
There are many other topics which need improvement. For us (first level
+ second level support) it is really laborious to find your problematic
code and patch it. It takes a long time to find the problematic code
because we have to do it alongside our main work.

P L E A S E test your code on PowerPC machines before you add it to the 
mainline vanilla kernel.

Thanks,
Christian


On Tue, Mar 24, 2020 at 12:00:09PM +0530, Aneesh Kumar K.V wrote:
 > dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 >         unsigned long offset, size_t size, enum dma_data_direction dir,
 >         unsigned long attrs)
 > {
 >     phys_addr_t phys = page_to_phys(page) + offset;
 >     dma_addr_t dma_addr = phys_to_dma(dev, phys);
 >
 >     if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
 >             return iommu_map(dev, phys, size, dir, attrs);
 >
 >         return DMA_MAPPING_ERROR;

If powerpc hardware / firmware people really come up with crap that
stupid, you'll have to handle it yourself and will always pay the
indirect call penalty.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
  2019-11-13 13:37 generic DMA bypass flag Christoph Hellwig
  2019-11-13 13:37   ` Christoph Hellwig
@ 2019-11-13 13:37   ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2019-11-13 13:37 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: linuxppc-dev, Lu Baolu, Greg Kroah-Hartman, Robin Murphy, linux-kernel

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
for this to the core dma-mapping code, so those drivers can be switched
to a common solution.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  4 ++++
 include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
 kernel/dma/mapping.c        | 35 ++++++++++++++++++++++++++---------
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 297239a08bb7..b8a3b4ec46bd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1217,6 +1217,9 @@ struct dev_links_info {
  *              device.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*).
+ *		This flag is managed by the dma_ops from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -1316,6 +1319,7 @@ struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			dma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..22fe74179e02 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -191,9 +191,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+		const struct dma_map_ops *ops)
 {
-	return likely(!ops);
+	return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -282,7 +288,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -297,7 +303,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -316,7 +322,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -334,7 +340,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -355,7 +361,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
 	else if (ops->map_resource)
 		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -371,7 +377,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (!dma_is_direct(ops) && ops->unmap_resource)
+	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
 		ops->unmap_resource(dev, addr, size, dir, attrs);
 	debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -383,7 +389,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 	else if (ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
@@ -397,7 +403,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_device(dev, addr, size, dir);
 	else if (ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
@@ -411,7 +417,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
@@ -425,7 +431,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 12ff766ec1fa..fdb6e16c1b00 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,6 +105,19 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
+/*
+ * Check if the device uses a direct mapping for DMA allocations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+		const struct dma_map_ops *ops)
+{
+	return likely(!ops) ||
+		(dev->dma_ops_bypass &&
+		 dma_direct_supported(dev, dev->coherent_dma_mask));
+}
+
 /*
  * Create scatter-list for the already allocated DMA buffer.
  */
@@ -138,7 +151,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
 				size, attrs);
 	if (!ops->get_sgtable)
@@ -206,7 +219,7 @@ bool dma_can_mmap(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_can_mmap(dev);
 	return ops->mmap != NULL;
 }
@@ -231,7 +244,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
 				attrs);
 	if (!ops->mmap)
@@ -244,7 +257,7 @@ u64 dma_get_required_mask(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		return dma_direct_get_required_mask(dev);
 	if (ops->get_required_mask)
 		return ops->get_required_mask(dev);
@@ -275,7 +288,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	else if (ops->alloc)
 		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
@@ -307,7 +320,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 		return;
 
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
 	else if (ops->free)
 		ops->free(dev, size, cpu_addr, dma_handle, attrs);
@@ -318,7 +331,11 @@ int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	/*
+	 * Only call the dma-direct version if we really do not have any ops
+	 * set, as the dma_supported op will set the dma_ops_bypass flag.
+	 */
+	if (!ops)
 		return dma_direct_supported(dev, mask);
 	if (!ops->dma_supported)
 		return 1;
@@ -374,7 +391,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		arch_dma_cache_sync(dev, vaddr, size, dir);
 	else if (ops->cache_sync)
 		ops->cache_sync(dev, vaddr, size, dir);
@@ -386,7 +403,7 @@ size_t dma_max_mapping_size(struct device *dev)
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	size_t size = SIZE_MAX;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		size = dma_direct_max_mapping_size(dev);
 	else if (ops && ops->max_mapping_size)
 		size = ops->max_mapping_size(dev);
-- 
2.20.1
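
For illustration only (not part of the patch): a sketch of how an IOMMU
driver's ->dma_supported could manage the new flag. The "foo_" names
are made up; the real per-driver policy (window geometry, offsets, and
so on) would live where the 1:1 check is.

	static int foo_iommu_dma_supported(struct device *dev, u64 mask)
	{
		/*
		 * Let the core bypass the IOMMU ops for streaming mappings
		 * whenever this device can reach all memory 1:1 with the
		 * given mask; otherwise keep going through the IOMMU ops.
		 * foo_iommu_can_map_direct() is a hypothetical helper that
		 * returns true/false for that check.
		 */
		dev->dma_ops_bypass = foo_iommu_can_map_direct(dev, mask);
		return 1;
	}

This would be installed as the .dma_supported member of the driver's
struct dma_map_ops, matching the kernel-doc comment in the patch that
says the flag is managed by the dma_ops from ->dma_supported.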


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
@ 2019-11-13 13:37   ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2019-11-13 13:37 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, linuxppc-dev, Robin Murphy, linux-kernel, Lu Baolu

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
for this to the core dma-mapping code, so those drivers can be switched
to a common solution.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  4 ++++
 include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
 kernel/dma/mapping.c        | 35 ++++++++++++++++++++++++++---------
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 297239a08bb7..b8a3b4ec46bd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1217,6 +1217,9 @@ struct dev_links_info {
  *              device.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*).
+ *		This flag is managed by the dma_ops from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -1316,6 +1319,7 @@ struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			dma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..22fe74179e02 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -191,9 +191,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+		const struct dma_map_ops *ops)
 {
-	return likely(!ops);
+	return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -282,7 +288,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -297,7 +303,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -316,7 +322,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -334,7 +340,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -355,7 +361,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
 	else if (ops->map_resource)
 		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -371,7 +377,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (!dma_is_direct(ops) && ops->unmap_resource)
+	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
 		ops->unmap_resource(dev, addr, size, dir, attrs);
 	debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -383,7 +389,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 	else if (ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
@@ -397,7 +403,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_device(dev, addr, size, dir);
 	else if (ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
@@ -411,7 +417,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
@@ -425,7 +431,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 12ff766ec1fa..fdb6e16c1b00 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,6 +105,19 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
+/*
+ * Check if the device uses a direct mapping for DMA allocations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+		const struct dma_map_ops *ops)
+{
+	return likely(!ops) ||
+		(dev->dma_ops_bypass &&
+		 dma_direct_supported(dev, dev->coherent_dma_mask));
+}
+
 /*
  * Create scatter-list for the already allocated DMA buffer.
  */
@@ -138,7 +151,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
 				size, attrs);
 	if (!ops->get_sgtable)
@@ -206,7 +219,7 @@ bool dma_can_mmap(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_can_mmap(dev);
 	return ops->mmap != NULL;
 }
@@ -231,7 +244,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
 				attrs);
 	if (!ops->mmap)
@@ -244,7 +257,7 @@ u64 dma_get_required_mask(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		return dma_direct_get_required_mask(dev);
 	if (ops->get_required_mask)
 		return ops->get_required_mask(dev);
@@ -275,7 +288,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	else if (ops->alloc)
 		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
@@ -307,7 +320,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 		return;
 
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
 	else if (ops->free)
 		ops->free(dev, size, cpu_addr, dma_handle, attrs);
@@ -318,7 +331,11 @@ int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	/*
+	 * Only call the dma-direct version if we really do not have any ops
+	 * set, as the dma_supported op will set the dma_ops_bypass flag.
+	 */
+	if (!ops)
 		return dma_direct_supported(dev, mask);
 	if (!ops->dma_supported)
 		return 1;
@@ -374,7 +391,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		arch_dma_cache_sync(dev, vaddr, size, dir);
 	else if (ops->cache_sync)
 		ops->cache_sync(dev, vaddr, size, dir);
@@ -386,7 +403,7 @@ size_t dma_max_mapping_size(struct device *dev)
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	size_t size = SIZE_MAX;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		size = dma_direct_max_mapping_size(dev);
 	else if (ops && ops->max_mapping_size)
 		size = ops->max_mapping_size(dev);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device
@ 2019-11-13 13:37   ` Christoph Hellwig
  0 siblings, 0 replies; 94+ messages in thread
From: Christoph Hellwig @ 2019-11-13 13:37 UTC (permalink / raw)
  To: iommu, Alexey Kardashevskiy
  Cc: Greg Kroah-Hartman, linuxppc-dev, Robin Murphy, linux-kernel

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
for this to the core dma-mapping code so that those drivers can be
switched over to a common solution.
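
To make the intended use concrete, here is a minimal sketch (not part
of this patch) of how an IOMMU driver's ->dma_supported method could
manage the new flag, roughly in the spirit of the powerpc conversion in
patch 2/2.  The foo_iommu_* names and the foo_iommu_can_bypass() helper
are placeholders invented for this example:

	/* illustrative only -- all foo_iommu_* identifiers are made up */
	static bool foo_iommu_can_bypass(struct device *dev, u64 mask)
	{
		/* e.g. the mask covers every address the device may see */
		return mask >= DMA_BIT_MASK(64);
	}

	static int foo_iommu_dma_supported(struct device *dev, u64 mask)
	{
		/*
		 * Take the direct path for streaming mappings when the mask
		 * is large enough, fall back to IOMMU translation otherwise.
		 */
		dev->dma_ops_bypass = foo_iommu_can_bypass(dev, mask);

		/* a real driver would also check its translation window here */
		return 1;
	}

	static const struct dma_map_ops foo_iommu_dma_ops = {
		.dma_supported	= foo_iommu_dma_supported,
		/* .map_page / .map_sg / .sync_* as before */
	};

dma_set_mask() ends up in dma_supported() below, which calls into the
driver, so the flag is re-evaluated whenever the mask changes; once it
is set, dma_map_direct() routes the streaming operations straight to
the dma-direct helpers.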

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  4 ++++
 include/linux/dma-mapping.h | 30 ++++++++++++++++++------------
 kernel/dma/mapping.c        | 35 ++++++++++++++++++++++++++---------
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 297239a08bb7..b8a3b4ec46bd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1217,6 +1217,9 @@ struct dev_links_info {
  *              device.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ *		streaming DMA operations (->map_* / ->unmap_* / ->sync_*).
+ *		This flag is managed by the dma_ops from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -1316,6 +1319,7 @@ struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			dma_ops_bypass : 1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4d450672b7d6..22fe74179e02 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -191,9 +191,15 @@ static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+		const struct dma_map_ops *ops)
 {
-	return likely(!ops);
+	return likely(!ops) || dev->dma_ops_bypass;
 }
 
 /*
@@ -282,7 +288,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -297,7 +303,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -316,7 +322,7 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -334,7 +340,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -355,7 +361,7 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
 	else if (ops->map_resource)
 		addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
@@ -371,7 +377,7 @@ static inline void dma_unmap_resource(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (!dma_is_direct(ops) && ops->unmap_resource)
+	if (!dma_map_direct(dev, ops) && ops->unmap_resource)
 		ops->unmap_resource(dev, addr, size, dir, attrs);
 	debug_dma_unmap_resource(dev, addr, size, dir);
 }
@@ -383,7 +389,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 	else if (ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
@@ -397,7 +403,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_device(dev, addr, size, dir);
 	else if (ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
@@ -411,7 +417,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
@@ -425,7 +431,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 12ff766ec1fa..fdb6e16c1b00 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,6 +105,19 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
+/*
+ * Check if the device uses a direct mapping for DMA allocations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+		const struct dma_map_ops *ops)
+{
+	return likely(!ops) ||
+		(dev->dma_ops_bypass &&
+		 dma_direct_supported(dev, dev->coherent_dma_mask));
+}
+
 /*
  * Create scatter-list for the already allocated DMA buffer.
  */
@@ -138,7 +151,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr,
 				size, attrs);
 	if (!ops->get_sgtable)
@@ -206,7 +219,7 @@ bool dma_can_mmap(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_can_mmap(dev);
 	return ops->mmap != NULL;
 }
@@ -231,7 +244,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size,
 				attrs);
 	if (!ops->mmap)
@@ -244,7 +257,7 @@ u64 dma_get_required_mask(struct device *dev)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		return dma_direct_get_required_mask(dev);
 	if (ops->get_required_mask)
 		return ops->get_required_mask(dev);
@@ -275,7 +288,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	else if (ops->alloc)
 		cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
@@ -307,7 +320,7 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 		return;
 
 	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		dma_direct_free(dev, size, cpu_addr, dma_handle, attrs);
 	else if (ops->free)
 		ops->free(dev, size, cpu_addr, dma_handle, attrs);
@@ -318,7 +331,11 @@ int dma_supported(struct device *dev, u64 mask)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
-	if (dma_is_direct(ops))
+	/*
+	 * Only call the dma-direct version if we really do not have any ops
+	 * set, as the dma_supported op will set the dma_ops_bypass flag.
+	 */
+	if (!ops)
 		return dma_direct_supported(dev, mask);
 	if (!ops->dma_supported)
 		return 1;
@@ -374,7 +391,7 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 
 	BUG_ON(!valid_dma_direction(dir));
 
-	if (dma_is_direct(ops))
+	if (dma_alloc_direct(dev, ops))
 		arch_dma_cache_sync(dev, vaddr, size, dir);
 	else if (ops->cache_sync)
 		ops->cache_sync(dev, vaddr, size, dir);
@@ -386,7 +403,7 @@ size_t dma_max_mapping_size(struct device *dev)
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	size_t size = SIZE_MAX;
 
-	if (dma_is_direct(ops))
+	if (dma_map_direct(dev, ops))
 		size = dma_direct_max_mapping_size(dev);
 	else if (ops && ops->max_mapping_size)
 		size = ops->max_mapping_size(dev);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2020-04-14  6:32 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-20 14:16 generic DMA bypass flag v2 Christoph Hellwig
2020-03-20 14:16 ` Christoph Hellwig
2020-03-20 14:16 ` Christoph Hellwig
2020-03-20 14:16 ` [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device Christoph Hellwig
2020-03-20 14:16   ` Christoph Hellwig
2020-03-20 14:16   ` Christoph Hellwig
2020-03-20 15:02   ` Greg Kroah-Hartman
2020-03-20 15:02     ` Greg Kroah-Hartman
2020-03-20 15:02     ` Greg Kroah-Hartman
2020-03-23  1:28   ` Alexey Kardashevskiy
2020-03-23  1:28     ` Alexey Kardashevskiy
2020-03-23  1:28     ` Alexey Kardashevskiy
2020-03-23  8:37     ` Christoph Hellwig
2020-03-23  8:37       ` Christoph Hellwig
2020-03-23  8:37       ` Christoph Hellwig
2020-03-23  8:50       ` Christoph Hellwig
2020-03-23  8:50         ` Christoph Hellwig
2020-03-23  8:50         ` Christoph Hellwig
2020-03-23 15:37         ` Aneesh Kumar K.V
2020-03-23 15:37           ` Aneesh Kumar K.V
2020-03-23 15:37           ` Aneesh Kumar K.V
2020-03-23 17:22           ` Christoph Hellwig
2020-03-23 17:22             ` Christoph Hellwig
2020-03-23 17:22             ` Christoph Hellwig
2020-03-24  3:05             ` Alexey Kardashevskiy
2020-03-24  3:05               ` Alexey Kardashevskiy
2020-03-24  3:05               ` Alexey Kardashevskiy
2020-03-24  6:30               ` Aneesh Kumar K.V
2020-03-24  6:30                 ` Aneesh Kumar K.V
2020-03-24  6:30                 ` Aneesh Kumar K.V
2020-03-24  7:55                 ` Christoph Hellwig
2020-03-24  7:55                   ` Christoph Hellwig
2020-03-24  7:55                   ` Christoph Hellwig
2020-03-24  7:54               ` Christoph Hellwig
2020-03-24  7:54                 ` Christoph Hellwig
2020-03-24  7:54                 ` Christoph Hellwig
2020-03-25  4:51                 ` Alexey Kardashevskiy
2020-03-25  4:51                   ` Alexey Kardashevskiy
2020-03-25  4:51                   ` Alexey Kardashevskiy
2020-03-25  8:37                   ` Christoph Hellwig
2020-03-25  8:37                     ` Christoph Hellwig
2020-03-25  8:37                     ` Christoph Hellwig
2020-03-26  1:26                     ` Alexey Kardashevskiy
2020-03-26  1:26                       ` Alexey Kardashevskiy
2020-03-26  1:26                       ` Alexey Kardashevskiy
2020-04-03  8:38                       ` Alexey Kardashevskiy
2020-04-03  8:38                         ` Alexey Kardashevskiy
2020-04-03  8:38                         ` Alexey Kardashevskiy
2020-04-06 11:50                         ` Christoph Hellwig
2020-04-06 11:50                           ` Christoph Hellwig
2020-04-06 11:50                           ` Christoph Hellwig
2020-04-06 13:25                           ` Alexey Kardashevskiy
2020-04-06 13:25                             ` Alexey Kardashevskiy
2020-04-06 13:25                             ` Alexey Kardashevskiy
2020-04-06 17:17                             ` Christoph Hellwig
2020-04-06 17:17                               ` Christoph Hellwig
2020-04-06 17:17                               ` Christoph Hellwig
2020-04-07 10:12                               ` Alexey Kardashevskiy
2020-04-07 10:12                                 ` Alexey Kardashevskiy
2020-04-07 10:12                                 ` Alexey Kardashevskiy
2020-04-14  6:21                                 ` Alexey Kardashevskiy
2020-04-14  6:21                                   ` Alexey Kardashevskiy
2020-04-14  6:21                                   ` Alexey Kardashevskiy
2020-04-14  6:30                                   ` Christoph Hellwig
2020-04-14  6:30                                     ` Christoph Hellwig
2020-04-14  6:30                                     ` Christoph Hellwig
2020-03-23  8:58       ` Alexey Kardashevskiy
2020-03-23  8:58         ` Alexey Kardashevskiy
2020-03-23  8:58         ` Alexey Kardashevskiy
2020-03-23 17:20         ` Christoph Hellwig
2020-03-23 17:20           ` Christoph Hellwig
2020-03-23 17:20           ` Christoph Hellwig
2020-03-24  3:37           ` Alexey Kardashevskiy
2020-03-24  3:37             ` Alexey Kardashevskiy
2020-03-24  3:37             ` Alexey Kardashevskiy
2020-03-24  4:55             ` Alexey Kardashevskiy
2020-03-24  4:55               ` Alexey Kardashevskiy
2020-03-24  4:55               ` Alexey Kardashevskiy
2020-03-24  7:52             ` Christoph Hellwig
2020-03-24  7:52               ` Christoph Hellwig
2020-03-24  7:52               ` Christoph Hellwig
2020-03-23 12:14   ` Robin Murphy
2020-03-23 12:14     ` Robin Murphy
2020-03-23 12:14     ` Robin Murphy
2020-03-23 12:55     ` Christoph Hellwig
2020-03-23 12:55       ` Christoph Hellwig
2020-03-23 12:55       ` Christoph Hellwig
2020-03-20 14:16 ` [PATCH 2/2] powerpc: use the generic dma_ops_bypass mode Christoph Hellwig
2020-03-20 14:16   ` Christoph Hellwig
2020-03-20 14:16   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2020-03-24  9:39 [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device Christian Zigotzky
2019-11-13 13:37 generic DMA bypass flag Christoph Hellwig
2019-11-13 13:37 ` [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device Christoph Hellwig
2019-11-13 13:37   ` Christoph Hellwig
2019-11-13 13:37   ` Christoph Hellwig
