linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* small powerpc DMA fixes and cleanups
@ 2018-12-16 17:19 Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

Hi all,

this series has a few smaller fixes and cleanups for the powerpc
DMA code.

The biggest changes are related to the not cache coherent DMA
support, which as far as I can tell was not consistent in its
cache maintainance operations.  Also the AMCC ppc44x crypto
driver looks like it was affected by that (or otherwise just
odd) and used a helper internal to the powerpc DMA code to
work around that fact.  I've added Christian Lamparter who has
been trying to beat that driver into shape in the last years
for that reason.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-22  9:54   ` [1/8] " Michael Ellerman
  2018-12-16 17:19 ` [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

AMIGAONE selects NOT_COHERENT_CACHE, so we better allow it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/platforms/Kconfig.cputype | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7e130035bbba..ab176fd3dfb5 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -400,7 +400,8 @@ config NR_CPUS
 
 config NOT_COHERENT_CACHE
 	bool
-	depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
+	depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || \
+		GAMECUBE_COMMON || AMIGAONE
 	default n if PPC_47x
 	default y
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-17  6:41   ` Christophe Leroy
  2018-12-17 11:37   ` Michael Ellerman
  2018-12-16 17:19 ` [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

The unmap methods need to transfer memory ownership back from the device
to the cpu by identical means as dma_sync_*_to_cpu.  I'm not sure powerpc
needs to do any work in this transfer direction, but given that it does
invalidate the caches in dma_sync_*_to_cpu already we should make sure
we also do so on unmapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/kernel/dma.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index dbfc7056d7df..d442d23e182b 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -210,10 +210,15 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
 	return nents;
 }
 
-static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
+static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
 				int nents, enum dma_data_direction direction,
 				unsigned long attrs)
 {
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
 }
 
 static u64 dma_nommu_get_required_mask(struct device *dev)
@@ -247,6 +252,8 @@ static inline void dma_nommu_unmap_page(struct device *dev,
 					 enum dma_data_direction direction,
 					 unsigned long attrs)
 {
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		__dma_sync(bus_to_virt(dma_address), size, dir);
 }
 
 #ifdef CONFIG_NOT_COHERENT_CACHE
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-16 18:28   ` Christian Lamparter
  2018-12-19 10:13   ` Christian Lamparter
  2018-12-16 17:19 ` [PATCH 4/8] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

This function is internal to the DMA API implementation.  Instead use the
DMA API to properly unmap.  Note that the DMA API usage in this driver
is a disaster and urgently needs some work - it is missing all the unmaps,
seems to do a secondary map where it looks like it should to a unmap
in one place to work around cache coherency and the directions passed in
seem to be partially wrong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/crypto/amcc/crypto4xx_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index 6eaec9ba0f68..63cb6956c948 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -596,7 +596,7 @@ static void crypto4xx_aead_done(struct crypto4xx_device *dev,
 					  pd->pd_ctl_len.bf.pkt_len,
 					  dst);
 	} else {
-		__dma_sync_page(sg_page(dst), dst->offset, dst->length,
+		dma_unmap_page(dev->core_dev->device, pd->dest, dst->length,
 				DMA_FROM_DEVICE);
 	}
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/8] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-12-16 17:19 ` [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 5/8] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..f2a4a7142b1e 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
-#define ARCH_HAS_DMA_MMAP_COHERENT
-
 #endif /* __KERNEL__ */
 #endif	/* _ASM_DMA_MAPPING_H */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/8] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-12-16 17:19 ` [PATCH 4/8] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 6/8] powerpc/dma: remove the unused dma_iommu_ops export Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/setup_32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index be6ee8dec98d..947f904688b0 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -59,7 +59,6 @@ unsigned long ISA_DMA_THRESHOLD;
 unsigned int DMA_MODE_READ;
 unsigned int DMA_MODE_WRITE;
 
-EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
 EXPORT_SYMBOL(DMA_MODE_READ);
 EXPORT_SYMBOL(DMA_MODE_WRITE);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/8] powerpc/dma: remove the unused dma_iommu_ops export
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-12-16 17:19 ` [PATCH 5/8] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations Christoph Hellwig
  2018-12-16 17:19 ` [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb Christoph Hellwig
  7 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/powerpc/kernel/dma-iommu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index f9fe2080ceb9..2ca6cfaebf65 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -6,7 +6,6 @@
  * busses using the iommu infrastructure
  */
 
-#include <linux/export.h>
 #include <asm/iommu.h>
 
 /*
@@ -123,4 +122,3 @@ struct dma_map_ops dma_iommu_ops = {
 	.get_required_mask	= dma_iommu_get_required_mask,
 	.mapping_error		= dma_iommu_mapping_error,
 };
-EXPORT_SYMBOL(dma_iommu_ops);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-12-16 17:19 ` [PATCH 6/8] powerpc/dma: remove the unused dma_iommu_ops export Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-17  6:51   ` Christophe Leroy
  2018-12-16 17:19 ` [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb Christoph Hellwig
  7 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches.  Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/dma-mapping.h |  5 -----
 arch/powerpc/kernel/dma.c              | 14 ++------------
 arch/powerpc/mm/dma-noncoherent.c      | 15 +++++++--------
 arch/powerpc/platforms/44x/warp.c      |  2 +-
 4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index f2a4a7142b1e..dacd0f93f2b2 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
  * to ensure it is consistent.
  */
 struct device;
-extern void *__dma_alloc_coherent(struct device *dev, size_t size,
-				  dma_addr_t *handle, gfp_t gfp);
-extern void __dma_free_coherent(size_t size, void *vaddr);
 extern void __dma_sync(void *vaddr, size_t size, int direction);
 extern void __dma_sync_page(struct page *page, unsigned long offset,
 				 size_t size, int direction);
@@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
  * Cache coherent cores.
  */
 
-#define __dma_alloc_coherent(dev, gfp, size, handle)	NULL
-#define __dma_free_coherent(size, addr)		((void)0)
 #define __dma_sync(addr, size, rw)		((void)0)
 #define __dma_sync_page(pg, off, sz, rw)	((void)0)
 
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index d442d23e182b..270b2911c437 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, u64 mask)
 #endif
 }
 
+#ifndef CONFIG_NOT_COHERENT_CACHE
 void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
 				  dma_addr_t *dma_handle, gfp_t flag,
 				  unsigned long attrs)
 {
 	void *ret;
-#ifdef CONFIG_NOT_COHERENT_CACHE
-	ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
-	if (ret == NULL)
-		return NULL;
-	*dma_handle += get_dma_offset(dev);
-	return ret;
-#else
 	struct page *page;
 	int node = dev_to_node(dev);
 #ifdef CONFIG_FSL_SOC
@@ -113,19 +107,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
 	*dma_handle = __pa(ret) + get_dma_offset(dev);
 
 	return ret;
-#endif
 }
 
 void __dma_nommu_free_coherent(struct device *dev, size_t size,
 				void *vaddr, dma_addr_t dma_handle,
 				unsigned long attrs)
 {
-#ifdef CONFIG_NOT_COHERENT_CACHE
-	__dma_free_coherent(size, vaddr);
-#else
 	free_pages((unsigned long)vaddr, get_order(size));
-#endif
 }
+#endif /* !CONFIG_NOT_COHERENT_CACHE */
 
 static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
 				       dma_addr_t *dma_handle, gfp_t flag,
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index b6e7b5952ab5..e955539686a4 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -29,7 +29,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/highmem.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
 #include <linux/export.h>
 
 #include <asm/tlbflush.h>
@@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct ppc_vm_region *head, unsi
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
+		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
 	struct page *page;
 	struct ppc_vm_region *c;
@@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
 		/*
 		 * Set the "dma handle"
 		 */
-		*handle = page_to_phys(page);
+		*dma_handle = phys_to_dma(dev, page_to_phys(page));
 
 		do {
 			SetPageReserved(page);
@@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
  no_page:
 	return NULL;
 }
-EXPORT_SYMBOL(__dma_alloc_coherent);
 
 /*
  * free a page as defined by the above mapping.
  */
-void __dma_free_coherent(size_t size, void *vaddr)
+void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
+		dma_addr_t dma_handle, unsigned long attrs)
 {
 	struct ppc_vm_region *c;
 	unsigned long flags, addr;
@@ -309,7 +309,6 @@ void __dma_free_coherent(size_t size, void *vaddr)
 	       __func__, vaddr);
 	dump_stack();
 }
-EXPORT_SYMBOL(__dma_free_coherent);
 
 /*
  * make an area consistent.
@@ -401,7 +400,7 @@ EXPORT_SYMBOL(__dma_sync_page);
 
 /*
  * Return the PFN for a given cpu virtual address returned by
- * __dma_alloc_coherent. This is used by dma_mmap_coherent()
+ * __dma_nommu_alloc_coherent. This is used by dma_mmap_coherent()
  */
 unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr)
 {
diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c
index a886c2c22097..7e4f8ca19ce8 100644
--- a/arch/powerpc/platforms/44x/warp.c
+++ b/arch/powerpc/platforms/44x/warp.c
@@ -47,7 +47,7 @@ static int __init warp_probe(void)
 	if (!of_machine_is_compatible("pika,warp"))
 		return 0;
 
-	/* For __dma_alloc_coherent */
+	/* For __dma_nommu_alloc_coherent */
 	ISA_DMA_THRESHOLD = ~0L;
 
 	return 1;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb
  2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-12-16 17:19 ` [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations Christoph Hellwig
@ 2018-12-16 17:19 ` Christoph Hellwig
  2018-12-16 23:10   ` Andrew Donnellan
  7 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-16 17:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

The CXL code never even looks at the dma mask, so there is no good
reason for this sanity check.  Remove it because it gets in the way
of the dma ops refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/misc/cxl/vphb.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index 7908633d9204..49da2f744bbf 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -11,17 +11,6 @@
 #include <misc/cxl.h>
 #include "cxl.h"
 
-static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
-{
-	if (dma_mask < DMA_BIT_MASK(64)) {
-		pr_info("%s only 64bit DMA supported on CXL", __func__);
-		return -EIO;
-	}
-
-	*(pdev->dev.dma_mask) = dma_mask;
-	return 0;
-}
-
 static int cxl_pci_probe_mode(struct pci_bus *bus)
 {
 	return PCI_PROBE_NORMAL;
@@ -220,7 +209,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
 	.reset_secondary_bus = cxl_pci_reset_secondary_bus,
 	.setup_msi_irqs = cxl_setup_msi_irqs,
 	.teardown_msi_irqs = cxl_teardown_msi_irqs,
-	.dma_set_mask = cxl_dma_set_mask,
 };
 
 int cxl_pci_vphb_add(struct cxl_afu *afu)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page
  2018-12-16 17:19 ` [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page Christoph Hellwig
@ 2018-12-16 18:28   ` Christian Lamparter
  2018-12-17  7:27     ` Christoph Hellwig
  2018-12-19 10:13   ` Christian Lamparter
  1 sibling, 1 reply; 22+ messages in thread
From: Christian Lamparter @ 2018-12-16 18:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: David Miller, Paul Mackerras, linuxppc-dev

On Sunday, December 16, 2018 6:19:46 PM CET Christoph Hellwig wrote:
> This function is internal to the DMA API implementation.  Instead use the
> DMA API to properly unmap.  Note that the DMA API usage in this driver
> is a disaster and urgently needs some work - it is missing all the unmaps,
> seems to do a secondary map where it looks like it should to a unmap
> in one place to work around cache coherency and the directions passed in
> seem to be partially wrong.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/crypto/amcc/crypto4xx_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
> index 6eaec9ba0f68..63cb6956c948 100644
> --- a/drivers/crypto/amcc/crypto4xx_core.c
> +++ b/drivers/crypto/amcc/crypto4xx_core.c
> @@ -596,7 +596,7 @@ static void crypto4xx_aead_done(struct crypto4xx_device *dev,
>  					  pd->pd_ctl_len.bf.pkt_len,
>  					  dst);
>  	} else {
> -		__dma_sync_page(sg_page(dst), dst->offset, dst->length,
> +		dma_unmap_page(dev->core_dev->device, pd->dest, dst->length,
>  				DMA_FROM_DEVICE);
>  	}
>  
> 
Yeap, the crypto4xx is a real piece of work. However, ibm emac network driver
is even worse:
|/*
| * Lack of dma_unmap_???? calls is intentional.
| *
| * API-correct usage requires additional support state information to be
| * maintained for every RX and TX buffer descriptor (BD). Unfortunately, due to
| * EMAC design (e.g. TX buffer passed from network stack can be split into
| * several BDs, dma_map_single/dma_map_page can be used to map particular BD),
| * maintaining such information will add additional overhead.
| * Current DMA API implementation for 4xx processors only ensures cache coherency
| * and dma_unmap_???? routines are empty and are likely to stay this way.
| * I decided to omit dma_unmap_??? calls because I don't want to add additional
| * complexity just for the sake of following some abstract API, when it doesn't
| * add any real benefit to the driver. I understand that this decision maybe
| * controversial, but I really tried to make code API-correct and efficient
| * at the same time and didn't come up with code I liked :(.                --ebs
| */
<https://elixir.bootlin.com/linux/v4.20-rc6/source/drivers/net/ethernet/ibm/emac/core.c#L58>

Problem is, I can't really enable the DMA_DEBUG because every PPC4XX/APM82181
device I have is some sort of network appliance. So a proper test would require
to fix the emac driver first since DMA_DEBUG will disable itself.

Regards,
Christian

PS: Since it looks like you are located in Germany as well:
If you are interested (PM me), I could just mail you a "MyBook Live" 
NAS Kit (PCB, Shell, Screws). All you would need: a SATA 3,5" HDD and
a standard 12V PSU with a 5.5mm plug. I maintain a active OpenWrt port
for the device: 
http://downloads.openwrt.org/snapshots/targets/apm821xx/sata/



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb
  2018-12-16 17:19 ` [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb Christoph Hellwig
@ 2018-12-16 23:10   ` Andrew Donnellan
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Donnellan @ 2018-12-16 23:10 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter

On 17/12/18 4:19 am, Christoph Hellwig wrote:
> The CXL code never even looks at the dma mask, so there is no good
> reason for this sanity check.  Remove it because it gets in the way
> of the dma ops refactoring.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   drivers/misc/cxl/vphb.c | 12 ------------
>   1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 7908633d9204..49da2f744bbf 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -11,17 +11,6 @@
>   #include <misc/cxl.h>
>   #include "cxl.h"
>   
> -static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> -{
> -	if (dma_mask < DMA_BIT_MASK(64)) {
> -		pr_info("%s only 64bit DMA supported on CXL", __func__);
> -		return -EIO;
> -	}
> -
> -	*(pdev->dev.dma_mask) = dma_mask;
> -	return 0;
> -}
> -
>   static int cxl_pci_probe_mode(struct pci_bus *bus)
>   {
>   	return PCI_PROBE_NORMAL;
> @@ -220,7 +209,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
>   	.reset_secondary_bus = cxl_pci_reset_secondary_bus,
>   	.setup_msi_irqs = cxl_setup_msi_irqs,
>   	.teardown_msi_irqs = cxl_teardown_msi_irqs,
> -	.dma_set_mask = cxl_dma_set_mask,
>   };
>   
>   int cxl_pci_vphb_add(struct cxl_afu *afu)
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-16 17:19 ` [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods Christoph Hellwig
@ 2018-12-17  6:41   ` Christophe Leroy
  2018-12-17  7:34     ` Christoph Hellwig
  2018-12-17 11:37   ` Michael Ellerman
  1 sibling, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2018-12-17  6:41 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter



Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
> The unmap methods need to transfer memory ownership back from the device
> to the cpu by identical means as dma_sync_*_to_cpu.  I'm not sure powerpc
> needs to do any work in this transfer direction, but given that it does
> invalidate the caches in dma_sync_*_to_cpu already we should make sure
> we also do so on unmapping.

Why do we have to do that on unmapping ? If we are unmapping it means we 
are retiring the area, so why would we need to use CPU for syncing an 
area than won't be used anymore ?

Christophe


> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   arch/powerpc/kernel/dma.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index dbfc7056d7df..d442d23e182b 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -210,10 +210,15 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
>   	return nents;
>   }
>   
> -static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
> +static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
>   				int nents, enum dma_data_direction direction,
>   				unsigned long attrs)
>   {
> +	struct scatterlist *sg;
> +	int i;
> +
> +	for_each_sg(sgl, sg, nents, i)
> +		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
>   }
>   
>   static u64 dma_nommu_get_required_mask(struct device *dev)
> @@ -247,6 +252,8 @@ static inline void dma_nommu_unmap_page(struct device *dev,
>   					 enum dma_data_direction direction,
>   					 unsigned long attrs)
>   {
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		__dma_sync(bus_to_virt(dma_address), size, dir);
>   }
>   
>   #ifdef CONFIG_NOT_COHERENT_CACHE
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations
  2018-12-16 17:19 ` [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations Christoph Hellwig
@ 2018-12-17  6:51   ` Christophe Leroy
  2018-12-17  7:35     ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2018-12-17  6:51 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, Christian Lamparter



Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
> The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
> any code with the one for systems with coherent caches.  Split it off
> and merge it with the helpers in dma-noncoherent.c that have no other
> callers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   arch/powerpc/include/asm/dma-mapping.h |  5 -----
>   arch/powerpc/kernel/dma.c              | 14 ++------------

Instead of all the ifdefs in dma.c, couldn't we split it
in two files, ie dma.c for common parts and dma-coherence.c for specific 
stuff ?

Christophe

>   arch/powerpc/mm/dma-noncoherent.c      | 15 +++++++--------
>   arch/powerpc/platforms/44x/warp.c      |  2 +-
>   4 files changed, 10 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index f2a4a7142b1e..dacd0f93f2b2 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
>    * to ensure it is consistent.
>    */
>   struct device;
> -extern void *__dma_alloc_coherent(struct device *dev, size_t size,
> -				  dma_addr_t *handle, gfp_t gfp);
> -extern void __dma_free_coherent(size_t size, void *vaddr);
>   extern void __dma_sync(void *vaddr, size_t size, int direction);
>   extern void __dma_sync_page(struct page *page, unsigned long offset,
>   				 size_t size, int direction);
> @@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
>    * Cache coherent cores.
>    */
>   
> -#define __dma_alloc_coherent(dev, gfp, size, handle)	NULL
> -#define __dma_free_coherent(size, addr)		((void)0)
>   #define __dma_sync(addr, size, rw)		((void)0)
>   #define __dma_sync_page(pg, off, sz, rw)	((void)0)
>   
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index d442d23e182b..270b2911c437 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, u64 mask)
>   #endif
>   }
>   
> +#ifndef CONFIG_NOT_COHERENT_CACHE
>   void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
>   				  dma_addr_t *dma_handle, gfp_t flag,
>   				  unsigned long attrs)
>   {
>   	void *ret;
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> -	ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
> -	if (ret == NULL)
> -		return NULL;
> -	*dma_handle += get_dma_offset(dev);
> -	return ret;
> -#else
>   	struct page *page;
>   	int node = dev_to_node(dev);
>   #ifdef CONFIG_FSL_SOC
> @@ -113,19 +107,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
>   	*dma_handle = __pa(ret) + get_dma_offset(dev);
>   
>   	return ret;
> -#endif
>   }
>   
>   void __dma_nommu_free_coherent(struct device *dev, size_t size,
>   				void *vaddr, dma_addr_t dma_handle,
>   				unsigned long attrs)
>   {
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> -	__dma_free_coherent(size, vaddr);
> -#else
>   	free_pages((unsigned long)vaddr, get_order(size));
> -#endif
>   }
> +#endif /* !CONFIG_NOT_COHERENT_CACHE */
>   
>   static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
>   				       dma_addr_t *dma_handle, gfp_t flag,
> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
> index b6e7b5952ab5..e955539686a4 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -29,7 +29,7 @@
>   #include <linux/string.h>
>   #include <linux/types.h>
>   #include <linux/highmem.h>
> -#include <linux/dma-mapping.h>
> +#include <linux/dma-direct.h>
>   #include <linux/export.h>
>   
>   #include <asm/tlbflush.h>
> @@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct ppc_vm_region *head, unsi
>    * Allocate DMA-coherent memory space and return both the kernel remapped
>    * virtual and bus address for that space.
>    */
> -void *
> -__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
> +void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
> +		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>   {
>   	struct page *page;
>   	struct ppc_vm_region *c;
> @@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
>   		/*
>   		 * Set the "dma handle"
>   		 */
> -		*handle = page_to_phys(page);
> +		*dma_handle = phys_to_dma(dev, page_to_phys(page));
>   
>   		do {
>   			SetPageReserved(page);
> @@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t
>    no_page:
>   	return NULL;
>   }
> -EXPORT_SYMBOL(__dma_alloc_coherent);
>   
>   /*
>    * free a page as defined by the above mapping.
>    */
> -void __dma_free_coherent(size_t size, void *vaddr)
> +void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
> +		dma_addr_t dma_handle, unsigned long attrs)
>   {
>   	struct ppc_vm_region *c;
>   	unsigned long flags, addr;
> @@ -309,7 +309,6 @@ void __dma_free_coherent(size_t size, void *vaddr)
>   	       __func__, vaddr);
>   	dump_stack();
>   }
> -EXPORT_SYMBOL(__dma_free_coherent);
>   
>   /*
>    * make an area consistent.
> @@ -401,7 +400,7 @@ EXPORT_SYMBOL(__dma_sync_page);
>   
>   /*
>    * Return the PFN for a given cpu virtual address returned by
> - * __dma_alloc_coherent. This is used by dma_mmap_coherent()
> + * __dma_nommu_alloc_coherent. This is used by dma_mmap_coherent()
>    */
>   unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr)
>   {
> diff --git a/arch/powerpc/platforms/44x/warp.c b/arch/powerpc/platforms/44x/warp.c
> index a886c2c22097..7e4f8ca19ce8 100644
> --- a/arch/powerpc/platforms/44x/warp.c
> +++ b/arch/powerpc/platforms/44x/warp.c
> @@ -47,7 +47,7 @@ static int __init warp_probe(void)
>   	if (!of_machine_is_compatible("pika,warp"))
>   		return 0;
>   
> -	/* For __dma_alloc_coherent */
> +	/* For __dma_nommu_alloc_coherent */
>   	ISA_DMA_THRESHOLD = ~0L;
>   
>   	return 1;
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page
  2018-12-16 18:28   ` Christian Lamparter
@ 2018-12-17  7:27     ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-17  7:27 UTC (permalink / raw)
  To: Christian Lamparter
  Cc: Christoph Hellwig, Paul Mackerras, linuxppc-dev, David Miller

On Sun, Dec 16, 2018 at 07:28:32PM +0100, Christian Lamparter wrote:
> Yeap, the crypto4xx is a real piece of work. However, ibm emac network driver
> is even worse:

Oh, fun.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-17  6:41   ` Christophe Leroy
@ 2018-12-17  7:34     ` Christoph Hellwig
  2018-12-17  7:39       ` Christophe Leroy
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-17  7:34 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Christian Lamparter, Paul Mackerras, linuxppc-dev, Christoph Hellwig

On Mon, Dec 17, 2018 at 07:41:54AM +0100, Christophe Leroy wrote:
>
>
> Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
>> The unmap methods need to transfer memory ownership back from the device
>> to the cpu by identical means as dma_sync_*_to_cpu.  I'm not sure powerpc
>> needs to do any work in this transfer direction, but given that it does
>> invalidate the caches in dma_sync_*_to_cpu already we should make sure
>> we also do so on unmapping.
>
> Why do we have to do that on unmapping ? If we are unmapping it means we 
> are retiring the area, so why would we need to use CPU for syncing an area 
> than won't be used anymore ?

So in general we need the cache maintaince only at map time if the CPUs
gurantee to never ѕpeculate into memory that might be under DMA, which
for modern CPUs is increasingly rate.  If the CPUs could speculate into
the area that was DMA mapped we need to do another round of cache
maintainance on unmap to make sure we really do not have any data
in the cache.  powerpc currently does that for dma_sync_*_to_cpu, but
not for the unmap case, which not only looks odd but also seems to be
worked around in drivers (see the ppc44x crypto patch).

Note that there are some way to optimize the amount of cache flushing
done even when the cpu speculates, see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/mm/dma.c#n93

But for that I need help with people that actually understand the
non-coherent powerpc SOCs and who can test it.  For now this patch
is conservative.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations
  2018-12-17  6:51   ` Christophe Leroy
@ 2018-12-17  7:35     ` Christoph Hellwig
  2018-12-17 21:34       ` Gerhard Pircher
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-17  7:35 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Christian Lamparter, Paul Mackerras, linuxppc-dev, Christoph Hellwig

On Mon, Dec 17, 2018 at 07:51:05AM +0100, Christophe Leroy wrote:
>
>
> Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
>> The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
>> any code with the one for systems with coherent caches.  Split it off
>> and merge it with the helpers in dma-noncoherent.c that have no other
>> callers.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> ---
>>   arch/powerpc/include/asm/dma-mapping.h |  5 -----
>>   arch/powerpc/kernel/dma.c              | 14 ++------------
>
> Instead of all the ifdefs in dma.c, couldn't we split it
> in two files, ie dma.c for common parts and dma-coherence.c for specific 
> stuff ?

The end goal is to kill dma.c and keep dma-noncoherent.c only with most
of the code moving to common code.  Here is the current state of that:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.5

But it still has issues on two tested platforms and isn't ready yet.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-17  7:34     ` Christoph Hellwig
@ 2018-12-17  7:39       ` Christophe Leroy
  2018-12-17  8:16         ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2018-12-17  7:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christian Lamparter, Paul Mackerras, linuxppc-dev



Le 17/12/2018 à 08:34, Christoph Hellwig a écrit :
> On Mon, Dec 17, 2018 at 07:41:54AM +0100, Christophe Leroy wrote:
>>
>>
>> Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
>>> The unmap methods need to transfer memory ownership back from the device
>>> to the cpu by identical means as dma_sync_*_to_cpu.  I'm not sure powerpc
>>> needs to do any work in this transfer direction, but given that it does
>>> invalidate the caches in dma_sync_*_to_cpu already we should make sure
>>> we also do so on unmapping.
>>
>> Why do we have to do that on unmapping ? If we are unmapping it means we
>> are retiring the area, so why would we need to use CPU for syncing an area
>> than won't be used anymore ?
> 
> So in general we need the cache maintaince only at map time if the CPUs
> gurantee to never ѕpeculate into memory that might be under DMA, which
> for modern CPUs is increasingly rate.  If the CPUs could speculate into
> the area that was DMA mapped we need to do another round of cache
> maintainance on unmap to make sure we really do not have any data
> in the cache.  powerpc currently does that for dma_sync_*_to_cpu, but
> not for the unmap case, which not only looks odd but also seems to be
> worked around in drivers (see the ppc44x crypto patch).
> 
> Note that there are some way to optimize the amount of cache flushing
> done even when the cpu speculates, see:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arc/mm/dma.c#n93
> 
> But for that I need help with people that actually understand the
> non-coherent powerpc SOCs and who can test it.  For now this patch
> is conservative.
> 

I can help you with powerpc 8xx actually.

Christophe

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-17  7:39       ` Christophe Leroy
@ 2018-12-17  8:16         ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2018-12-17  8:16 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Christian Lamparter, Paul Mackerras, linuxppc-dev, Christoph Hellwig

On Mon, Dec 17, 2018 at 08:39:17AM +0100, Christophe Leroy wrote:
> I can help you with powerpc 8xx actually.

Below is a patch that implements the proper scheme on top of the series
in this thread.  Compile tested with tqm8xx_defconfig and tqm8xx_defconfig
+ CONFIG_HIGHMEM only.

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index dacd0f93f2b2..8df9dd42b351 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -39,19 +39,17 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
  * to ensure it is consistent.
  */
 struct device;
-extern void __dma_sync(void *vaddr, size_t size, int direction);
-extern void __dma_sync_page(struct page *page, unsigned long offset,
-				 size_t size, int direction);
+void ppc_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir);
+void ppc_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir);
 extern unsigned long __dma_get_coherent_pfn(unsigned long cpu_addr);
 
 #else /* ! CONFIG_NOT_COHERENT_CACHE */
-/*
- * Cache coherent cores.
- */
-
-#define __dma_sync(addr, size, rw)		((void)0)
-#define __dma_sync_page(pg, off, sz, rw)	((void)0)
-
+static inline void ppc_sync_dma_for_device(struct device *dev,
+		phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+}
 #endif /* ! CONFIG_NOT_COHERENT_CACHE */
 
 static inline unsigned long device_to_mask(struct device *dev)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 270b2911c437..0c0bcfebc271 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -6,7 +6,7 @@
  */
 
 #include <linux/device.h>
-#include <linux/dma-mapping.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-debug.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
@@ -194,23 +194,12 @@ static int dma_nommu_map_sg(struct device *dev, struct scatterlist *sgl,
 		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
 			continue;
 
-		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+		ppc_sync_dma_for_device(dev, sg_phys(sg), sg->length, direction);
 	}
 
 	return nents;
 }
 
-static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
-				int nents, enum dma_data_direction direction,
-				unsigned long attrs)
-{
-	struct scatterlist *sg;
-	int i;
-
-	for_each_sg(sgl, sg, nents, i)
-		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
-}
-
 static u64 dma_nommu_get_required_mask(struct device *dev)
 {
 	u64 end, mask;
@@ -230,39 +219,70 @@ static inline dma_addr_t dma_nommu_map_page(struct device *dev,
 					     enum dma_data_direction dir,
 					     unsigned long attrs)
 {
+	phys_addr_t paddr = page_to_phys(page) + offset;
+
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		__dma_sync_page(page, offset, size, dir);
+		ppc_sync_dma_for_device(dev, paddr, size, dir);
 
-	return page_to_phys(page) + offset + get_dma_offset(dev);
+	return paddr + get_dma_offset(dev);
 }
 
+#ifdef CONFIG_NOT_COHERENT_CACHE
 static inline void dma_nommu_unmap_page(struct device *dev,
 					 dma_addr_t dma_address,
 					 size_t size,
-					 enum dma_data_direction direction,
+					 enum dma_data_direction dir,
 					 unsigned long attrs)
 {
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		__dma_sync(bus_to_virt(dma_address), size, dir);
+		ppc_sync_dma_for_cpu(dev, dma_to_phys(dev, dma_address), size,
+				dir);
 }
 
-#ifdef CONFIG_NOT_COHERENT_CACHE
-static inline void dma_nommu_sync_sg(struct device *dev,
+static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sgl,
+				int nents, enum dma_data_direction direction,
+				unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		ppc_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, direction);
+}
+
+static inline void dma_nommu_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents,
-		enum dma_data_direction direction)
+		enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
 	int i;
 
 	for_each_sg(sgl, sg, nents, i)
-		__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
+		ppc_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
+static inline void dma_nommu_sync_sg_for_cpu(struct device *dev,
+		struct scatterlist *sgl, int nents,
+		enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
 
-static inline void dma_nommu_sync_single(struct device *dev,
+	for_each_sg(sgl, sg, nents, i)
+		ppc_sync_dma_for_cpu(dev, sg_phys(sg), sg->length, dir);
+}
+
+static inline void dma_nommu_sync_single_for_device(struct device *dev,
+					  dma_addr_t dma_handle, size_t size,
+					  enum dma_data_direction dir)
+{
+	ppc_sync_dma_for_device(dev, dma_to_phys(dev, dma_handle), size, dir);
+}
+
+static inline void dma_nommu_sync_single_for_cpu(struct device *dev,
 					  dma_addr_t dma_handle, size_t size,
-					  enum dma_data_direction direction)
+					  enum dma_data_direction dir)
 {
-	__dma_sync(bus_to_virt(dma_handle), size, direction);
+	ppc_sync_dma_for_cpu(dev, dma_to_phys(dev, dma_handle), size, dir);
 }
 #endif
 
@@ -271,16 +291,16 @@ const struct dma_map_ops dma_nommu_ops = {
 	.free				= dma_nommu_free_coherent,
 	.mmap				= dma_nommu_mmap_coherent,
 	.map_sg				= dma_nommu_map_sg,
-	.unmap_sg			= dma_nommu_unmap_sg,
 	.dma_supported			= dma_nommu_dma_supported,
 	.map_page			= dma_nommu_map_page,
-	.unmap_page			= dma_nommu_unmap_page,
 	.get_required_mask		= dma_nommu_get_required_mask,
 #ifdef CONFIG_NOT_COHERENT_CACHE
-	.sync_single_for_cpu 		= dma_nommu_sync_single,
-	.sync_single_for_device 	= dma_nommu_sync_single,
-	.sync_sg_for_cpu 		= dma_nommu_sync_sg,
-	.sync_sg_for_device 		= dma_nommu_sync_sg,
+	.unmap_page			= dma_nommu_unmap_page,
+	.unmap_sg			= dma_nommu_unmap_sg,
+	.sync_single_for_cpu 		= dma_nommu_sync_single_for_cpu,
+	.sync_single_for_device 	= dma_nommu_sync_single_for_device,
+	.sync_sg_for_cpu 		= dma_nommu_sync_sg_for_cpu,
+	.sync_sg_for_device 		= dma_nommu_sync_sg_for_device,
 #endif
 };
 EXPORT_SYMBOL(dma_nommu_ops);
diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
index e955539686a4..b6501f0a5ac8 100644
--- a/arch/powerpc/mm/dma-noncoherent.c
+++ b/arch/powerpc/mm/dma-noncoherent.c
@@ -311,35 +311,17 @@ void __dma_nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
 }
 
 /*
- * make an area consistent.
+ * Invalidate only when cache-line aligned otherwise there is the potential for
+ * discarding uncommitted data from the cache
  */
-void __dma_sync(void *vaddr, size_t size, int direction)
+static void flush_or_invalidate_dcache_range(unsigned long start,
+		unsigned long end)
 {
-	unsigned long start = (unsigned long)vaddr;
-	unsigned long end   = start + size;
-
-	switch (direction) {
-	case DMA_NONE:
-		BUG();
-	case DMA_FROM_DEVICE:
-		/*
-		 * invalidate only when cache-line aligned otherwise there is
-		 * the potential for discarding uncommitted data from the cache
-		 */
-		if ((start | end) & (L1_CACHE_BYTES - 1))
-			flush_dcache_range(start, end);
-		else
-			invalidate_dcache_range(start, end);
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		clean_dcache_range(start, end);
-		break;
-	case DMA_BIDIRECTIONAL:	/* writeback and invalidate */
+	if ((start | end) & (L1_CACHE_BYTES - 1))
 		flush_dcache_range(start, end);
-		break;
-	}
+	else
+		invalidate_dcache_range(start, end);
 }
-EXPORT_SYMBOL(__dma_sync);
 
 #ifdef CONFIG_HIGHMEM
 /*
@@ -351,9 +333,11 @@ EXPORT_SYMBOL(__dma_sync);
  * Note: yes, it is possible and correct to have a buffer extend
  * beyond the first page.
  */
-static inline void __dma_sync_page_highmem(struct page *page,
-		unsigned long offset, size_t size, int direction)
+static inline void __dma_sync_phys(phys_addr_t paddr, size_t size,
+		void (*op)(unsigned long start, unsigned long end))
 {
+	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+	unsigned long offset = paddr & ~PAGE_MASK;
 	size_t seg_size = min((size_t)(PAGE_SIZE - offset), size);
 	size_t cur_size = seg_size;
 	unsigned long flags, start, seg_offset = offset;
@@ -366,7 +350,7 @@ static inline void __dma_sync_page_highmem(struct page *page,
 		start = (unsigned long)kmap_atomic(page + seg_nr) + seg_offset;
 
 		/* Sync this buffer segment */
-		__dma_sync((void *)start, seg_size, direction);
+		op(start, start + seg_size);
 		kunmap_atomic((void *)start);
 		seg_nr++;
 
@@ -380,23 +364,47 @@ static inline void __dma_sync_page_highmem(struct page *page,
 
 	local_irq_restore(flags);
 }
+#else
+static inline void __dma_sync_phys(phys_addr_t paddr, size_t size,
+		void (*op)(unsigned long start, unsigned long end))
+{
+	unsigned long start = (unsigned long)phys_to_virt(paddr);
+	op(start, start + size);
+}
 #endif /* CONFIG_HIGHMEM */
 
-/*
- * __dma_sync_page makes memory consistent. identical to __dma_sync, but
- * takes a struct page instead of a virtual address
- */
-void __dma_sync_page(struct page *page, unsigned long offset,
-	size_t size, int direction)
+void ppc_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
 {
-#ifdef CONFIG_HIGHMEM
-	__dma_sync_page_highmem(page, offset, size, direction);
-#else
-	unsigned long start = (unsigned long)page_address(page) + offset;
-	__dma_sync((void *)start, size, direction);
-#endif
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		__dma_sync_phys(paddr, size, flush_or_invalidate_dcache_range);
+		break;
+	case DMA_FROM_DEVICE:
+		__dma_sync_phys(paddr, size, clean_dcache_range);
+		break;
+	case DMA_BIDIRECTIONAL:
+		__dma_sync_phys(paddr, size, flush_dcache_range);
+		break;
+	default:
+		break;
+	}
+}
+
+void ppc_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+		size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		__dma_sync_phys(paddr, size, clean_dcache_range);
+		break;
+	default:
+		break;
+	}
 }
-EXPORT_SYMBOL(__dma_sync_page);
 
 /*
  * Return the PFN for a given cpu virtual address returned by

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods
  2018-12-16 17:19 ` [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods Christoph Hellwig
  2018-12-17  6:41   ` Christophe Leroy
@ 2018-12-17 11:37   ` Michael Ellerman
  1 sibling, 0 replies; 22+ messages in thread
From: Michael Ellerman @ 2018-12-17 11:37 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Christian Lamparter

Christoph Hellwig <hch@lst.de> writes:

> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index dbfc7056d7df..d442d23e182b 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -247,6 +252,8 @@ static inline void dma_nommu_unmap_page(struct device *dev,
>  					 enum dma_data_direction direction,
>  					 unsigned long attrs)
>  {
> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> +		__dma_sync(bus_to_virt(dma_address), size, dir);

I did s/dir/direction here.

cheers

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations
  2018-12-17  7:35     ` Christoph Hellwig
@ 2018-12-17 21:34       ` Gerhard Pircher
  0 siblings, 0 replies; 22+ messages in thread
From: Gerhard Pircher @ 2018-12-17 21:34 UTC (permalink / raw)
  To: Christoph Hellwig, Christophe Leroy
  Cc: linuxppc-dev, Paul Mackerras, Christian Lamparter

Am 2018-12-17 um 08:35 schrieb Christoph Hellwig:
> On Mon, Dec 17, 2018 at 07:51:05AM +0100, Christophe Leroy wrote:
>>
>>
>> Le 16/12/2018 à 18:19, Christoph Hellwig a écrit :
>>> The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
>>> any code with the one for systems with coherent caches.  Split it off
>>> and merge it with the helpers in dma-noncoherent.c that have no other
>>> callers.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> ---
>>>   arch/powerpc/include/asm/dma-mapping.h |  5 -----
>>>   arch/powerpc/kernel/dma.c              | 14 ++------------
>>
>> Instead of all the ifdefs in dma.c, couldn't we split it
>> in two files, ie dma.c for common parts and dma-coherence.c for specific 
>> stuff ?
> 
> The end goal is to kill dma.c and keep dma-noncoherent.c only with most
> of the code moving to common code.  Here is the current state of that:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.5
> 
> But it still has issues on two tested platforms and isn't ready yet.
I hope that I can give this a try on one of my AmigaOne machines over
Christmas. Unfortunately my main local AmigaOne machine is out of order
and the other one is only remotely accessible, which makes kernel testing
a bit hard. :-)

Gerhard

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page
  2018-12-16 17:19 ` [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page Christoph Hellwig
  2018-12-16 18:28   ` Christian Lamparter
@ 2018-12-19 10:13   ` Christian Lamparter
  1 sibling, 0 replies; 22+ messages in thread
From: Christian Lamparter @ 2018-12-19 10:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Paul Mackerras, linuxppc-dev

On Sunday, December 16, 2018 6:19:46 PM CET Christoph Hellwig wrote:
> This function is internal to the DMA API implementation.  Instead use the
> DMA API to properly unmap.  Note that the DMA API usage in this driver
> is a disaster and urgently needs some work - it is missing all the unmaps,
> seems to do a secondary map where it looks like it should to a unmap
> in one place to work around cache coherency and the directions passed in
> seem to be partially wrong.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
I've loaded the series (+dir -> direction patch) onto a cross-compiled
vanilla 4.20-rc7. I can report that the box didn't crash, though I would
have liked to test with DMA_DEBUG.

Tested-by: Christian Lamparter <chunkeey@gmail.com>

> ---
>  drivers/crypto/amcc/crypto4xx_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
> index 6eaec9ba0f68..63cb6956c948 100644
> --- a/drivers/crypto/amcc/crypto4xx_core.c
> +++ b/drivers/crypto/amcc/crypto4xx_core.c
> @@ -596,7 +596,7 @@ static void crypto4xx_aead_done(struct crypto4xx_device *dev,
>  					  pd->pd_ctl_len.bf.pkt_len,
>  					  dst);
>  	} else {
> -		__dma_sync_page(sg_page(dst), dst->offset, dst->length,
> +		dma_unmap_page(dev->core_dev->device, pd->dest, dst->length,
>  				DMA_FROM_DEVICE);
>  	}
>  
> 





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone
  2018-12-16 17:19 ` [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone Christoph Hellwig
@ 2018-12-22  9:54   ` Michael Ellerman
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Ellerman @ 2018-12-22  9:54 UTC (permalink / raw)
  To: Christoph Hellwig, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, Christian Lamparter

On Sun, 2018-12-16 at 17:19:44 UTC, Christoph Hellwig wrote:
> AMIGAONE selects NOT_COHERENT_CACHE, so we better allow it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9286356907caa2d4737bd9dd8afe0b

cheers

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-12-22 10:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-16 17:19 small powerpc DMA fixes and cleanups Christoph Hellwig
2018-12-16 17:19 ` [PATCH 1/8] powerpc: allow NOT_COHERENT_CACHE for amigaone Christoph Hellwig
2018-12-22  9:54   ` [1/8] " Michael Ellerman
2018-12-16 17:19 ` [PATCH 2/8] powerpc/dma: properly wire up the unmap_page and unmap_sg methods Christoph Hellwig
2018-12-17  6:41   ` Christophe Leroy
2018-12-17  7:34     ` Christoph Hellwig
2018-12-17  7:39       ` Christophe Leroy
2018-12-17  8:16         ` Christoph Hellwig
2018-12-17 11:37   ` Michael Ellerman
2018-12-16 17:19 ` [PATCH 3/8] crypto4xx_core: don't abuse __dma_sync_page Christoph Hellwig
2018-12-16 18:28   ` Christian Lamparter
2018-12-17  7:27     ` Christoph Hellwig
2018-12-19 10:13   ` Christian Lamparter
2018-12-16 17:19 ` [PATCH 4/8] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define Christoph Hellwig
2018-12-16 17:19 ` [PATCH 5/8] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export Christoph Hellwig
2018-12-16 17:19 ` [PATCH 6/8] powerpc/dma: remove the unused dma_iommu_ops export Christoph Hellwig
2018-12-16 17:19 ` [PATCH 7/8] powerpc/dma: split the two __dma_alloc_coherent implementations Christoph Hellwig
2018-12-17  6:51   ` Christophe Leroy
2018-12-17  7:35     ` Christoph Hellwig
2018-12-17 21:34       ` Gerhard Pircher
2018-12-16 17:19 ` [PATCH 8/8] cxl: drop the dma_set_mask callback from vphb Christoph Hellwig
2018-12-16 23:10   ` Andrew Donnellan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).