All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Add dma_mmap_coherent() for other archs
@ 2009-07-10 13:10 Takashi Iwai
  2009-07-10 13:12 ` [PATCH 1/8] mips: implement dma_mmap_coherent() Takashi Iwai
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:10 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

Hi,

currently, there is no uniformed way to mmap the pages allocated
via dma_alloc_coherent() properly.  This is a long-standing issue
in the ALSA PCM code, which causes Oops on some non-coherent
architectures.

ARM has already a function dma_mmap_coherent() for that purpose.
This patch series is for adding the same function to other major
architectures to improve the situation (read: not "solve" perfectly :)

I tried to keep the addition as simple and small as possible.
As I couldn't do build tests on some archs, some patches might be
wrong.  A fix patch would be appreciated.

Also, I added ARCH_HAS_DMA_MMAP_COHERENT for each definition so that
the caller can know whether the function is available.  This is
actually more helpful than giving a dummy inline function (at least in
the case of ALSA code) since it can be optimized at the compile time.
Maybe another name would be better, or there is a more clever way.
In anyways, comments and suggestions are greatly appreciated.

The patches can be found on test/dma-fix branch of sound git tree,
too.
  git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git test/dma-fix


thanks,

Takashi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/8] mips: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
@ 2009-07-10 13:12 ` Takashi Iwai
  2009-07-10 13:13 ` [PATCH 2/8] arm: Define ARCH_HAS_DMA_MMAP_COHERENT Takashi Iwai
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:12 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for MIPS.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/mips/include/asm/dma-mapping.h |    4 ++++
 arch/mips/mm/dma-default.c          |   13 +++++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index d16afdd..475fad3 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -73,6 +73,10 @@ extern int dma_is_consistent(struct device *dev, dma_addr_t dma_addr);
 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 	       enum dma_data_direction direction);
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+extern int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+			     void *cpu_addr, dma_addr_t handle, size_t size);
+
 #if 0
 #define ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
 
diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index 7e48e76..6d71a2b 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -367,3 +367,16 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 }
 
 EXPORT_SYMBOL(dma_cache_sync);
+
+int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+		      void *cpu_addr, dma_addr_t handle, size_t size)
+{
+	struct page *pg;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	cpu_addr = (void *)dma_addr_to_virt(handle);
+	pg = virt_to_page(cpu_addr);
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(pg) + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(dma_mmap_coherent);
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/8] arm: Define ARCH_HAS_DMA_MMAP_COHERENT
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
  2009-07-10 13:12 ` [PATCH 1/8] mips: implement dma_mmap_coherent() Takashi Iwai
@ 2009-07-10 13:13 ` Takashi Iwai
  2009-07-10 13:13 ` [PATCH 3/8] parisc: implement dma_mmap_coherent() Takashi Iwai
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:13 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/arm/include/asm/dma-mapping.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index ff46dfa..c249012 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -157,6 +157,7 @@ extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t);
  */
 extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t);
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
 /**
  * dma_mmap_coherent - map a coherent DMA allocation into user space
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
  2009-07-10 13:12 ` [PATCH 1/8] mips: implement dma_mmap_coherent() Takashi Iwai
  2009-07-10 13:13 ` [PATCH 2/8] arm: Define ARCH_HAS_DMA_MMAP_COHERENT Takashi Iwai
@ 2009-07-10 13:13 ` Takashi Iwai
  2009-07-10 15:11   ` James Bottomley
  2009-07-10 13:14 ` [PATCH 4/8] sh: " Takashi Iwai
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:13 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for PA-RISC.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/parisc/include/asm/dma-mapping.h |   13 +++++++++++++
 arch/parisc/kernel/pci-dma.c          |   16 ++++++++++++++++
 drivers/parisc/ccio-dma.c             |    1 +
 drivers/parisc/iommu-helpers.h        |   16 ++++++++++++++++
 drivers/parisc/sba_iommu.c            |    1 +
 5 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/include/asm/dma-mapping.h b/arch/parisc/include/asm/dma-mapping.h
index da69433..b5f810e 100644
--- a/arch/parisc/include/asm/dma-mapping.h
+++ b/arch/parisc/include/asm/dma-mapping.h
@@ -19,6 +19,9 @@ struct hppa_dma_ops {
 	void (*dma_sync_single_for_device)(struct device *dev, dma_addr_t iova, unsigned long offset, size_t size, enum dma_data_direction direction);
 	void (*dma_sync_sg_for_cpu)(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction);
 	void (*dma_sync_sg_for_device)(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction);
+	int (*mmap_coherent)(struct device *dev, struct vm_area_struct *vma,
+			     void *cpu_addr, dma_addr_t handle, size_t size);
+
 };
 
 /*
@@ -204,6 +207,16 @@ dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		flush_kernel_dcache_range((unsigned long)vaddr, size);
 }
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+static inline int
+dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+		  void *cpu_addr, dma_addr_t handle, size_t size)
+{
+	if (!hppa_dma_ops->mmap_coherent)
+		return -ENXIO;
+	return hppa_dma_ops->mmap_coherent(dev, vma, cpu_addr, handle, size);
+}
+
 static inline void *
 parisc_walk_tree(struct device *dev)
 {
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index c07f618..e8c68e3 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -539,6 +539,20 @@ static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *
 		flush_kernel_dcache_range(sg_virt_addr(sglist), sglist->length);
 }
 
+static int pa11_dma_mmap_coherent(struct device *dev,
+				  struct vm_area_struct *vma,
+				  void *cpu_addr, dma_addr_t handle,
+				  size_t size)
+{
+	struct page *pg;
+	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
+	cpu_addr = __va(handle);
+	pg = virt_to_page(cpu_addr);
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(pg) + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 struct hppa_dma_ops pcxl_dma_ops = {
 	.dma_supported =	pa11_dma_supported,
 	.alloc_consistent =	pa11_dma_alloc_consistent,
@@ -552,6 +566,7 @@ struct hppa_dma_ops pcxl_dma_ops = {
 	.dma_sync_single_for_device = pa11_dma_sync_single_for_device,
 	.dma_sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
 	.dma_sync_sg_for_device = pa11_dma_sync_sg_for_device,
+	.mmap_coherent =	pa11_dma_mmap_coherent,
 };
 
 static void *fail_alloc_consistent(struct device *dev, size_t size,
@@ -592,4 +607,5 @@ struct hppa_dma_ops pcx_dma_ops = {
 	.dma_sync_single_for_device =	pa11_dma_sync_single_for_device,
 	.dma_sync_sg_for_cpu =		pa11_dma_sync_sg_for_cpu,
 	.dma_sync_sg_for_device =	pa11_dma_sync_sg_for_device,
+	.mmap_coherent =	pa11_dma_mmap_coherent,
 };
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index 0f0e0b9..0dd67ae 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1016,6 +1016,7 @@ static struct hppa_dma_ops ccio_ops = {
 	.dma_sync_single_for_device =	NULL,	/* NOP for U2/Uturn */
 	.dma_sync_sg_for_cpu =		NULL,	/* ditto */
 	.dma_sync_sg_for_device =		NULL,	/* ditto */
+	.mmap_coherent =	iommu_dma_mmap_coherent,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/drivers/parisc/iommu-helpers.h b/drivers/parisc/iommu-helpers.h
index a9c46cc..9f86f3f 100644
--- a/drivers/parisc/iommu-helpers.h
+++ b/drivers/parisc/iommu-helpers.h
@@ -174,3 +174,19 @@ iommu_coalesce_chunks(struct ioc *ioc, struct device *dev,
 	return n_mappings;
 }
 
+/* dma_mmap_coherent callback function */
+/* Note that this is no inline function -- this function is eventually
+ * included in ccio_ops in both ccio-dma.c and sba_iommu.c
+ */
+static int iommu_dma_mmap_coherent(struct device *dev,
+				   struct vm_area_struct *vma,
+				   void *cpu_addr, dma_addr_t handle,
+				   size_t size)
+{
+	struct page *pg;
+	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
+	pg = virt_to_page(cpu_addr);
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(pg) + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index 123d8fe..8ccb3e4 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1070,6 +1070,7 @@ static struct hppa_dma_ops sba_ops = {
 	.dma_sync_single_for_device =	NULL,
 	.dma_sync_sg_for_cpu =		NULL,
 	.dma_sync_sg_for_device =	NULL,
+	.mmap_coherent =	iommu_dma_mmap_coherent,
 };
 
 
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/8] sh: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
                   ` (2 preceding siblings ...)
  2009-07-10 13:13 ` [PATCH 3/8] parisc: implement dma_mmap_coherent() Takashi Iwai
@ 2009-07-10 13:14 ` Takashi Iwai
  2009-07-10 13:14 ` [PATCH 5/8] sparc: " Takashi Iwai
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for SH architecture.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/sh/include/asm/dma-mapping.h |    4 ++++
 arch/sh/mm/consistent.c           |   13 +++++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index 69d56dd..120fe1a 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -28,6 +28,10 @@ void *dma_alloc_coherent(struct device *dev, size_t size,
 void dma_free_coherent(struct device *dev, size_t size,
 		       void *vaddr, dma_addr_t dma_handle);
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+		      void *cpu_addr, dma_addr_t handle, size_t size);
+
 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 		    enum dma_data_direction dir);
 
diff --git a/arch/sh/mm/consistent.c b/arch/sh/mm/consistent.c
index e098ec1..f45aaff 100644
--- a/arch/sh/mm/consistent.c
+++ b/arch/sh/mm/consistent.c
@@ -107,6 +107,19 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
 }
 EXPORT_SYMBOL(dma_cache_sync);
 
+int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+		      void *cpu_addr, dma_addr_t handle, size_t size)
+{
+	struct page *pg;
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+	cpu_addr = (void *)phys_to_virt(handle);
+	pg = virt_to_page(cpu_addr);
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(pg) + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+EXPORT_SYMBOL(dma_mmap_coherent);
+
 static int __init memchunk_setup(char *str)
 {
 	return 1; /* accept anything that begins with "memchunk." */
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/8] sparc: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
                   ` (3 preceding siblings ...)
  2009-07-10 13:14 ` [PATCH 4/8] sh: " Takashi Iwai
@ 2009-07-10 13:14 ` Takashi Iwai
  2009-07-10 13:15 ` [PATCH 6/8] powerpc: " Takashi Iwai
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for Sparc.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/sparc/include/asm/dma-mapping.h |   17 +++++++++++++++++
 arch/sparc/kernel/dma.c              |    1 +
 arch/sparc/kernel/dma.h              |    3 +++
 arch/sparc/kernel/ioport.c           |   16 ++++++++++++++++
 4 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
index 204e4bf..dd9d102 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -41,6 +41,8 @@ struct dma_ops {
 	void (*sync_sg_for_device)(struct device *dev,
 				   struct scatterlist *sg, int nents,
 				   enum dma_data_direction dir);
+	int (*mmap_coherent)(struct device *dev, struct vm_area_struct *vma,
+			     void *cpu_addr, dma_addr_t handle, size_t size);
 };
 extern const struct dma_ops *dma_ops;
 
@@ -148,6 +150,21 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
 	dma_sync_single_for_device(dev, dma_handle+offset, size, dir);
 }
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+static inline int dma_mmap_coherent(struct device *dev,
+				    struct vm_area_struct *vma,
+				    void *cpu_addr, dma_addr_t handle,
+				    size_t size)
+{
+	unsigned long pfn;
+
+	if (dma_ops->mmap_coherent)
+		return dma_ops->mmap_coherent(dev, vma, cpu_addr, handle, size);
+	pfn = page_to_pfn(virt_to_page(cpu_addr));
+	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
diff --git a/arch/sparc/kernel/dma.c b/arch/sparc/kernel/dma.c
index 524c32f..85cf6de 100644
--- a/arch/sparc/kernel/dma.c
+++ b/arch/sparc/kernel/dma.c
@@ -172,6 +172,7 @@ static const struct dma_ops dma32_dma_ops = {
 	.sync_single_for_device	= dma32_sync_single_for_device,
 	.sync_sg_for_cpu	= dma32_sync_sg_for_cpu,
 	.sync_sg_for_device	= dma32_sync_sg_for_device,
+	.mmap_coherent		= dma32_mmap_coherent,
 };
 
 const struct dma_ops *dma_ops = &dma32_dma_ops;
diff --git a/arch/sparc/kernel/dma.h b/arch/sparc/kernel/dma.h
index f8d8951..0f2e61f 100644
--- a/arch/sparc/kernel/dma.h
+++ b/arch/sparc/kernel/dma.h
@@ -12,3 +12,6 @@ void sbus_dma_sync_single_for_cpu(struct device *dev, dma_addr_t ba,
 				  size_t size, int direction);
 void sbus_dma_sync_single_for_device(struct device *dev, dma_addr_t ba,
 				     size_t size, int direction);
+
+int dma32_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+			void *cpu_addr, dma_addr_t handle, size_t size);
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 87ea0d0..ff21189 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -661,6 +661,22 @@ void pci_dma_sync_sg_for_device(struct pci_dev *hwdev, struct scatterlist *sgl,
 EXPORT_SYMBOL(pci_dma_sync_sg_for_device);
 #endif /* CONFIG_PCI */
 
+/* used in dma.c */
+int dma32_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+			void *cpu_addr, dma_addr_t handle, size_t size)
+{
+	struct resource *res;
+	struct page *pg;
+
+	res = _sparc_find_resource(&_sparc_dvma, (unsigned long)cpu_addr);
+	if (!res)
+		return -ENXIO;
+	pg = virt_to_page(res->start);
+	return remap_pfn_range(vma, vma->vm_start,
+			       page_to_pfn(pg) + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/8] powerpc: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
                   ` (4 preceding siblings ...)
  2009-07-10 13:14 ` [PATCH 5/8] sparc: " Takashi Iwai
@ 2009-07-10 13:15 ` Takashi Iwai
  2009-07-10 13:15 ` [PATCH 7/8] x86: " Takashi Iwai
  2009-07-10 13:16 ` [PATCH 8/8] ia64: " Takashi Iwai
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:15 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for powerpc.
The standard mmap is used as a fallback.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/powerpc/include/asm/dma-mapping.h |   22 ++++++++++++++++++++++
 arch/powerpc/kernel/dma.c              |   21 +++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index b44aaab..030a4b1 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -105,6 +105,10 @@ struct dma_mapping_ops {
 				struct scatterlist *sg, int nelems,
 				enum dma_data_direction direction);
 #endif
+	int		(*mmap_coherent)(struct device *hwdev,
+					 struct vm_area_struct *vma,
+					 void *cpu_addr, dma_addr_t dma_handle,
+					 size_t size);
 };
 
 /*
@@ -415,6 +419,24 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
 }
 #endif
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+static inline int dma_mmap_coherent(struct device *dev,
+				    struct vm_area_struct *vma,
+				    void *cpu_addr, dma_addr_t dma_handle,
+				    size_t size)
+{
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	BUG_ON(!dma_ops);
+
+	if (dma_ops->mmap_coherent)
+		return dma_ops->mmap_coherent(dev, vma, cpu_addr, dma_handle,
+					      size);
+	else
+		return dma_direct_ops.mmap_coherent(dev, vma, cpu_addr,
+						    dma_handle, size);
+}
+
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 20a60d6..d11db99 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -140,6 +140,26 @@ static inline void dma_direct_sync_single_range(struct device *dev,
 }
 #endif
 
+static int dma_direct_mmap_coherent(struct device *dev,
+				    struct vm_area_struct *vma,
+				    void *cpu_addr, dma_addr_t dma_handle,
+				    size_t size)
+{
+	unsigned long pfn;
+#ifdef CONFIG_NOT_COHERENT_CACHE
+	dma_handle -= get_dma_direct_offset(dev);
+	/* assume dma_handle set via pfn_to_phys() in
+	 * mm/dma-noncoherent.c
+	 */
+	pfn = dma_handle >> PAGE_SHIFT;
+#else
+	pfn = page_to_pfn(virt_to_page(cpu_addr));
+#endif
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 struct dma_mapping_ops dma_direct_ops = {
 	.alloc_coherent	= dma_direct_alloc_coherent,
 	.free_coherent	= dma_direct_free_coherent,
@@ -154,5 +174,6 @@ struct dma_mapping_ops dma_direct_ops = {
 	.sync_sg_for_cpu 		= dma_direct_sync_sg,
 	.sync_sg_for_device 		= dma_direct_sync_sg,
 #endif
+	.mmap_coherent	= dma_direct_mmap_coherent,
 };
 EXPORT_SYMBOL(dma_direct_ops);
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/8] x86: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
                   ` (5 preceding siblings ...)
  2009-07-10 13:15 ` [PATCH 6/8] powerpc: " Takashi Iwai
@ 2009-07-10 13:15 ` Takashi Iwai
  2009-07-10 13:16 ` [PATCH 8/8] ia64: " Takashi Iwai
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:15 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for x86.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/x86/include/asm/dma-mapping.h |   16 ++++++++++++++++
 include/linux/dma-mapping.h        |    2 ++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 1c3f943..4941ad9 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -139,4 +139,20 @@ static inline void dma_free_coherent(struct device *dev, size_t size,
 		ops->free_coherent(dev, size, vaddr, bus);
 }
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+static inline int dma_mmap_coherent(struct device *dev,
+				    struct vm_area_struct *vma,
+				    void *cpu_addr, dma_addr_t handle,
+				    size_t size)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	unsigned long pfn;
+
+	if (ops->mmap_coherent)
+		return ops->mmap_coherent(dev, vma, cpu_addr, handle, size);
+	pfn = page_to_pfn(virt_to_page(cpu_addr));
+	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 #endif
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 07dfd46..9ddc6ae 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,8 @@ struct dma_map_ops {
 				   enum dma_data_direction dir);
 	int (*mapping_error)(struct device *dev, dma_addr_t dma_addr);
 	int (*dma_supported)(struct device *dev, u64 mask);
+	int (*mmap_coherent)(struct device *dev, struct vm_area_struct *vma,
+			     void *cpu_addr, dma_addr_t handle, size_t size);
 	int is_phys;
 };
 
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 8/8] ia64: implement dma_mmap_coherent()
  2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
                   ` (6 preceding siblings ...)
  2009-07-10 13:15 ` [PATCH 7/8] x86: " Takashi Iwai
@ 2009-07-10 13:16 ` Takashi Iwai
  7 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-10 13:16 UTC (permalink / raw)
  To: linux-arch; +Cc: Gerhard Pircher

A lazy version of dma_mmap_coherent() implementation for IA64.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 arch/ia64/include/asm/dma-mapping.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index 5a61b5c..70b2c61 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -84,4 +84,20 @@ dma_cache_sync (struct device *dev, void *vaddr, size_t size,
 
 #define dma_is_consistent(d, h)	(1)	/* all we do is coherent memory... */
 
+#define ARCH_HAS_DMA_MMAP_COHERENT
+static inline int dma_mmap_coherent(struct device *dev,
+				    struct vm_area_struct *vma,
+				    void *cpu_addr, dma_addr_t handle,
+				    size_t size)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	unsigned long pfn;
+
+	if (ops->mmap_coherent)
+		return ops->mmap_coherent(dev, vma, cpu_addr, handle, size);
+	pfn = page_to_pfn(virt_to_page(cpu_addr));
+	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+			       size, vma->vm_page_prot);
+}
+
 #endif /* _ASM_IA64_DMA_MAPPING_H */
-- 
1.6.3.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 13:13 ` [PATCH 3/8] parisc: implement dma_mmap_coherent() Takashi Iwai
@ 2009-07-10 15:11   ` James Bottomley
  2009-07-10 18:16     ` Russell King
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2009-07-10 15:11 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-arch, Gerhard Pircher, Parisc List

On Fri, 2009-07-10 at 15:13 +0200, Takashi Iwai wrote:
> A lazy version of dma_mmap_coherent() implementation for PA-RISC.
> 
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  arch/parisc/include/asm/dma-mapping.h |   13 +++++++++++++
>  arch/parisc/kernel/pci-dma.c          |   16 ++++++++++++++++
>  drivers/parisc/ccio-dma.c             |    1 +
>  drivers/parisc/iommu-helpers.h        |   16 ++++++++++++++++
>  drivers/parisc/sba_iommu.c            |    1 +
>  5 files changed, 47 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/parisc/include/asm/dma-mapping.h b/arch/parisc/include/asm/dma-mapping.h
> index da69433..b5f810e 100644
> --- a/arch/parisc/include/asm/dma-mapping.h
> +++ b/arch/parisc/include/asm/dma-mapping.h
> @@ -19,6 +19,9 @@ struct hppa_dma_ops {
>  	void (*dma_sync_single_for_device)(struct device *dev, dma_addr_t iova, unsigned long offset, size_t size, enum dma_data_direction direction);
>  	void (*dma_sync_sg_for_cpu)(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction);
>  	void (*dma_sync_sg_for_device)(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction);
> +	int (*mmap_coherent)(struct device *dev, struct vm_area_struct *vma,
> +			     void *cpu_addr, dma_addr_t handle, size_t size);
> +
>  };
>  
>  /*
> @@ -204,6 +207,16 @@ dma_cache_sync(struct device *dev, void *vaddr, size_t size,
>  		flush_kernel_dcache_range((unsigned long)vaddr, size);
>  }
>  
> +#define ARCH_HAS_DMA_MMAP_COHERENT
> +static inline int
> +dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
> +		  void *cpu_addr, dma_addr_t handle, size_t size)
> +{
> +	if (!hppa_dma_ops->mmap_coherent)
> +		return -ENXIO;
> +	return hppa_dma_ops->mmap_coherent(dev, vma, cpu_addr, handle, size);
> +}
> +
>  static inline void *
>  parisc_walk_tree(struct device *dev)
>  {
> diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
> index c07f618..e8c68e3 100644
> --- a/arch/parisc/kernel/pci-dma.c
> +++ b/arch/parisc/kernel/pci-dma.c
> @@ -539,6 +539,20 @@ static void pa11_dma_sync_sg_for_device(struct device *dev, struct scatterlist *
>  		flush_kernel_dcache_range(sg_virt_addr(sglist), sglist->length);
>  }
>  
> +static int pa11_dma_mmap_coherent(struct device *dev,
> +				  struct vm_area_struct *vma,
> +				  void *cpu_addr, dma_addr_t handle,
> +				  size_t size)
> +{
> +	struct page *pg;
> +	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
> +	cpu_addr = __va(handle);
> +	pg = virt_to_page(cpu_addr);
> +	return remap_pfn_range(vma, vma->vm_start,
> +			       page_to_pfn(pg) + vma->vm_pgoff,
> +			       size, vma->vm_page_prot);
> +}
> +
>  struct hppa_dma_ops pcxl_dma_ops = {
>  	.dma_supported =	pa11_dma_supported,
>  	.alloc_consistent =	pa11_dma_alloc_consistent,
> @@ -552,6 +566,7 @@ struct hppa_dma_ops pcxl_dma_ops = {
>  	.dma_sync_single_for_device = pa11_dma_sync_single_for_device,
>  	.dma_sync_sg_for_cpu = pa11_dma_sync_sg_for_cpu,
>  	.dma_sync_sg_for_device = pa11_dma_sync_sg_for_device,
> +	.mmap_coherent =	pa11_dma_mmap_coherent,
>  };
>  
>  static void *fail_alloc_consistent(struct device *dev, size_t size,
> @@ -592,4 +607,5 @@ struct hppa_dma_ops pcx_dma_ops = {
>  	.dma_sync_single_for_device =	pa11_dma_sync_single_for_device,
>  	.dma_sync_sg_for_cpu =		pa11_dma_sync_sg_for_cpu,
>  	.dma_sync_sg_for_device =	pa11_dma_sync_sg_for_device,
> +	.mmap_coherent =	pa11_dma_mmap_coherent,
>  };
> diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
> index 0f0e0b9..0dd67ae 100644
> --- a/drivers/parisc/ccio-dma.c
> +++ b/drivers/parisc/ccio-dma.c
> @@ -1016,6 +1016,7 @@ static struct hppa_dma_ops ccio_ops = {
>  	.dma_sync_single_for_device =	NULL,	/* NOP for U2/Uturn */
>  	.dma_sync_sg_for_cpu =		NULL,	/* ditto */
>  	.dma_sync_sg_for_device =		NULL,	/* ditto */
> +	.mmap_coherent =	iommu_dma_mmap_coherent,
>  };
>  
>  #ifdef CONFIG_PROC_FS
> diff --git a/drivers/parisc/iommu-helpers.h b/drivers/parisc/iommu-helpers.h
> index a9c46cc..9f86f3f 100644
> --- a/drivers/parisc/iommu-helpers.h
> +++ b/drivers/parisc/iommu-helpers.h
> @@ -174,3 +174,19 @@ iommu_coalesce_chunks(struct ioc *ioc, struct device *dev,
>  	return n_mappings;
>  }
>  
> +/* dma_mmap_coherent callback function */
> +/* Note that this is no inline function -- this function is eventually
> + * included in ccio_ops in both ccio-dma.c and sba_iommu.c
> + */
> +static int iommu_dma_mmap_coherent(struct device *dev,
> +				   struct vm_area_struct *vma,
> +				   void *cpu_addr, dma_addr_t handle,
> +				   size_t size)
> +{
> +	struct page *pg;
> +	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
> +	pg = virt_to_page(cpu_addr);
> +	return remap_pfn_range(vma, vma->vm_start,
> +			       page_to_pfn(pg) + vma->vm_pgoff,
> +			       size, vma->vm_page_prot);
> +}
> diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
> index 123d8fe..8ccb3e4 100644
> --- a/drivers/parisc/sba_iommu.c
> +++ b/drivers/parisc/sba_iommu.c
> @@ -1070,6 +1070,7 @@ static struct hppa_dma_ops sba_ops = {
>  	.dma_sync_single_for_device =	NULL,
>  	.dma_sync_sg_for_cpu =		NULL,
>  	.dma_sync_sg_for_device =	NULL,
> +	.mmap_coherent =	iommu_dma_mmap_coherent,
>  };

This isn't the right thing to do on parisc.  The only case that will
actually work is the small subset of pa11 systems where the only way we
can manufacture coherent memory is to turn the cache off.  One more
thing: even on pa11 you have to flush the cache lines before you make a
page uncached otherwise you can get coherency problems.

For pa20 systems, we have a cache flushing system on bus writes.  What
your patch does is turn off caching over the user page only.  This would
leave us with a cache over the kernel page, so any write the kernel
makes would need flushing before it's visible to the user, which isn't
how the kernel operates with coherent memory.

The only way to fix the above is either to make the user virtual address
congruent with the kernel one (VI congruence in parisc is steps of 4MB)
so they share the same cache (in which case they don't need the uncached
bit).  Or to make the coherent memory uncached when it's first
allocated ... this is a performance hit.

The design of coherent memory was for memory based device mailboxes
managed by the kernel ... trying to give userspace coherent access to
the same mailbox is problematic because it gives a direct way for the
process to interfere with a device function ... shouldn't whatever
you're trying to do be better accomplished by using an API to control
the device and keeping the coherent mailbox fully in the kernel address
space?

James



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 15:11   ` James Bottomley
@ 2009-07-10 18:16     ` Russell King
  2009-07-10 18:30       ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King @ 2009-07-10 18:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: Takashi Iwai, linux-arch, Gerhard Pircher, Parisc List

On Fri, Jul 10, 2009 at 03:11:29PM +0000, James Bottomley wrote:
> The design of coherent memory was for memory based device mailboxes
> managed by the kernel ... trying to give userspace coherent access to
> the same mailbox is problematic because it gives a direct way for the
> process to interfere with a device function ... shouldn't whatever
> you're trying to do be better accomplished by using an API to control
> the device and keeping the coherent mailbox fully in the kernel address
> space?

As far as sound DMA goes, it's not about mailboxes.  It's about a circular
buffer which you want the device to DMA from direct to/from the DAC/ADC
and have the application write/read data directly to/from that same
buffer.

Without this, you end up having to copy the sound data - at something
around 200KB/s from applications into a driver managed buffer, which is
quite an unnecessary overhead for the CPU.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 18:16     ` Russell King
@ 2009-07-10 18:30       ` James Bottomley
  2009-07-10 18:39         ` Russell King
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2009-07-10 18:30 UTC (permalink / raw)
  To: Russell King; +Cc: Takashi Iwai, linux-arch, Gerhard Pircher, Parisc List

On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> On Fri, Jul 10, 2009 at 03:11:29PM +0000, James Bottomley wrote:
> > The design of coherent memory was for memory based device mailboxes
> > managed by the kernel ... trying to give userspace coherent access to
> > the same mailbox is problematic because it gives a direct way for the
> > process to interfere with a device function ... shouldn't whatever
> > you're trying to do be better accomplished by using an API to control
> > the device and keeping the coherent mailbox fully in the kernel address
> > space?
> 
> As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> buffer which you want the device to DMA from direct to/from the DAC/ADC
> and have the application write/read data directly to/from that same
> buffer.

But that makes it sound like ordinary streaming DMA from a device to
user space ... that's what the dma_map_xx APIs are already designed to
handle:  I don't understand why you need coherent memory for this (which
can be a scarce resource on some platforms).

> Without this, you end up having to copy the sound data - at something
> around 200KB/s from applications into a driver managed buffer, which is
> quite an unnecessary overhead for the CPU.

Why?  The dma_map_xx API is designed to be zero copy.

James

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 18:30       ` James Bottomley
@ 2009-07-10 18:39         ` Russell King
  2009-07-10 18:59           ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King @ 2009-07-10 18:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: Takashi Iwai, linux-arch, Gerhard Pircher, Parisc List

On Fri, Jul 10, 2009 at 06:30:50PM +0000, James Bottomley wrote:
> On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> > As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> > buffer which you want the device to DMA from direct to/from the DAC/ADC
> > and have the application write/read data directly to/from that same
> > buffer.
> 
> But that makes it sound like ordinary streaming DMA from a device to
> user space ... that's what the dma_map_xx APIs are already designed to
> handle:  I don't understand why you need coherent memory for this (which
> can be a scarce resource on some platforms).

The streaming APIs are inefficient for this.  Consider the overhead of
having to writeback and invalidate caches at 200KB/s (which is what
you're requiring ARM to do).  That's far too much CPU overhead.

It's much more efficient to use non-cached memory for this on ARM.

We've been doing this for years and years, it's well proven.

> Why?  The dma_map_xx API is designed to be zero copy.

Except with a rather large overhead of repetitive cache handling.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 18:39         ` Russell King
@ 2009-07-10 18:59           ` James Bottomley
  2009-07-17 14:13             ` Takashi Iwai
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2009-07-10 18:59 UTC (permalink / raw)
  To: Russell King; +Cc: Takashi Iwai, linux-arch, Gerhard Pircher, Parisc List

On Fri, 2009-07-10 at 19:39 +0100, Russell King wrote:
> On Fri, Jul 10, 2009 at 06:30:50PM +0000, James Bottomley wrote:
> > On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> > > As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> > > buffer which you want the device to DMA from direct to/from the DAC/ADC
> > > and have the application write/read data directly to/from that same
> > > buffer.
> > 
> > But that makes it sound like ordinary streaming DMA from a device to
> > user space ... that's what the dma_map_xx APIs are already designed to
> > handle:  I don't understand why you need coherent memory for this (which
> > can be a scarce resource on some platforms).

> The streaming APIs are inefficient for this.  Consider the overhead of
> having to writeback and invalidate caches at 200KB/s (which is what
> you're requiring ARM to do).  That's far too much CPU overhead.

Um, but streaming APIs are what we use for all block and network I/O
from user space ... we don't see huge performance problems running up to
gigabits.  Even on parisc where we have to flush several times to make
this happen we can get up to several hundred megabytes per second.

> It's much more efficient to use non-cached memory for this on ARM.
> 
> We've been doing this for years and years, it's well proven.
> 
> > Why?  The dma_map_xx API is designed to be zero copy.
> 
> Except with a rather large overhead of repetitive cache handling.

OK, so this is the bit I don't get ... why isn't the streaming a problem
for I/O then, which even on arm wants far more than 200kb/s?

James



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-10 18:59           ` James Bottomley
@ 2009-07-17 14:13             ` Takashi Iwai
  2009-07-17 19:16               ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Takashi Iwai @ 2009-07-17 14:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, linux-arch, Gerhard Pircher, Parisc List

[Sorry for the late follow-up as I've been absent]

At Fri, 10 Jul 2009 18:59:23 +0000,
James Bottomley wrote:
> 
> On Fri, 2009-07-10 at 19:39 +0100, Russell King wrote:
> > On Fri, Jul 10, 2009 at 06:30:50PM +0000, James Bottomley wrote:
> > > On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> > > > As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> > > > buffer which you want the device to DMA from direct to/from the DAC/ADC
> > > > and have the application write/read data directly to/from that same
> > > > buffer.
> > > 
> > > But that makes it sound like ordinary streaming DMA from a device to
> > > user space ... that's what the dma_map_xx APIs are already designed to
> > > handle:  I don't understand why you need coherent memory for this (which
> > > can be a scarce resource on some platforms).
> 
> > The streaming APIs are inefficient for this.  Consider the overhead of
> > having to writeback and invalidate caches at 200KB/s (which is what
> > you're requiring ARM to do).  That's far too much CPU overhead.
> 
> Um, but streaming APIs are what we use for all block and network I/O
> from user space ... we don't see huge performance problems running up to
> gigabits.  Even on parisc where we have to flush several times to make
> this happen we can get up to several hundred megabytes per second.

Well, the requirement of the non-streaming mmap is rather a historical
reason.  From the fairly early time, the sound driver provided the
mmap of a whole ring buffer since it's practical and efficient for
real-time sound processes.  But, usually the mmap mode is optional.
If it's not supported, apps should fall back to the normal read/write
mode.

Now, the problem is that we have no way to tell whether the DMA
ring-buffer mmap is available or not on each architecture.  So, my
proposal has basically two meanings:

- clarify which arch / platform supports the coherent DMA mapping
  of a whole ring-buffer
- provide the same API to cover the possible archs / platforms

In that sense, if a few (or all) PA-RISC platforms don't give the
proper coherent mapping that the sound apps require, it's OK.  We
just drop the flag in the driver as "unsupported", then.

So... before going to the detail of PA-RISC implementation, I'd like
to know your opinions: whether applying dma_mmap_coherent() to
possible archs/platforms is a reasonable solution for such a
scenario.

[Why not using the standard map->sync procedure is another level of
 question :)  My goal here is to improve the current (partly broken)
 situation in a minimal effort.  The change of mmap procedure would
 lead to major rewrites of API, and thus has to be discussed more
 deeply.]

Any comments appreciated.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-17 14:13             ` Takashi Iwai
@ 2009-07-17 19:16               ` James Bottomley
  2009-07-19 12:23                 ` Takashi Iwai
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2009-07-17 19:16 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Russell King, linux-arch, Gerhard Pircher, Parisc List

On Fri, 2009-07-17 at 16:13 +0200, Takashi Iwai wrote:
> [Sorry for the late follow-up as I've been absent]
> 
> At Fri, 10 Jul 2009 18:59:23 +0000,
> James Bottomley wrote:
> > 
> > On Fri, 2009-07-10 at 19:39 +0100, Russell King wrote:
> > > On Fri, Jul 10, 2009 at 06:30:50PM +0000, James Bottomley wrote:
> > > > On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> > > > > As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> > > > > buffer which you want the device to DMA from direct to/from the DAC/ADC
> > > > > and have the application write/read data directly to/from that same
> > > > > buffer.
> > > > 
> > > > But that makes it sound like ordinary streaming DMA from a device to
> > > > user space ... that's what the dma_map_xx APIs are already designed to
> > > > handle:  I don't understand why you need coherent memory for this (which
> > > > can be a scarce resource on some platforms).
> > 
> > > The streaming APIs are inefficient for this.  Consider the overhead of
> > > having to writeback and invalidate caches at 200KB/s (which is what
> > > you're requiring ARM to do).  That's far too much CPU overhead.
> > 
> > Um, but streaming APIs are what we use for all block and network I/O
> > from user space ... we don't see huge performance problems running up to
> > gigabits.  Even on parisc where we have to flush several times to make
> > this happen we can get up to several hundred megabytes per second.
> 
> Well, the requirement of the non-streaming mmap is rather a historical
> reason.  From the fairly early time, the sound driver provided the
> mmap of a whole ring buffer since it's practical and efficient for
> real-time sound processes.  But, usually the mmap mode is optional.
> If it's not supported, apps should fall back to the normal read/write
> mode.

But what I don't understand is why you can't treat the ring buffer as
streaming.  Sure, you have a head pointer and a tail pointer, but
everything between head and tail is owned by the device and everything
between tail and head is owned by the kernel, surely?  In that case, you
can manipulate head and tail movements simply via the streaming API?
(when head moves, map the delta to the device; when tail moves, map the
delta to the kernel) and we can simply then use the standard APIs?

> Now, the problem is that we have no way to tell whether the DMA
> ring-buffer mmap is available or not on each architecture.  So, my
> proposal has basically two meanings:
> 
> - clarify which arch / platform supports the coherent DMA mapping
>   of a whole ring-buffer
> - provide the same API to cover the possible archs / platforms
> 
> In that sense, if a few (or all) PA-RISC platforms don't give the
> proper coherent mapping that the sound apps require, it's OK.  We
> just drop the flag in the driver as "unsupported", then.

That would work for us too ... I can see that x86 will have little
problem with this, but I can predict that most other architectures will
have some difficulty.

> So... before going to the detail of PA-RISC implementation, I'd like
> to know your opinions: whether applying dma_mmap_coherent() to
> possible archs/platforms is a reasonable solution for such a
> scenario.

Realistically, the only way to make it work as you want on parisc is to
have a kernel/user congruence (it's an address rule [virtual addresses
are equal modulo the congruence step, which is 4MB]) which would ensure
this would work without flushing ... that's hard for the kernel to make
happen on its own.

> [Why not using the standard map->sync procedure is another level of
>  question :)  My goal here is to improve the current (partly broken)
>  situation in a minimal effort.  The change of mmap procedure would
>  lead to major rewrites of API, and thus has to be discussed more
>  deeply.]

So, yes, there is an equivalent used by the frame buffer which basically
tries to place a dma mapped page into a user address space ... that
might also work in this case.

> Any comments appreciated.

James



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/8] parisc: implement dma_mmap_coherent()
  2009-07-17 19:16               ` James Bottomley
@ 2009-07-19 12:23                 ` Takashi Iwai
  0 siblings, 0 replies; 17+ messages in thread
From: Takashi Iwai @ 2009-07-19 12:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, linux-arch, Gerhard Pircher, Parisc List

At Fri, 17 Jul 2009 19:16:49 +0000,
James Bottomley wrote:
> 
> On Fri, 2009-07-17 at 16:13 +0200, Takashi Iwai wrote:
> > [Sorry for the late follow-up as I've been absent]
> > 
> > At Fri, 10 Jul 2009 18:59:23 +0000,
> > James Bottomley wrote:
> > > 
> > > On Fri, 2009-07-10 at 19:39 +0100, Russell King wrote:
> > > > On Fri, Jul 10, 2009 at 06:30:50PM +0000, James Bottomley wrote:
> > > > > On Fri, 2009-07-10 at 19:16 +0100, Russell King wrote:
> > > > > > As far as sound DMA goes, it's not about mailboxes.  It's about a circular
> > > > > > buffer which you want the device to DMA from direct to/from the DAC/ADC
> > > > > > and have the application write/read data directly to/from that same
> > > > > > buffer.
> > > > > 
> > > > > But that makes it sound like ordinary streaming DMA from a device to
> > > > > user space ... that's what the dma_map_xx APIs are already designed to
> > > > > handle:  I don't understand why you need coherent memory for this (which
> > > > > can be a scarce resource on some platforms).
> > > 
> > > > The streaming APIs are inefficient for this.  Consider the overhead of
> > > > having to writeback and invalidate caches at 200KB/s (which is what
> > > > you're requiring ARM to do).  That's far too much CPU overhead.
> > > 
> > > Um, but streaming APIs are what we use for all block and network I/O
> > > from user space ... we don't see huge performance problems running up to
> > > gigabits.  Even on parisc where we have to flush several times to make
> > > this happen we can get up to several hundred megabytes per second.
> > 
> > Well, the requirement of the non-streaming mmap is rather a historical
> > reason.  From the fairly early time, the sound driver provided the
> > mmap of a whole ring buffer since it's practical and efficient for
> > real-time sound processes.  But, usually the mmap mode is optional.
> > If it's not supported, apps should fall back to the normal read/write
> > mode.
> 
> But what I don't understand is why you can't treat the ring buffer as
> streaming.

We could, if we want.  The problem is only that the existing sound
mmap API since over 10 years ago requires the whole ring-buffer to be
exposed, to be accessible without sync operation.

>  Sure, you have a head pointer and a tail pointer, but
> everything between head and tail is owned by the device and everything
> between tail and head is owned by the kernel, surely?  In that case, you
> can manipulate head and tail movements simply via the streaming API?
> (when head moves, map the delta to the device; when tail moves, map the
> delta to the kernel) and we can simply then use the standard APIs?

That's a good thing for the future :)

> > Now, the problem is that we have no way to tell whether the DMA
> > ring-buffer mmap is available or not on each architecture.  So, my
> > proposal has basically two meanings:
> > 
> > - clarify which arch / platform supports the coherent DMA mapping
> >   of a whole ring-buffer
> > - provide the same API to cover the possible archs / platforms
> > 
> > In that sense, if a few (or all) PA-RISC platforms don't give the
> > proper coherent mapping that the sound apps require, it's OK.  We
> > just drop the flag in the driver as "unsupported", then.
> 
> That would work for us too ... I can see that x86 will have little
> problem with this, but I can predict that most other architectures will
> have some difficulty.

Yes, appears like so...

> > So... before going to the detail of PA-RISC implementation, I'd like
> > to know your opinions: whether applying dma_mmap_coherent() to
> > possible archs/platforms is a reasonable solution for such a
> > scenario.
> 
> Realistically, the only way to make it work as you want on parisc is to
> have a kernel/user congruence (it's an address rule [virtual addresses
> are equal modulo the congruence step, which is 4MB]) which would ensure
> this would work without flushing ... that's hard for the kernel to make
> happen on its own.

Hmm.  So, maybe enabling dma_mmap_coherent() for PA-RISC isn't worth,
at least, for the sound device mmap.

> > [Why not using the standard map->sync procedure is another level of
> >  question :)  My goal here is to improve the current (partly broken)
> >  situation in a minimal effort.  The change of mmap procedure would
> >  lead to major rewrites of API, and thus has to be discussed more
> >  deeply.]
> 
> So, yes, there is an equivalent used by the frame buffer which basically
> tries to place a dma mapped page into a user address space ... that
> might also work in this case.

Sorry for ignorance, but doesn't fb-mmap work in a way like the sound
apps expect?  Or does it require any map->sync operation?


thanks,

Takashi

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-07-19 12:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-10 13:10 [PATCH 0/8] Add dma_mmap_coherent() for other archs Takashi Iwai
2009-07-10 13:12 ` [PATCH 1/8] mips: implement dma_mmap_coherent() Takashi Iwai
2009-07-10 13:13 ` [PATCH 2/8] arm: Define ARCH_HAS_DMA_MMAP_COHERENT Takashi Iwai
2009-07-10 13:13 ` [PATCH 3/8] parisc: implement dma_mmap_coherent() Takashi Iwai
2009-07-10 15:11   ` James Bottomley
2009-07-10 18:16     ` Russell King
2009-07-10 18:30       ` James Bottomley
2009-07-10 18:39         ` Russell King
2009-07-10 18:59           ` James Bottomley
2009-07-17 14:13             ` Takashi Iwai
2009-07-17 19:16               ` James Bottomley
2009-07-19 12:23                 ` Takashi Iwai
2009-07-10 13:14 ` [PATCH 4/8] sh: " Takashi Iwai
2009-07-10 13:14 ` [PATCH 5/8] sparc: " Takashi Iwai
2009-07-10 13:15 ` [PATCH 6/8] powerpc: " Takashi Iwai
2009-07-10 13:15 ` [PATCH 7/8] x86: " Takashi Iwai
2009-07-10 13:16 ` [PATCH 8/8] ia64: " Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.