* [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

Hello,

This patch series is a continuation of my work on implementing generic
IOMMU support in the DMA mapping framework for the ARM architecture. This
time I focused on the DMA mapping framework itself. It turned out that
adding support for the common dma_map_ops structure was not as hard as I
initially thought. After some modifications, most of the code fits really
well into the generic dma_map_ops methods.

The only change required to dma_map_ops is a new alloc function. During
the discussion at the Linaro Memory Management meeting in Budapest, we
came up with the idea of having a single alloc/free/mmap function with an
additional attributes argument. This way, all the different kinds of
architecture-specific buffer mappings can be hidden behind the attributes
without the need to create several versions of the dma_alloc_* functions.
I also noticed that the dma_alloc_noncoherent() function can be
implemented this way with a DMA_ATTRIB_NON_COHERENT attribute. Systems
that simply define dma_alloc_noncoherent as dma_alloc_coherent will just
ignore such an attribute.
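
To illustrate the idea, here is a minimal sketch of such a unified entry
point. It is not part of this series: dma_alloc_attrs() and the
DMA_ATTRIB_NON_COHERENT name only mirror the proposal above, and a
get_dma_ops() helper is assumed to exist as on other architectures, while
DEFINE_DMA_ATTRS() and dma_set_attr() already come from
include/linux/dma-attrs.h:

/* One allocator op; the attrs argument selects coherent vs.
 * non-coherent (and other) behaviour instead of separate
 * dma_alloc_* entry points. */
static inline void *dma_alloc_attrs(struct device *dev, size_t size,
				    dma_addr_t *dma_handle, gfp_t gfp,
				    struct dma_attrs *attrs)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	return ops->alloc(dev, size, dma_handle, gfp, attrs);
}

/* dma_alloc_noncoherent() then becomes a thin wrapper: */
static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
					  dma_addr_t *handle, gfp_t gfp)
{
	DEFINE_DMA_ATTRS(attrs);

	dma_set_attr(DMA_ATTRIB_NON_COHERENT, &attrs);
	return dma_alloc_attrs(dev, size, handle, gfp, &attrs);
}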

Another good use case for an alloc method with attributes is the
possibility of allocating a buffer without a valid kernel mapping. There
are a number of drivers (mainly V4L2 and ALSA) that only export their DMA
buffers to user space. Such drivers don't touch the buffer data at all.
For such buffers we can avoid creating a mapping in the kernel virtual
address space, saving precious vmalloc area. Such buffers could be
allocated with a new DMA_ATTRIB_NO_KERNEL_MAPPING attribute, as sketched
below.
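
For illustration only (DMA_ATTRIB_NO_KERNEL_MAPPING is merely proposed
above, dma_alloc_attrs() follows the sketch from the previous paragraph,
and 'buf' is a hypothetical driver-private structure), a V4L2-style
driver that only ever mmap()s the buffer to user space could allocate it
like this:

	DEFINE_DMA_ATTRS(attrs);

	/* No kernel virtual mapping is created: the device gets a DMA
	 * address and user space can mmap() the buffer, but the driver
	 * must never dereference a CPU address for it. */
	dma_set_attr(DMA_ATTRIB_NO_KERNEL_MAPPING, &attrs);
	buf->vaddr = dma_alloc_attrs(dev, buf->size, &buf->dma_addr,
				     GFP_KERNEL, &attrs);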

All the changes introduced in this patch series are intended to prepare
the ground for the upcoming integration of generic IOMMU support into the
DMA mapping framework on the ARM architecture.

For more information about the proof-of-concept IOMMU implementation in
the DMA mapping framework, please refer to my previous set of patches:
http://www.spinics.net/lists/linux-mm/msg19856.html

I've tried to split the redesign into a set of single-step changes for
easier review and understanding. If there is anything that needs further
clarification, please don't hesitate to ask.

The patches are prepared on top of Linux Kernel v3.0-rc3.

The proposed changes have been tested on the Samsung Exynos4 platform.
I've also tested the dmabounce code (by manually registering DMA bounce
support for some of the devices available on my board), although my
hardware has no such strict requirements. It would be great if someone
could test these patches on other ARM platforms to check that I didn't
break anything.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



Patch summary:

Marek Szyprowski (8):
  ARM: dma-mapping: remove offset parameter to prepare for generic
    dma_ops
  ARM: dma-mapping: implement dma_map_single on top of dma_map_page
  ARM: dma-mapping: use asm-generic/dma-mapping-common.h
  ARM: dma-mapping: implement dma sg methods on top of generic dma ops
  ARM: dma-mapping: move all dma bounce code to separate dma ops
    structure
  ARM: dma-mapping: remove redundant code and cleanup
  common: dma-mapping: change alloc/free_coherent method to more
    generic alloc/free_attrs
  ARM: dma-mapping: use alloc, mmap, free from dma_ops

 arch/arm/Kconfig                   |    1 +
 arch/arm/common/dmabounce.c        |  112 +++--
 arch/arm/include/asm/device.h      |    1 +
 arch/arm/include/asm/dma-mapping.h |  835 +++++++++++++-----------------------
 arch/arm/mm/dma-mapping.c          |  278 +++++++------
 include/linux/dma-attrs.h          |    1 +
 include/linux/dma-mapping.h        |   13 +-
 7 files changed, 539 insertions(+), 702 deletions(-)
 rewrite arch/arm/include/asm/dma-mapping.h (66%)

-- 
1.7.1.569.g6f426


* [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch removes the need for the offset parameter in the dma bounce
functions. This is required to let the dma-mapping framework on the ARM
architecture use the common, generic dma-mapping helpers.
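
The key consequence: callers now pass the already-offset DMA address
(handle + offset) instead of a separate (handle, off) pair, so the
safe-buffer lookup must match any address that falls inside a buffer and
recover the offset itself. As a simplified, self-contained sketch of that
logic (locking and the remaining safe_buffer fields stripped for
clarity):

struct safe_buffer {
	struct list_head node;
	dma_addr_t safe_dma_addr;
	size_t size;
	/* ... */
};

/* Match any address within [safe_dma_addr, safe_dma_addr + size). */
static struct safe_buffer *find_safe_buffer(struct list_head *buffers,
					    dma_addr_t addr)
{
	struct safe_buffer *b;

	list_for_each_entry(b, buffers, node)
		if (b->safe_dma_addr <= addr &&
		    addr < b->safe_dma_addr + b->size)
			return b;
	return NULL;
}

The sync helpers then compute off = addr - buf->safe_dma_addr themselves
instead of taking it as a parameter.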

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/common/dmabounce.c        |   13 ++++++--
 arch/arm/include/asm/dma-mapping.h |   63 +++++++++++++++++------------------
 arch/arm/mm/dma-mapping.c          |    4 +-
 3 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index e568163..f7b330f 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -171,7 +171,8 @@ find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_
 	read_lock_irqsave(&device_info->lock, flags);
 
 	list_for_each_entry(b, &device_info->safe_buffers, node)
-		if (b->safe_dma_addr == safe_dma_addr) {
+		if (b->safe_dma_addr <= safe_dma_addr &&
+		    b->safe_dma_addr + b->size > safe_dma_addr) {
 			rb = b;
 			break;
 		}
@@ -391,9 +392,10 @@ void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
 EXPORT_SYMBOL(__dma_unmap_page);
 
 int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -402,6 +404,8 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -420,9 +424,10 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
 int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;
 
 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -431,6 +436,8 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;
 
+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);
 
 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..ca920aa 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -310,10 +310,8 @@ extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
 /*
  * Private functions
  */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
+int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
+int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
 #else
 static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
 	unsigned long offset, size_t size, enum dma_data_direction dir)
@@ -454,6 +452,33 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 	__dma_unmap_page(dev, handle, size, dir);
 }
 
+
+static inline void dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+}
+
+static inline void dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_device(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+}
+
 /**
  * dma_sync_single_range_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -476,40 +501,14 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
 }
 
 static inline void dma_sync_single_range_for_device(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_cpu(dev, handle, 0, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_device(dev, handle, 0, size, dir);
+	dma_sync_single_for_device(dev, handle + offset, size, dir);
 }
 
 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..c11f234 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -619,7 +619,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s),
 					    sg_dma_len(s), dir))
 			continue;
 
@@ -645,7 +645,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_device(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_device(dev, sg_dma_address(s),
 					sg_dma_len(s), dir))
 			continue;
 
-- 
1.7.1.569.g6f426


* [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch consolidates the dma_map_single and dma_map_page calls. This
is required to let the dma-mapping framework on the ARM architecture use
the common, generic dma-mapping helpers.
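
The underlying observation: a lowmem (linear-map) virtual address always
decomposes into a page plus a byte offset within that page, which is
exactly what dma_map_page() needs. A sketch of the arithmetic, assuming
cpu_addr is a valid linear-map address (which has always been a
requirement of dma_map_single()):

	/* With 4 KiB pages, PAGE_MASK is ~0xfff, so for example
	 * cpu_addr == 0xc0123456 yields offset 0x456 and
	 * virt_to_page() returns the page mapped at 0xc0123000. */
	unsigned long offset = (unsigned long)cpu_addr & ~PAGE_MASK;
	struct page *page = virt_to_page(cpu_addr);
	dma_addr_t dma = dma_map_page(dev, page, offset, size, dir);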

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/common/dmabounce.c        |   28 ----------
 arch/arm/include/asm/dma-mapping.h |  100 +++++++++++++----------------------
 2 files changed, 37 insertions(+), 91 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index f7b330f..9eb161e 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -329,34 +329,6 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_single(struct device *dev, void *ptr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, ptr, size, dir);
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	return map_single(dev, ptr, size, dir);
-}
-EXPORT_SYMBOL(__dma_map_single);
-
-/*
- * see if a mapped address was really a "safe" buffer and if so, copy
- * the data from the safe buffer back to the unsafe buffer and free up
- * the safe buffer.  (basically return things back to the way they
- * should be)
- */
-void __dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, (void *) dma_addr, size, dir);
-
-	unmap_single(dev, dma_addr, size, dir);
-}
-EXPORT_SYMBOL(__dma_unmap_single);
-
 dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir)
 {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index ca920aa..799669d 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -298,10 +298,6 @@ extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 /*
  * The DMA API, implemented by dmabounce.c.  See below for descriptions.
  */
-extern dma_addr_t __dma_map_single(struct device *, void *, size_t,
-		enum dma_data_direction);
-extern void __dma_unmap_single(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
 extern dma_addr_t __dma_map_page(struct device *, struct page *,
 		unsigned long, size_t, enum dma_data_direction);
 extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
@@ -325,14 +321,6 @@ static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
 	return 1;
 }
 
-
-static inline dma_addr_t __dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_cpu_to_dev(cpu_addr, size, dir);
-	return virt_to_dma(dev, cpu_addr);
-}
-
 static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir)
 {
@@ -340,12 +328,6 @@ static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
 
-static inline void __dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
-}
-
 static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
@@ -354,34 +336,6 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 }
 #endif /* CONFIG_DMABOUNCE */
 
-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_single(dev, cpu_addr, size, dir);
-	debug_dma_map_page(dev, virt_to_page(cpu_addr),
-			(unsigned long)cpu_addr & ~PAGE_MASK, size,
-			dir, addr, true);
-
-	return addr;
-}
 
 /**
  * dma_map_page - map a portion of a page for streaming DMA
@@ -411,48 +365,68 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
 }
 
 /**
- * dma_unmap_single - unmap a single buffer previously mapped
+ * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
  *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
+
+static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, true);
-	__dma_unmap_single(dev, handle, size, dir);
+	debug_dma_unmap_page(dev, handle, size, dir, false);
+	__dma_unmap_page(dev, handle, size, dir);
 }
 
 /**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * dma_map_single - map a single buffer for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @cpu_addr: CPU direct mapped address of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_single() or
+ * dma_sync_single_for_cpu().
+ */
+static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
+		size_t size, enum dma_data_direction dir)
+{
+	return dma_map_page(dev, virt_to_page(cpu_addr),
+			    (unsigned long)cpu_addr & ~PAGE_MASK, size, dir);
+}
+
+/**
+ * dma_unmap_single - unmap a single buffer previously mapped
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
+ * @size: size of buffer (same as passed to dma_map_single)
+ * @dir: DMA transfer direction (same as passed to dma_map_single)
  *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
+ * Unmap a single streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_single() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
+static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
+	dma_unmap_page(dev, handle, size, dir);
 }
 
-
 static inline void dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch consolidates dma_map_single and dma_map_page calls. This is
required to let dma-mapping framework on ARM architecture use common,
generic dma-mapping helpers.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/common/dmabounce.c        |   28 ----------
 arch/arm/include/asm/dma-mapping.h |  100 +++++++++++++----------------------
 2 files changed, 37 insertions(+), 91 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index f7b330f..9eb161e 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -329,34 +329,6 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_single(struct device *dev, void *ptr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, ptr, size, dir);
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	return map_single(dev, ptr, size, dir);
-}
-EXPORT_SYMBOL(__dma_map_single);
-
-/*
- * see if a mapped address was really a "safe" buffer and if so, copy
- * the data from the safe buffer back to the unsafe buffer and free up
- * the safe buffer.  (basically return things back to the way they
- * should be)
- */
-void __dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, (void *) dma_addr, size, dir);
-
-	unmap_single(dev, dma_addr, size, dir);
-}
-EXPORT_SYMBOL(__dma_unmap_single);
-
 dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir)
 {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index ca920aa..799669d 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -298,10 +298,6 @@ extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 /*
  * The DMA API, implemented by dmabounce.c.  See below for descriptions.
  */
-extern dma_addr_t __dma_map_single(struct device *, void *, size_t,
-		enum dma_data_direction);
-extern void __dma_unmap_single(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
 extern dma_addr_t __dma_map_page(struct device *, struct page *,
 		unsigned long, size_t, enum dma_data_direction);
 extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
@@ -325,14 +321,6 @@ static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
 	return 1;
 }
 
-
-static inline dma_addr_t __dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_cpu_to_dev(cpu_addr, size, dir);
-	return virt_to_dma(dev, cpu_addr);
-}
-
 static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir)
 {
@@ -340,12 +328,6 @@ static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
 
-static inline void __dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
-}
-
 static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
@@ -354,34 +336,6 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 }
 #endif /* CONFIG_DMABOUNCE */
 
-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_single(dev, cpu_addr, size, dir);
-	debug_dma_map_page(dev, virt_to_page(cpu_addr),
-			(unsigned long)cpu_addr & ~PAGE_MASK, size,
-			dir, addr, true);
-
-	return addr;
-}
 
 /**
  * dma_map_page - map a portion of a page for streaming DMA
@@ -411,48 +365,68 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
 }
 
 /**
- * dma_unmap_single - unmap a single buffer previously mapped
+ * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
  *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
+
+static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, true);
-	__dma_unmap_single(dev, handle, size, dir);
+	debug_dma_unmap_page(dev, handle, size, dir, false);
+	__dma_unmap_page(dev, handle, size, dir);
 }
 
 /**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * dma_map_single - map a single buffer for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @cpu_addr: CPU direct mapped address of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_single() or
+ * dma_sync_single_for_cpu().
+ */
+static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
+		size_t size, enum dma_data_direction dir)
+{
+	return dma_map_page(dev, virt_to_page(cpu_addr),
+			    (unsigned long)cpu_addr & ~PAGE_MASK, size, dir);
+}
+
+/**
+ * dma_unmap_single - unmap a single buffer previously mapped
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
+ * @size: size of buffer (same as passed to dma_map_single)
+ * @dir: DMA transfer direction (same as passed to dma_map_single)
  *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
+ * Unmap a single streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_single() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
+static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
+	dma_unmap_page(dev, handle, size, dir);
 }
 
-
 static inline void dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

This patch consolidates dma_map_single and dma_map_page calls. This is
required to let dma-mapping framework on ARM architecture use common,
generic dma-mapping helpers.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/common/dmabounce.c        |   28 ----------
 arch/arm/include/asm/dma-mapping.h |  100 +++++++++++++----------------------
 2 files changed, 37 insertions(+), 91 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index f7b330f..9eb161e 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -329,34 +329,6 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_single(struct device *dev, void *ptr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, ptr, size, dir);
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	return map_single(dev, ptr, size, dir);
-}
-EXPORT_SYMBOL(__dma_map_single);
-
-/*
- * see if a mapped address was really a "safe" buffer and if so, copy
- * the data from the safe buffer back to the unsafe buffer and free up
- * the safe buffer.  (basically return things back to the way they
- * should be)
- */
-void __dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
-{
-	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
-		__func__, (void *) dma_addr, size, dir);
-
-	unmap_single(dev, dma_addr, size, dir);
-}
-EXPORT_SYMBOL(__dma_unmap_single);
-
 dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir)
 {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index ca920aa..799669d 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -298,10 +298,6 @@ extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 /*
  * The DMA API, implemented by dmabounce.c.  See below for descriptions.
  */
-extern dma_addr_t __dma_map_single(struct device *, void *, size_t,
-		enum dma_data_direction);
-extern void __dma_unmap_single(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
 extern dma_addr_t __dma_map_page(struct device *, struct page *,
 		unsigned long, size_t, enum dma_data_direction);
 extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
@@ -325,14 +321,6 @@ static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
 	return 1;
 }
 
-
-static inline dma_addr_t __dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_cpu_to_dev(cpu_addr, size, dir);
-	return virt_to_dma(dev, cpu_addr);
-}
-
 static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir)
 {
@@ -340,12 +328,6 @@ static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
 
-static inline void __dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
-}
-
 static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
@@ -354,34 +336,6 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 }
 #endif /* CONFIG_DMABOUNCE */
 
-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_single(dev, cpu_addr, size, dir);
-	debug_dma_map_page(dev, virt_to_page(cpu_addr),
-			(unsigned long)cpu_addr & ~PAGE_MASK, size,
-			dir, addr, true);
-
-	return addr;
-}
 
 /**
  * dma_map_page - map a portion of a page for streaming DMA
@@ -411,48 +365,68 @@ static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
 }
 
 /**
- * dma_unmap_single - unmap a single buffer previously mapped
+ * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
  *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
+
+static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, true);
-	__dma_unmap_single(dev, handle, size, dir);
+	debug_dma_unmap_page(dev, handle, size, dir, false);
+	__dma_unmap_page(dev, handle, size, dir);
 }
 
 /**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * dma_map_single - map a single buffer for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @cpu_addr: CPU direct mapped address of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_single() or
+ * dma_sync_single_for_cpu().
+ */
+static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
+		size_t size, enum dma_data_direction dir)
+{
+	return dma_map_page(dev, virt_to_page(cpu_addr),
+			    (unsigned long)cpu_addr & ~PAGE_MASK, size, dir);
+}
+
+/**
+ * dma_unmap_single - unmap a single buffer previously mapped
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
+ * @size: size of buffer (same as passed to dma_map_single)
+ * @dir: DMA transfer direction (same as passed to dma_map_single)
  *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
+ * Unmap a single streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_single() call.
  * All other usages are undefined.
  *
  * After this call, reads by the CPU to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
+static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir)
 {
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
+	dma_unmap_page(dev, handle, size, dir);
 }
 
-
 static inline void dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 201+ messages in thread
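
For reference, a minimal sketch of how a driver exercises the path this
patch reroutes; the helper name and buffer handling below are
illustrative, not part of the patch. With the change applied,
dma_map_single() expands to dma_map_page() on the buffer's page plus the
buffer's offset within that page, so both entry points share one
implementation.

#include <linux/dma-mapping.h>

/* Hypothetical helper: map a kmalloc'ed buffer for a device-bound
 * transfer, then release it.  dma_map_single() below is now a thin
 * wrapper around dma_map_page(). */
static int example_dma_tx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	/* ... program the device and wait for the transfer ... */

	dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
	return 0;
}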

* [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
  2011-06-20  7:50 ` Marek Szyprowski
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch modifies the dma-mapping implementation on the ARM
architecture to use the common dma_map_ops structure and the
asm-generic/dma-mapping-common.h helpers.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    1 +
 arch/arm/include/asm/dma-mapping.h |  201 +++++-------------------------------
 arch/arm/mm/dma-mapping.c          |  117 +++++++++++++++++----
 4 files changed, 127 insertions(+), 193 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9adc278..0b834c1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
+	select HAVE_DMA_ATTRS
 	select HAVE_IDE
 	select HAVE_MEMBLOCK
 	select RTC_LIB
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 9f390ce..d3b35d8 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -7,6 +7,7 @@
 #define ASMARM_DEVICE_H
 
 struct dev_archdata {
+	struct dma_map_ops	*dma_ops;
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 799669d..f4e4968 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -10,6 +10,27 @@
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
 
+extern struct dma_map_ops dma_ops;
+
+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+	if (dev->archdata.dma_ops)
+		return dev->archdata.dma_ops;
+	return &dma_ops;
+}
+
+static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	dev->archdata.dma_ops = ops;
+}
+
+#include <asm-generic/dma-mapping-common.h>
+
+static inline int dma_set_mask(struct device *dev, u64 mask)
+{
+	return get_dma_ops(dev)->set_dma_mask(dev, mask);
+}
+
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
 #endif
@@ -131,24 +152,6 @@ static inline int dma_supported(struct device *dev, u64 mask)
 	return 1;
 }
 
-static inline int dma_set_mask(struct device *dev, u64 dma_mask)
-{
-#ifdef CONFIG_DMABOUNCE
-	if (dev->archdata.dmabounce) {
-		if (dma_mask >= ISA_DMA_THRESHOLD)
-			return 0;
-		else
-			return -EIO;
-	}
-#endif
-	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-		return -EIO;
-
-	*dev->dma_mask = dma_mask;
-
-	return 0;
-}
-
 /*
  * DMA errors are defined by all-bits-set in the DMA address.
  */
@@ -336,167 +339,17 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 }
 #endif /* CONFIG_DMABOUNCE */
 
-
-/**
- * dma_map_page - map a portion of a page for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- */
-static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_page(dev, page, offset, size, dir);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
-
-	return addr;
-}
-
-/**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
-}
-
-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	return dma_map_page(dev, virt_to_page(cpu_addr),
-			    (unsigned long)cpu_addr & ~PAGE_MASK, size, dir);
-}
-
-/**
- * dma_unmap_single - unmap a single buffer previously mapped
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	dma_unmap_page(dev, handle, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
-}
-
-/**
- * dma_sync_single_range_for_cpu
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @offset: offset of region to start sync
- * @size: size of region to sync
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Make physical memory consistent for a single streaming mode DMA
- * translation after a transfer.
- *
- * If you perform a dma_map_single() but wish to interrogate the
- * buffer using the cpu, yet do not wish to teardown the PCI dma
- * mapping, you must call this function before doing so.  At the
- * next point you give the PCI dma address back to the card, you
- * must first the perform a dma_sync_for_device, and then the
- * device again owns the buffer.
- */
-static inline void dma_sync_single_range_for_cpu(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-}
-
-static inline void dma_sync_single_range_for_device(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	dma_sync_single_for_device(dev, handle + offset, size, dir);
-}
-
 /*
  * The scatter list versions of the above methods.
  */
-extern int dma_map_sg(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-extern void dma_unmap_sg(struct device *, struct scatterlist *, int,
+extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
+extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-
 
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c11f234..5264552 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -25,6 +25,98 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 
+/**
+ * dma_map_page - map a portion of a page for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_page().
+ */
+static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
+	     unsigned long offset, size_t size, enum dma_data_direction dir,
+	     struct dma_attrs *attrs)
+{
+	return __dma_map_page(dev, page, offset, size, dir);
+}
+
+/**
+ * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
+ * All other usages are undefined.
+ *
+ * After this call, reads by the CPU to the buffer are guaranteed to see
+ * whatever the device wrote there.
+ */
+
+static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
+{
+	__dma_unmap_page(dev, handle, size, dir);
+}
+
+static inline void arm_dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+}
+
+static inline void arm_dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+}
+
+static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
+{
+#ifdef CONFIG_DMABOUNCE
+	if (dev->archdata.dmabounce) {
+		if (dma_mask >= ISA_DMA_THRESHOLD)
+			return 0;
+		else
+			return -EIO;
+	}
+#endif
+	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+		return -EIO;
+
+	*dev->dma_mask = dma_mask;
+
+	return 0;
+}
+
+struct dma_map_ops dma_ops = {
+	.map_page		= arm_dma_map_page,
+	.unmap_page		= arm_dma_unmap_page,
+	.map_sg			= arm_dma_map_sg,
+	.unmap_sg		= arm_dma_unmap_sg,
+	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
+	.sync_single_for_device	= arm_dma_sync_single_for_device,
+	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
+	.set_dma_mask		= arm_dma_set_mask,
+};
+EXPORT_SYMBOL(dma_ops);
+
 static u64 get_coherent_dma_mask(struct device *dev)
 {
 	u64 mask = ISA_DMA_THRESHOLD;
@@ -558,21 +650,18 @@ EXPORT_SYMBOL(___dma_page_dev_to_cpu);
  * Device ownership issues as mentioned for dma_map_single are the same
  * here.
  */
-int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-		enum dma_data_direction dir)
+int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct scatterlist *s;
 	int i, j;
 
-	BUG_ON(!valid_dma_direction(dir));
-
 	for_each_sg(sg, s, nents, i) {
 		s->dma_address = __dma_map_page(dev, sg_page(s), s->offset,
 						s->length, dir);
 		if (dma_mapping_error(dev, s->dma_address))
 			goto bad_mapping;
 	}
-	debug_dma_map_sg(dev, sg, nents, nents, dir);
 	return nents;
 
  bad_mapping:
@@ -580,7 +669,6 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
 	return 0;
 }
-EXPORT_SYMBOL(dma_map_sg);
 
 /**
  * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
@@ -592,18 +680,15 @@ EXPORT_SYMBOL(dma_map_sg);
  * Unmap a set of streaming mode DMA translations.  Again, CPU access
  * rules concerning calls here are the same as for dma_unmap_single().
  */
-void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
-		enum dma_data_direction dir)
+void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct scatterlist *s;
 	int i;
 
-	debug_dma_unmap_sg(dev, sg, nents, dir);
-
 	for_each_sg(sg, s, nents, i)
 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
 }
-EXPORT_SYMBOL(dma_unmap_sg);
 
 /**
  * dma_sync_sg_for_cpu
@@ -612,7 +697,7 @@ EXPORT_SYMBOL(dma_unmap_sg);
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *s;
@@ -626,10 +711,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		__dma_page_dev_to_cpu(sg_page(s), s->offset,
 				      s->length, dir);
 	}
-
-	debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
 }
-EXPORT_SYMBOL(dma_sync_sg_for_cpu);
 
 /**
  * dma_sync_sg_for_device
@@ -638,7 +720,7 @@ EXPORT_SYMBOL(dma_sync_sg_for_cpu);
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *s;
@@ -652,10 +734,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		__dma_page_cpu_to_dev(sg_page(s), s->offset,
 				      s->length, dir);
 	}
-
-	debug_dma_sync_sg_for_device(dev, sg, nents, dir);
 }
-EXPORT_SYMBOL(dma_sync_sg_for_device);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES	4096
 
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 201+ messages in thread
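
To show where the calls go after this conversion, here is a simplified
sketch (not the verbatim header text) of the wrapper that
asm-generic/dma-mapping-common.h provides once get_dma_ops() is
defined: every dma_* entry point becomes an indirect call through the
per-device ops, with the dma-debug hooks kept in the common code.

/* Simplified sketch of the common dma_map_page() wrapper -- the real
 * header also handles a few details omitted here. */
static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
		size_t offset, size_t size, enum dma_data_direction dir)
{
	struct dma_map_ops *ops = get_dma_ops(dev);
	dma_addr_t addr;

	BUG_ON(!valid_dma_direction(dir));
	addr = ops->map_page(dev, page, offset, size, dir, NULL);
	debug_dma_map_page(dev, page, offset, size, dir, addr, false);

	return addr;
}

Because the dispatch happens through dev->archdata.dma_ops, a
per-device implementation can later be substituted with set_dma_ops()
without touching any call site.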

* [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch converts all dma_sg methods to generic ones, independent of
the current DMA-mapping implementation for the ARM architecture. All
DMA sg operations are now implemented on top of the respective
dma_map_page/dma_sync_single_for* operations from the dma_map_ops
structure.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/include/asm/dma-mapping.h |   10 +++---
 arch/arm/mm/dma-mapping.c          |   59 ++++++++++++++++-------------------
 2 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index f4e4968..fa73efc 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -340,15 +340,15 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 #endif /* CONFIG_DMABOUNCE */
 
 /*
- * The scatter list versions of the above methods.
+ * The generic scatter list versions of dma methods.
  */
-extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
+extern int generic_dma_map_sg(struct device *, struct scatterlist *, int,
 		enum dma_data_direction, struct dma_attrs *attrs);
-extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
+extern void generic_dma_unmap_sg(struct device *, struct scatterlist *, int,
 		enum dma_data_direction, struct dma_attrs *attrs);
-extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
+extern void generic_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
+extern void generic_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
 
 #endif /* __KERNEL__ */
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 5264552..ebbd76c 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -107,12 +107,12 @@ static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 struct dma_map_ops dma_ops = {
 	.map_page		= arm_dma_map_page,
 	.unmap_page		= arm_dma_unmap_page,
-	.map_sg			= arm_dma_map_sg,
-	.unmap_sg		= arm_dma_unmap_sg,
 	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
 	.sync_single_for_device	= arm_dma_sync_single_for_device,
-	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
-	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
+	.map_sg			= generic_dma_map_sg,
+	.unmap_sg		= generic_dma_unmap_sg,
+	.sync_sg_for_cpu	= generic_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= generic_dma_sync_sg_for_device,
 	.set_dma_mask		= arm_dma_set_mask,
 };
 EXPORT_SYMBOL(dma_ops);
@@ -635,7 +635,7 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
 EXPORT_SYMBOL(___dma_page_dev_to_cpu);
 
 /**
- * dma_map_sg - map a set of SG buffers for streaming mode DMA
+ * generic_dma_map_sg - map a set of SG buffers for streaming mode DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map
@@ -650,15 +650,16 @@ EXPORT_SYMBOL(___dma_page_dev_to_cpu);
  * Device ownership issues as mentioned for dma_map_single are the same
  * here.
  */
-int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+int generic_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
+	struct dma_map_ops *ops = get_dma_ops(dev);
 	struct scatterlist *s;
 	int i, j;
 
 	for_each_sg(sg, s, nents, i) {
-		s->dma_address = __dma_map_page(dev, sg_page(s), s->offset,
-						s->length, dir);
+		s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
+						s->length, dir, attrs);
 		if (dma_mapping_error(dev, s->dma_address))
 			goto bad_mapping;
 	}
@@ -666,12 +667,12 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 
  bad_mapping:
 	for_each_sg(sg, s, i, j)
-		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+		ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs);
 	return 0;
 }
 
 /**
- * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
+ * generic_dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
@@ -680,60 +681,54 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
  * Unmap a set of streaming mode DMA translations.  Again, CPU access
  * rules concerning calls here are the same as for dma_unmap_single().
  */
-void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+void generic_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
+	struct dma_map_ops *ops = get_dma_ops(dev);
 	struct scatterlist *s;
+
 	int i;
 
 	for_each_sg(sg, s, nents, i)
-		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
+		ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs);
 }
 
 /**
- * dma_sync_sg_for_cpu
+ * generic_dma_sync_sg_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+void generic_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
+	struct dma_map_ops *ops = get_dma_ops(dev);
 	struct scatterlist *s;
 	int i;
 
-	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s),
-					    sg_dma_len(s), dir))
-			continue;
-
-		__dma_page_dev_to_cpu(sg_page(s), s->offset,
-				      s->length, dir);
-	}
+	for_each_sg(sg, s, nents, i)
+		ops->sync_single_for_cpu(dev, sg_dma_address(s) + s->offset,
+					 s->length, dir);
 }
 
 /**
- * dma_sync_sg_for_device
+ * generic_dma_sync_sg_for_device
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @sg: list of buffers
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+void generic_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
+	struct dma_map_ops *ops = get_dma_ops(dev);
 	struct scatterlist *s;
 	int i;
 
-	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_device(dev, sg_dma_address(s),
-					sg_dma_len(s), dir))
-			continue;
-
-		__dma_page_cpu_to_dev(sg_page(s), s->offset,
-				      s->length, dir);
-	}
+	for_each_sg(sg, s, nents, i)
+		ops->sync_single_for_device(dev, sg_dma_address(s) + s->offset,
+					    s->length, dir);
 }
 
 #define PREALLOC_DMA_DEBUG_ENTRIES	4096
-- 
1.7.1.569.g6f426
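
With this change the sg helpers are pure delegation: each scatterlist
entry is funneled through the per-device map_page/unmap_page and
sync_single_* hooks. A minimal sketch of how a driver call now reaches
generic_dma_map_sg(), assuming the dma_map_sg() wrapper from
asm-generic/dma-mapping-common.h adopted earlier in this series
(illustrative only, not the exact header code):

  static inline int dma_map_sg(struct device *dev, struct scatterlist *sg,
  		int nents, enum dma_data_direction dir)
  {
  	struct dma_map_ops *ops = get_dma_ops(dev);

  	/* on ARM this dispatches to generic_dma_map_sg() */
  	return ops->map_sg(dev, sg, nents, dir, NULL);
  }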


^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch removes the dma bounce hooks from the common DMA mapping
implementation on the ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
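(Not for the changelog: the per-device dispatch this enables can be
sketched as follows; illustrative only, based on the ops installed
below.)

  /* after dmabounce_register_dev(): get_dma_ops(dev) == &dmabounce_ops */
  struct dma_map_ops *ops = get_dma_ops(dev);

  ops->sync_single_for_cpu(dev, handle, size, dir);
  /* dmabounce first checks whether 'handle' belongs to a safe (bounced)
   * buffer and copies the data back if so; addresses that were never
   * bounced fall through to dma_ops.sync_single_for_cpu(), i.e. the
   * regular ARM cache maintenance path. */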
 arch/arm/common/dmabounce.c        |   68 ++++++++++++++++++++------
 arch/arm/include/asm/dma-mapping.h |   96 ------------------------------------
 arch/arm/mm/dma-mapping.c          |   83 ++++++++++++++++++++++++++-----
 3 files changed, 122 insertions(+), 125 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9eb161e..5b411ef 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -256,7 +256,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		if (buf == 0) {
 			dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
 			       __func__, ptr);
-			return 0;
+			return ~0;
 		}
 
 		dev_dbg(dev,
@@ -278,7 +278,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		 * We don't need to sync the DMA buffer since
 		 * it was allocated via the coherent allocators.
 		 */
-		__dma_single_cpu_to_dev(ptr, size, dir);
+		dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
 	}
 
 	return dma_addr;
@@ -317,7 +317,7 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
 		}
 		free_safe_buffer(dev->archdata.dmabounce, buf);
 	} else {
-		__dma_single_dev_to_cpu(dma_to_virt(dev, dma_addr), size, dir);
+		dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
 	}
 }
 
@@ -329,14 +329,13 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir)
+static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
 		__func__, page, offset, size, dir);
 
-	BUG_ON(!valid_dma_direction(dir));
-
 	if (PageHighMem(page)) {
 		dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
 			     "is not supported\n");
@@ -345,7 +344,6 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 
 	return map_single(dev, page_address(page) + offset, size, dir);
 }
-EXPORT_SYMBOL(__dma_map_page);
 
 /*
  * see if a mapped address was really a "safe" buffer and if so, copy
@@ -353,17 +351,16 @@ EXPORT_SYMBOL(__dma_map_page);
  * the safe buffer.  (basically return things back to the way they
  * should be)
  */
-void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
+static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
 		__func__, (void *) dma_addr, size, dir);
 
 	unmap_single(dev, dma_addr, size, dir);
 }
-EXPORT_SYMBOL(__dma_unmap_page);
 
-int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
+static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -393,9 +390,17 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
-int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
+static void dmabounce_sync_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_cpu(dev, handle, size, dir);
+}
+
+static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -425,7 +430,38 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_device);
+
+static void dmabounce_sync_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_device(dev, handle, size, dir);
+}
+
+static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+{
+	if (dev->archdata.dmabounce) {
+		if (dma_mask >= ISA_DMA_THRESHOLD)
+			return 0;
+		else
+			return -EIO;
+	}
+	return dma_ops.set_dma_mask(dev, dma_mask);
+}
+
+static struct dma_map_ops dmabounce_ops = {
+	.map_page		= dmabounce_map_page,
+	.unmap_page		= dmabounce_unmap_page,
+	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
+	.sync_single_for_device	= dmabounce_sync_for_device,
+	.map_sg			= generic_dma_map_sg,
+	.unmap_sg		= generic_dma_unmap_sg,
+	.sync_sg_for_cpu	= generic_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= generic_dma_sync_sg_for_device,
+	.set_dma_mask		= dmabounce_set_mask,
+};
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
 		const char *name, unsigned long size)
@@ -485,6 +521,7 @@ int dmabounce_register_dev(struct device *dev, unsigned long small_buffer_size,
 #endif
 
 	dev->archdata.dmabounce = device_info;
+	set_dma_ops(dev, &dmabounce_ops);
 
 	dev_info(dev, "dmabounce: registered device\n");
 
@@ -503,6 +540,7 @@ void dmabounce_unregister_dev(struct device *dev)
 	struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
 
 	dev->archdata.dmabounce = NULL;
+	set_dma_ops(dev, NULL);
 
 	if (!device_info) {
 		dev_warn(dev,
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index fa73efc..7ceec8f 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -83,60 +83,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
 #endif
 
 /*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- * Private support functions: these are not part of the API and are
- * liable to change.  Drivers must not use these.
- */
-static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_cpu_to_dev(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_cpu_to_dev(kaddr, size, dir);
-}
-
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_dev_to_cpu(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_dev_to_cpu(kaddr, size, dir);
-}
-
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_cpu_to_dev(page, off, size, dir);
-}
-
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_dev_to_cpu(page, off, size, dir);
-}
-
-/*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
  * during bus mastering, then you would pass 0x00ffffff as the mask
@@ -240,7 +186,6 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
 
 
-#ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
  * and utilize bounce buffers as needed to work around limited DMA windows.
@@ -299,47 +244,6 @@ extern void dmabounce_unregister_dev(struct device *);
 extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 
 /*
- * The DMA API, implemented by dmabounce.c.  See below for descriptions.
- */
-extern dma_addr_t __dma_map_page(struct device *, struct page *,
-		unsigned long, size_t, enum dma_data_direction);
-extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
-
-/*
- * Private functions
- */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-#else
-static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
-}
-
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
-		handle & ~PAGE_MASK, size, dir);
-}
-#endif /* CONFIG_DMABOUNCE */
-
-/*
  * The generic scatter list versions of dma methods.
  */
 extern int generic_dma_map_sg(struct device *, struct scatterlist *, int,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ebbd76c..9536481 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -25,6 +25,75 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ * Private support functions: these are not part of the API and are
+ * liable to change.  Drivers must not use these.
+ */
+static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_cpu_to_dev(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_cpu_to_dev(kaddr, size, dir);
+}
+
+static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_dev_to_cpu(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_dev_to_cpu(kaddr, size, dir);
+}
+
+static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_cpu_to_dev(page, off, size, dir);
+}
+
+static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_dev_to_cpu(page, off, size, dir);
+}
+
+
+static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
+	     unsigned long offset, size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(page, offset, size, dir);
+	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+}
+
+static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir)
+{
+	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+		handle & ~PAGE_MASK, size, dir);
+}
+
 /**
  * dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -71,31 +140,17 @@ static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 static inline void arm_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
-		return;
-
 	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
 }
 
 static inline void arm_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_device(dev, handle, size, dir))
-		return;
-
 	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
 }
 
 static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 {
-#ifdef CONFIG_DMABOUNCE
-	if (dev->archdata.dmabounce) {
-		if (dma_mask >= ISA_DMA_THRESHOLD)
-			return 0;
-		else
-			return -EIO;
-	}
-#endif
 	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
 		return -EIO;
 
-- 
1.7.1.569.g6f426
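
A usage sketch for completeness (hypothetical probe-time code; the two
pool sizes are invented for illustration):

  err = dmabounce_register_dev(&pdev->dev, 512, 4096);
  if (err)
  	dev_err(&pdev->dev, "dmabounce registration failed\n");
  /* from this point streaming DMA on this device is routed through
   * dmabounce_ops, installed by the set_dma_ops() call added above */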

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch removes dma bounce hooks from the common dma mapping
implementation on ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/common/dmabounce.c        |   68 ++++++++++++++++++++------
 arch/arm/include/asm/dma-mapping.h |   96 ------------------------------------
 arch/arm/mm/dma-mapping.c          |   83 ++++++++++++++++++++++++++-----
 3 files changed, 122 insertions(+), 125 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9eb161e..5b411ef 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -256,7 +256,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		if (buf == 0) {
 			dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
 			       __func__, ptr);
-			return 0;
+			return ~0;
 		}
 
 		dev_dbg(dev,
@@ -278,7 +278,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		 * We don't need to sync the DMA buffer since
 		 * it was allocated via the coherent allocators.
 		 */
-		__dma_single_cpu_to_dev(ptr, size, dir);
+		dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
 	}
 
 	return dma_addr;
@@ -317,7 +317,7 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
 		}
 		free_safe_buffer(dev->archdata.dmabounce, buf);
 	} else {
-		__dma_single_dev_to_cpu(dma_to_virt(dev, dma_addr), size, dir);
+		dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
 	}
 }
 
@@ -329,14 +329,13 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir)
+static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
 		__func__, page, offset, size, dir);
 
-	BUG_ON(!valid_dma_direction(dir));
-
 	if (PageHighMem(page)) {
 		dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
 			     "is not supported\n");
@@ -345,7 +344,6 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 
 	return map_single(dev, page_address(page) + offset, size, dir);
 }
-EXPORT_SYMBOL(__dma_map_page);
 
 /*
  * see if a mapped address was really a "safe" buffer and if so, copy
@@ -353,17 +351,16 @@ EXPORT_SYMBOL(__dma_map_page);
  * the safe buffer.  (basically return things back to the way they
  * should be)
  */
-void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
+static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
 		__func__, (void *) dma_addr, size, dir);
 
 	unmap_single(dev, dma_addr, size, dir);
 }
-EXPORT_SYMBOL(__dma_unmap_page);
 
-int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
+static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -393,9 +390,17 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
-int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
+static void dmabounce_sync_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_cpu(dev, handle, size, dir);
+}
+
+static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -425,7 +430,38 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_device);
+
+static void dmabounce_sync_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_device(dev, handle, size, dir);
+}
+
+static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+{
+	if (dev->archdata.dmabounce) {
+		if (dma_mask >= ISA_DMA_THRESHOLD)
+			return 0;
+		else
+			return -EIO;
+	}
+	return dma_ops.set_dma_mask(dev, dma_mask);
+}
+
+static struct dma_map_ops dmabounce_ops = {
+	.map_page		= dmabounce_map_page,
+	.unmap_page		= dmabounce_unmap_page,
+	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
+	.sync_single_for_device	= dmabounce_sync_for_device,
+	.map_sg			= generic_dma_map_sg,
+	.unmap_sg		= generic_dma_unmap_sg,
+	.sync_sg_for_cpu	= generic_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= generic_dma_sync_sg_for_device,
+	.set_dma_mask		= dmabounce_set_mask,
+};
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
 		const char *name, unsigned long size)
@@ -485,6 +521,7 @@ int dmabounce_register_dev(struct device *dev, unsigned long small_buffer_size,
 #endif
 
 	dev->archdata.dmabounce = device_info;
+	set_dma_ops(dev, &dmabounce_ops);
 
 	dev_info(dev, "dmabounce: registered device\n");
 
@@ -503,6 +540,7 @@ void dmabounce_unregister_dev(struct device *dev)
 	struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
 
 	dev->archdata.dmabounce = NULL;
+	set_dma_ops(dev, NULL);
 
 	if (!device_info) {
 		dev_warn(dev,
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index fa73efc..7ceec8f 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -83,60 +83,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
 #endif
 
 /*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- * Private support functions: these are not part of the API and are
- * liable to change.  Drivers must not use these.
- */
-static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_cpu_to_dev(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_cpu_to_dev(kaddr, size, dir);
-}
-
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_dev_to_cpu(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_dev_to_cpu(kaddr, size, dir);
-}
-
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_cpu_to_dev(page, off, size, dir);
-}
-
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_dev_to_cpu(page, off, size, dir);
-}
-
-/*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
  * during bus mastering, then you would pass 0x00ffffff as the mask
@@ -240,7 +186,6 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
 
 
-#ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
  * and utilize bounce buffers as needed to work around limited DMA windows.
@@ -299,47 +244,6 @@ extern void dmabounce_unregister_dev(struct device *);
 extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 
 /*
- * The DMA API, implemented by dmabounce.c.  See below for descriptions.
- */
-extern dma_addr_t __dma_map_page(struct device *, struct page *,
-		unsigned long, size_t, enum dma_data_direction);
-extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
-
-/*
- * Private functions
- */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-#else
-static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
-}
-
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
-		handle & ~PAGE_MASK, size, dir);
-}
-#endif /* CONFIG_DMABOUNCE */
-
-/*
  * The generic scatter list versions of dma methods.
  */
 extern int generic_dma_map_sg(struct device *, struct scatterlist *, int,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ebbd76c..9536481 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -25,6 +25,75 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ * Private support functions: these are not part of the API and are
+ * liable to change.  Drivers must not use these.
+ */
+static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_cpu_to_dev(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_cpu_to_dev(kaddr, size, dir);
+}
+
+static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_dev_to_cpu(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_dev_to_cpu(kaddr, size, dir);
+}
+
+static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_cpu_to_dev(page, off, size, dir);
+}
+
+static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_dev_to_cpu(page, off, size, dir);
+}
+
+
+static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
+	     unsigned long offset, size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(page, offset, size, dir);
+	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+}
+
+static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir)
+{
+	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+		handle & ~PAGE_MASK, size, dir);
+}
+
 /**
  * dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -71,31 +140,17 @@ static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 static inline void arm_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
-		return;
-
 	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
 }
 
 static inline void arm_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_device(dev, handle, size, dir))
-		return;
-
 	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
 }
 
 static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 {
-#ifdef CONFIG_DMABOUNCE
-	if (dev->archdata.dmabounce) {
-		if (dma_mask >= ISA_DMA_THRESHOLD)
-			return 0;
-		else
-			return -EIO;
-	}
-#endif
 	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
 		return -EIO;
 
-- 
1.7.1.569.g6f426

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

This patch removes dma bounce hooks from the common dma mapping
implementation on ARM architecture and creates a separate set of
dma_map_ops for dma bounce devices.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/common/dmabounce.c        |   68 ++++++++++++++++++++------
 arch/arm/include/asm/dma-mapping.h |   96 ------------------------------------
 arch/arm/mm/dma-mapping.c          |   83 ++++++++++++++++++++++++++-----
 3 files changed, 122 insertions(+), 125 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9eb161e..5b411ef 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -256,7 +256,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		if (buf == 0) {
 			dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
 			       __func__, ptr);
-			return 0;
+			return ~0;
 		}
 
 		dev_dbg(dev,
@@ -278,7 +278,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 		 * We don't need to sync the DMA buffer since
 		 * it was allocated via the coherent allocators.
 		 */
-		__dma_single_cpu_to_dev(ptr, size, dir);
+		dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
 	}
 
 	return dma_addr;
@@ -317,7 +317,7 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
 		}
 		free_safe_buffer(dev->archdata.dmabounce, buf);
 	} else {
-		__dma_single_dev_to_cpu(dma_to_virt(dev, dma_addr), size, dir);
+		dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
 	}
 }
 
@@ -329,14 +329,13 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
  * substitute the safe buffer for the unsafe one.
  * (basically move the buffer from an unsafe area to a safe one)
  */
-dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-		unsigned long offset, size_t size, enum dma_data_direction dir)
+static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
 		__func__, page, offset, size, dir);
 
-	BUG_ON(!valid_dma_direction(dir));
-
 	if (PageHighMem(page)) {
 		dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
 			     "is not supported\n");
@@ -345,7 +344,6 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,
 
 	return map_single(dev, page_address(page) + offset, size, dir);
 }
-EXPORT_SYMBOL(__dma_map_page);
 
 /*
  * see if a mapped address was really a "safe" buffer and if so, copy
@@ -353,17 +351,16 @@ EXPORT_SYMBOL(__dma_map_page);
  * the safe buffer.  (basically return things back to the way they
  * should be)
  */
-void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
-		enum dma_data_direction dir)
+static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	dev_dbg(dev, "%s(ptr=%p,size=%d,dir=%x)\n",
 		__func__, (void *) dma_addr, size, dir);
 
 	unmap_single(dev, dma_addr, size, dir);
 }
-EXPORT_SYMBOL(__dma_unmap_page);
 
-int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
+static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -393,9 +390,17 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_cpu);
 
-int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
+static void dmabounce_sync_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_cpu(dev, handle, size, dir);
+}
+
+static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
@@ -425,7 +430,38 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	}
 	return 0;
 }
-EXPORT_SYMBOL(dmabounce_sync_for_device);
+
+static void dmabounce_sync_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	if (!__dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	dma_ops.sync_single_for_device(dev, handle, size, dir);
+}
+
+static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+{
+	if (dev->archdata.dmabounce) {
+		if (dma_mask >= ISA_DMA_THRESHOLD)
+			return 0;
+		else
+			return -EIO;
+	}
+	return dma_ops.set_dma_mask(dev, dma_mask);
+}
+
+static struct dma_map_ops dmabounce_ops = {
+	.map_page		= dmabounce_map_page,
+	.unmap_page		= dmabounce_unmap_page,
+	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
+	.sync_single_for_device	= dmabounce_sync_for_device,
+	.map_sg			= generic_dma_map_sg,
+	.unmap_sg		= generic_dma_unmap_sg,
+	.sync_sg_for_cpu	= generic_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= generic_dma_sync_sg_for_device,
+	.set_dma_mask		= dmabounce_set_mask,
+};
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
 		const char *name, unsigned long size)
@@ -485,6 +521,7 @@ int dmabounce_register_dev(struct device *dev, unsigned long small_buffer_size,
 #endif
 
 	dev->archdata.dmabounce = device_info;
+	set_dma_ops(dev, &dmabounce_ops);
 
 	dev_info(dev, "dmabounce: registered device\n");
 
@@ -503,6 +540,7 @@ void dmabounce_unregister_dev(struct device *dev)
 	struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
 
 	dev->archdata.dmabounce = NULL;
+	set_dma_ops(dev, NULL);
 
 	if (!device_info) {
 		dev_warn(dev,
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index fa73efc..7ceec8f 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -83,60 +83,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
 #endif
 
 /*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- * Private support functions: these are not part of the API and are
- * liable to change.  Drivers must not use these.
- */
-static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_cpu_to_dev(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_cpu_to_dev(kaddr, size, dir);
-}
-
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_dev_to_cpu(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_dev_to_cpu(kaddr, size, dir);
-}
-
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_cpu_to_dev(page, off, size, dir);
-}
-
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_dev_to_cpu(page, off, size, dir);
-}
-
-/*
  * Return whether the given device DMA address mask can be supported
  * properly.  For example, if your device can only drive the low 24-bits
  * during bus mastering, then you would pass 0x00ffffff as the mask
@@ -240,7 +186,6 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
 		void *, dma_addr_t, size_t);
 
 
-#ifdef CONFIG_DMABOUNCE
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
  * and utilize bounce buffers as needed to work around limited DMA windows.
@@ -299,47 +244,6 @@ extern void dmabounce_unregister_dev(struct device *);
 extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
 
 /*
- * The DMA API, implemented by dmabounce.c.  See below for descriptions.
- */
-extern dma_addr_t __dma_map_page(struct device *, struct page *,
-		unsigned long, size_t, enum dma_data_direction);
-extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
-		enum dma_data_direction);
-
-/*
- * Private functions
- */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
-#else
-static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	return 1;
-}
-
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
-}
-
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
-		handle & ~PAGE_MASK, size, dir);
-}
-#endif /* CONFIG_DMABOUNCE */
-
-/*
  * The generic scatter list versions of dma methods.
  */
 extern int generic_dma_map_sg(struct device *, struct scatterlist *, int,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ebbd76c..9536481 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -25,6 +25,75 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ * Private support functions: these are not part of the API and are
+ * liable to change.  Drivers must not use these.
+ */
+static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_cpu_to_dev(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_cpu_to_dev(kaddr, size, dir);
+}
+
+static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
+	enum dma_data_direction dir)
+{
+	extern void ___dma_single_dev_to_cpu(const void *, size_t,
+		enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_single_dev_to_cpu(kaddr, size, dir);
+}
+
+static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_cpu_to_dev(page, off, size, dir);
+}
+
+static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+	size_t size, enum dma_data_direction dir)
+{
+	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
+		size_t, enum dma_data_direction);
+
+	if (!arch_is_coherent())
+		___dma_page_dev_to_cpu(page, off, size, dir);
+}
+
+
+static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
+	     unsigned long offset, size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(page, offset, size, dir);
+	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+}
+
+static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir)
+{
+	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+		handle & ~PAGE_MASK, size, dir);
+}
+
 /**
  * dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -71,31 +140,17 @@ static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 static inline void arm_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
-		return;
-
 	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
 }
 
 static inline void arm_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	if (!dmabounce_sync_for_device(dev, handle, size, dir))
-		return;
-
 	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
 }
 
 static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 {
-#ifdef CONFIG_DMABOUNCE
-	if (dev->archdata.dmabounce) {
-		if (dma_mask >= ISA_DMA_THRESHOLD)
-			return 0;
-		else
-			return -EIO;
-	}
-#endif
 	if (!dev->dma_mask || !dma_supported(dev, dma_mask))
 		return -EIO;
 
-- 
1.7.1.569.g6f426
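
Viewed from a driver, the ownership protocol described in the comment
added above boils down to the following minimal sketch (dev, page and
len are illustrative placeholders, not names from this series):

	/* CPU -> device: caches are cleaned, the device may now DMA */
	dma_addr_t handle = dma_map_page(dev, page, 0, len,
					 DMA_FROM_DEVICE);

	/* ... the device performs the transfer ... */

	/* device -> CPU: caches are invalidated, the CPU may read */
	dma_unmap_page(dev, handle, len, DMA_FROM_DEVICE);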


* [PATCH 6/8] ARM: dma-mapping: remove redundant code and cleanup
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch performs a global cleanup of the DMA mapping implementation
for the ARM architecture. Some of the tiny helper functions have been
moved into their callers, and some have been merged together.
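
The shape of the change, condensed: the function-local 'extern'
declarations hidden inside the inline wrappers are replaced by plain
static forward declarations, and the arch_is_coherent() check moves
into the dma_map_ops methods themselves:

	/* before: wrapper around a function-local extern declaration */
	static inline void __dma_page_cpu_to_dev(struct page *page,
		unsigned long off, size_t size, enum dma_data_direction dir)
	{
		extern void ___dma_page_cpu_to_dev(struct page *,
			unsigned long, size_t, enum dma_data_direction);

		if (!arch_is_coherent())
			___dma_page_cpu_to_dev(page, off, size, dir);
	}

	/* after: one static forward declaration; the coherency check is
	 * done directly in the callers, e.g. arm_dma_map_page() */
	static void __dma_page_cpu_to_dev(struct page *, unsigned long,
			size_t, enum dma_data_direction);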

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/mm/dma-mapping.c |  133 +++++++++------------------------------------
 1 files changed, 26 insertions(+), 107 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 9536481..7c62e60 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -36,64 +36,12 @@
  * the CPU does do speculative prefetches, which means we clean caches
  * before transfers and delay cache invalidation until transfer completion.
  *
- * Private support functions: these are not part of the API and are
- * liable to change.  Drivers must not use these.
  */
-static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_cpu_to_dev(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_cpu_to_dev(kaddr, size, dir);
-}
-
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	extern void ___dma_single_dev_to_cpu(const void *, size_t,
-		enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_single_dev_to_cpu(kaddr, size, dir);
-}
-
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_cpu_to_dev(struct page *, unsigned long,
+static void __dma_page_cpu_to_dev(struct page *, unsigned long,
 		size_t, enum dma_data_direction);
-
-	if (!arch_is_coherent())
-		___dma_page_cpu_to_dev(page, off, size, dir);
-}
-
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
-	size_t size, enum dma_data_direction dir)
-{
-	extern void ___dma_page_dev_to_cpu(struct page *, unsigned long,
+static void __dma_page_dev_to_cpu(struct page *, unsigned long,
 		size_t, enum dma_data_direction);
 
-	if (!arch_is_coherent())
-		___dma_page_dev_to_cpu(page, off, size, dir);
-}
-
-
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
-}
-
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
-		handle & ~PAGE_MASK, size, dir);
-}
-
 /**
  * dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -108,11 +56,13 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
  * The device owns this memory once this call has completed.  The CPU
  * can regain ownership by calling dma_unmap_page().
  */
-static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
+static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
 	     unsigned long offset, size_t size, enum dma_data_direction dir,
 	     struct dma_attrs *attrs)
 {
-	return __dma_map_page(dev, page, offset, size, dir);
+	if (!arch_is_coherent())
+		__dma_page_cpu_to_dev(page, offset, size, dir);
+	return pfn_to_dma(dev, page_to_pfn(page)) + offset;
 }
 
 /**
@@ -130,23 +80,29 @@ static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
  * whatever the device wrote there.
  */
 
-static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
+static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
 		size_t size, enum dma_data_direction dir,
 		struct dma_attrs *attrs)
 {
-	__dma_unmap_page(dev, handle, size, dir);
+	if (!arch_is_coherent())
+		__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+				      handle & ~PAGE_MASK, size, dir);
 }
 
-static inline void arm_dma_sync_single_for_cpu(struct device *dev,
+static void arm_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+	if (!arch_is_coherent())
+		__dma_page_dev_to_cpu(virt_to_page(dma_to_virt(dev, handle)),
+				      handle & ~PAGE_MASK, size, dir);
 }
 
-static inline void arm_dma_sync_single_for_device(struct device *dev,
+static void arm_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+	if (!arch_is_coherent())
+		__dma_page_cpu_to_dev(virt_to_page(dma_to_virt(dev, handle)),
+				      handle & ~PAGE_MASK, size, dir);
 }
 
 static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
@@ -567,47 +523,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
-/*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly, as it will break
- * platforms with CONFIG_DMABOUNCE.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
-void ___dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	unsigned long paddr;
-
-	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
-	dmac_map_area(kaddr, size, dir);
-
-	paddr = __pa(kaddr);
-	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-	} else {
-		outer_clean_range(paddr, paddr + size);
-	}
-	/* FIXME: non-speculating: flush on bidirectional mappings? */
-}
-EXPORT_SYMBOL(___dma_single_cpu_to_dev);
-
-void ___dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
-	/* FIXME: non-speculating: not required */
-	/* don't bother invalidating if DMA to device */
-	if (dir != DMA_TO_DEVICE) {
-		unsigned long paddr = __pa(kaddr);
-		outer_inv_range(paddr, paddr + size);
-	}
-
-	dmac_unmap_area(kaddr, size, dir);
-}
-EXPORT_SYMBOL(___dma_single_dev_to_cpu);
-
 static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	size_t size, enum dma_data_direction dir,
 	void (*op)(const void *, size_t, int))
@@ -652,7 +567,13 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	} while (left);
 }
 
-void ___dma_page_cpu_to_dev(struct page *page, unsigned long off,
+/*
+ * Make an area consistent for devices.
+ * Note: Drivers should NOT use this function directly, as it will break
+ * platforms with CONFIG_DMABOUNCE.
+ * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
+ */
+static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	unsigned long paddr;
@@ -667,9 +588,8 @@ void ___dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	}
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
-EXPORT_SYMBOL(___dma_page_cpu_to_dev);
 
-void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
+static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	unsigned long paddr = page_to_phys(page) + off;
@@ -687,7 +607,6 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	if (dir != DMA_TO_DEVICE && off == 0 && size >= PAGE_SIZE)
 		set_bit(PG_dcache_clean, &page->flags);
 }
-EXPORT_SYMBOL(___dma_page_dev_to_cpu);
 
 /**
  * generic_map_sg - map a set of SG buffers for streaming mode DMA
-- 
1.7.1.569.g6f426


* [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

Introduce new alloc/free/mmap methods that take an attributes argument.
alloc/free_coherent can be implemented on top of the new alloc/free
calls with NULL attributes. dma_alloc_noncoherent can be implemented
using a DMA_ATTR_NONCOHERENT attribute, and dma_alloc_writecombine can
use a separate DMA_ATTR_WRITECOMBINE attribute. This way drivers get a
more generic, platform-independent way of allocating DMA memory buffers
with specific parameters.

One more attribute can be useful: DMA_ATTR_NOKERNELVADDR. Buffers with
this attribute will not have a valid kernel virtual address. They might
be useful for drivers that only export DMA buffers to userspace (for
example V4L2 or ALSA).

The mmap method is introduced to let drivers create a user space
mapping for a DMA buffer in a generic, architecture-independent way.
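
For example, with the dma_alloc_attrs()/dma_free_attrs() wrappers added
later in this series, a driver could request a buffer with specific
properties in a portable way. A minimal sketch (DMA_ATTR_NONCOHERENT is
only proposed here and not yet defined; dev and size are placeholders):

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t handle;
	void *vaddr;

	/* proposed attribute, not yet present in dma-attrs.h */
	dma_set_attr(DMA_ATTR_NONCOHERENT, &attrs);

	vaddr = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, &attrs);
	if (!vaddr)
		return -ENOMEM;

	/* ... use the buffer ... */

	dma_free_attrs(dev, size, vaddr, handle, &attrs);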

TODO: update all dma_map_ops clients for all architectures

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 include/linux/dma-mapping.h |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ba8319a..08a6fa8 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -16,10 +16,15 @@ enum dma_data_direction {
 };
 
 struct dma_map_ops {
-	void* (*alloc_coherent)(struct device *dev, size_t size,
-				dma_addr_t *dma_handle, gfp_t gfp);
-	void (*free_coherent)(struct device *dev, size_t size,
-			      void *vaddr, dma_addr_t dma_handle);
+	void* (*alloc)(struct device *dev, size_t size,
+				dma_addr_t *dma_handle, gfp_t gfp,
+				struct dma_attrs *attrs);
+	void (*free)(struct device *dev, size_t size,
+			      void *vaddr, dma_addr_t dma_handle,
+			      struct dma_attrs *attrs);
+	int (*mmap)(struct device *, struct vm_area_struct *,
+			  void *, dma_addr_t, size_t, struct dma_attrs *attrs);
+
 	dma_addr_t (*map_page)(struct device *dev, struct page *page,
 			       unsigned long offset, size_t size,
 			       enum dma_data_direction dir,
-- 
1.7.1.569.g6f426


* [PATCH 8/8] ARM: dma-mapping: use alloc, mmap, free from dma_ops
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch converts the dma_alloc/free/mmap_{coherent,writecombine}
functions to use the generic alloc/free/mmap methods from the
dma_map_ops structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute has
been introduced to implement the writecombine variants.
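
With this in place, exporting a write-combined buffer to user space (as
a V4L2- or ALSA-style driver would) reduces to a sketch like the
following; my_mmap, my_buffer and its fields are illustrative
placeholders:

	static int my_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct my_buffer *buf = file->private_data;

		/* the page protection is now picked in arm_dma_mmap()
		 * from the DMA_ATTR_WRITE_COMBINE attribute */
		return dma_mmap_writecombine(buf->dev, vma, buf->vaddr,
					     buf->dma_handle, buf->size);
	}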

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/common/dmabounce.c        |    3 +
 arch/arm/include/asm/dma-mapping.h |  105 ++++++++++++++++++++++++++----------
 arch/arm/mm/dma-mapping.c          |   54 +++++++------------
 include/linux/dma-attrs.h          |    1 +
 4 files changed, 99 insertions(+), 64 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 5b411ef..c8338e3 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -452,6 +452,9 @@ static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
 }
 
 static struct dma_map_ops dmabounce_ops = {
+	.alloc			= arm_dma_alloc,
+	.free			= arm_dma_free,
+	.mmap			= arm_dma_mmap,
 	.map_page		= dmabounce_map_page,
 	.unmap_page		= dmabounce_unmap_page,
 	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 7ceec8f..09518d9 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -5,6 +5,7 @@
 
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
+#include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
 #include <asm-generic/dma-coherent.h>
@@ -122,69 +123,115 @@ static inline void dma_free_noncoherent(struct device *dev, size_t size,
 }
 
 /**
- * dma_alloc_coherent - allocate consistent memory for DMA
+ * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: required memory size
  * @handle: bus-specific DMA address
+ * @attrs: optional attributes that specify mapping properties
  *
- * Allocate some uncached, unbuffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
+ * Allocate some memory for a device for performing DMA.  This function
+ * allocates pages, and will return the CPU-viewed address, and sets @handle
+ * to be the device-viewed address.
  */
-extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t);
+extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			   gfp_t gfp, struct dma_attrs *attrs);
+
+#define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL)
+
+static inline void *dma_alloc_attrs(struct device *dev, size_t size,
+				       dma_addr_t *dma_handle, gfp_t flag,
+				       struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	void *cpu_addr;
+	BUG_ON(!ops);
+
+	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
+	return cpu_addr;
+}
 
 /**
- * dma_free_coherent - free memory allocated by dma_alloc_coherent
+ * arm_dma_free - free memory allocated by arm_dma_alloc
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: size of memory originally requested in dma_alloc_coherent
  * @cpu_addr: CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
+ * @attrs: optional attributes that specify mapping properties
  *
  * Free (and unmap) a DMA buffer previously allocated by
- * dma_alloc_coherent().
+ * arm_dma_alloc().
  *
  * References to memory and mappings associated with cpu_addr/handle
  * during and after this call executing are illegal.
  */
-extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t);
+extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+			 dma_addr_t handle, struct dma_attrs *attrs);
+
+#define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL)
+
+static inline void dma_free_attrs(struct device *dev, size_t size,
+				     void *cpu_addr, dma_addr_t dma_handle,
+				     struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+
+	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+}
 
 /**
- * dma_mmap_coherent - map a coherent DMA allocation into user space
+ * arm_dma_mmap - map a coherent DMA allocation into user space
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @vma: vm_area_struct describing requested user mapping
  * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
  * @size: size of memory originally requested in dma_alloc_coherent
+ * @attrs: optional attributes that specify mapping properties
  *
  * Map a coherent DMA buffer previously allocated by dma_alloc_coherent
  * into user space.  The coherent DMA buffer must not be freed by the
  * driver until the user space mapping has been released.
  */
-int dma_mmap_coherent(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+			void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			struct dma_attrs *attrs);
 
+#define dma_mmap_coherent(d,v,c,h,s) dma_mmap_attrs(d,v,c,h,s,NULL)
 
-/**
- * dma_alloc_writecombine - allocate writecombining memory for DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: required memory size
- * @handle: bus-specific DMA address
- *
- * Allocate some uncached, buffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
- */
-extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *,
-		gfp_t);
+static inline int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+				  void *cpu_addr, dma_addr_t dma_addr,
+				  size_t size, struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+	return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
+}
 
-#define dma_free_writecombine(dev,size,cpu_addr,handle) \
-	dma_free_coherent(dev,size,cpu_addr,handle)
+static inline void *dma_alloc_writecombine(struct device *dev, size_t size,
+				       dma_addr_t *dma_handle, gfp_t flag)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_alloc_attrs(dev, size, dma_handle, flag, &attrs);
+}
 
-int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+static inline void dma_free_writecombine(struct device *dev, size_t size,
+				     void *cpu_addr, dma_addr_t dma_handle)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs);
+}
 
+static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
+		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs);
+}
 
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7c62e60..253e0b4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -116,6 +116,9 @@ static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 }
 
 struct dma_map_ops dma_ops = {
+	.alloc			= arm_dma_alloc,
+	.free			= arm_dma_free,
+	.mmap			= arm_dma_mmap,
 	.map_page		= arm_dma_map_page,
 	.unmap_page		= arm_dma_unmap_page,
 	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
@@ -433,39 +436,36 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+		    gfp_t gfp, struct dma_attrs *attrs)
 {
+	pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			pgprot_writecombine(pgprot_kernel) :
+			pgprot_dmacoherent(pgprot_kernel);
 	void *memory;
 
 	if (dma_alloc_from_coherent(dev, size, handle, &memory))
 		return memory;
 
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_dmacoherent(pgprot_kernel));
+	return __dma_alloc(dev, size, handle, gfp, prot);
 }
-EXPORT_SYMBOL(dma_alloc_coherent);
 
 /*
- * Allocate a writecombining region, in much the same way as
- * dma_alloc_coherent above.
+ * Create userspace mapping for the DMA-coherent memory.
  */
-void *
-dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
-{
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_writecombine(pgprot_kernel));
-}
-EXPORT_SYMBOL(dma_alloc_writecombine);
-
-static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size)
+int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		 struct dma_attrs *attrs)
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
 	unsigned long user_size, kern_size;
 	struct arm_vmregion *c;
 
+	vma->vm_page_prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(vma->vm_page_prot) :
+			    pgprot_dmacoherent(vma->vm_page_prot);
+
 	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 
 	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
@@ -487,27 +487,12 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 	return ret;
 }
 
-int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
-		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
-{
-	vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
-}
-EXPORT_SYMBOL(dma_mmap_coherent);
-
-int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
-			  void *cpu_addr, dma_addr_t dma_addr, size_t size)
-{
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
-}
-EXPORT_SYMBOL(dma_mmap_writecombine);
-
 /*
  * free a page as defined by the above mapping.
  * Must not be called with IRQs disabled.
  */
-void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
+void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+		  dma_addr_t handle, struct dma_attrs *attrs)
 {
 	WARN_ON(irqs_disabled());
 
@@ -521,7 +506,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 
 	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
 }
-EXPORT_SYMBOL(dma_free_coherent);
 
 static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	size_t size, enum dma_data_direction dir,
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index 71ad34e..ada61e1 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -13,6 +13,7 @@
 enum dma_attr {
 	DMA_ATTR_WRITE_BARRIER,
 	DMA_ATTR_WEAK_ORDERING,
+	DMA_ATTR_WRITE_COMBINE,
 	DMA_ATTR_MAX,
 };
 
-- 
1.7.1.569.g6f426

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* [PATCH 8/8] ARM: dma-mapping: use alloc, mmap, free from dma_ops
@ 2011-06-20  7:50   ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: Marek Szyprowski, Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

This patch converts dma_alloc/free/mmap_{coherent,writecombine}
functions to use generic alloc/free/mmap methods from dma_map_ops
structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute have been
introduced to implement writecombine methods.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/common/dmabounce.c        |    3 +
 arch/arm/include/asm/dma-mapping.h |  105 ++++++++++++++++++++++++++----------
 arch/arm/mm/dma-mapping.c          |   54 +++++++------------
 include/linux/dma-attrs.h          |    1 +
 4 files changed, 99 insertions(+), 64 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 5b411ef..c8338e3 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -452,6 +452,9 @@ static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
 }
 
 static struct dma_map_ops dmabounce_ops = {
+	.alloc			= arm_dma_alloc,
+	.free			= arm_dma_free,
+	.mmap			= arm_dma_mmap,
 	.map_page		= dmabounce_map_page,
 	.unmap_page		= dmabounce_unmap_page,
 	.sync_single_for_cpu	= dmabounce_sync_for_cpu,
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 7ceec8f..09518d9 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -5,6 +5,7 @@
 
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
+#include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
 #include <asm-generic/dma-coherent.h>
@@ -122,69 +123,115 @@ static inline void dma_free_noncoherent(struct device *dev, size_t size,
 }
 
 /**
- * dma_alloc_coherent - allocate consistent memory for DMA
+ * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: required memory size
  * @handle: bus-specific DMA address
+ * @attrs: optinal attributes that specific mapping properties
  *
- * Allocate some uncached, unbuffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
+ * Allocate some memory for a device for performing DMA.  This function
+ * allocates pages, and will return the CPU-viewed address, and sets @handle
+ * to be the device-viewed address.
  */
-extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t);
+extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			   gfp_t gfp, struct dma_attrs *attrs);
+
+#define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL)
+
+static inline void *dma_alloc_attrs(struct device *dev, size_t size,
+				       dma_addr_t *dma_handle, gfp_t flag,
+				       struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	void *cpu_addr;
+	BUG_ON(!ops);
+
+	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
+	return cpu_addr;
+}
 
 /**
- * dma_free_coherent - free memory allocated by dma_alloc_coherent
+ * arm_dma_free - free memory allocated by arm_dma_alloc
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @size: size of memory originally requested in dma_alloc_coherent
  * @cpu_addr: CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
+ * @attrs: optinal attributes that specific mapping properties
  *
  * Free (and unmap) a DMA buffer previously allocated by
- * dma_alloc_coherent().
+ * arm_dma_alloc().
  *
  * References to memory and mappings associated with cpu_addr/handle
  * during and after this call executing are illegal.
  */
-extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t);
+extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+			 dma_addr_t handle, struct dma_attrs *attrs);
+
+#define dma_free_coherent(d,s,c,h) dma_free_attrs(d,s,c,h,NULL)
+
+static inline void dma_free_attrs(struct device *dev, size_t size,
+				     void *cpu_addr, dma_addr_t dma_handle,
+				     struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+
+	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+}
 
 /**
- * dma_mmap_coherent - map a coherent DMA allocation into user space
+ * arm_dma_mmap - map a coherent DMA allocation into user space
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
  * @vma: vm_area_struct describing requested user mapping
  * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent
  * @handle: device-view address returned from dma_alloc_coherent
  * @size: size of memory originally requested in dma_alloc_coherent
+ * @attrs: optinal attributes that specific mapping properties
  *
  * Map a coherent DMA buffer previously allocated by dma_alloc_coherent
  * into user space.  The coherent DMA buffer must not be freed by the
  * driver until the user space mapping has been released.
  */
-int dma_mmap_coherent(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+			void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			struct dma_attrs *attrs);
 
+#define dma_mmap_coherent(d,v,c,h,s) dma_mmap_attrs(d,v,c,h,s,NULL)
 
-/**
- * dma_alloc_writecombine - allocate writecombining memory for DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: required memory size
- * @handle: bus-specific DMA address
- *
- * Allocate some uncached, buffered memory for a device for
- * performing DMA.  This function allocates pages, and will
- * return the CPU-viewed address, and sets @handle to be the
- * device-viewed address.
- */
-extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *,
-		gfp_t);
+static inline int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+				  void *cpu_addr, dma_addr_t dma_addr,
+				  size_t size, struct dma_attrs *attrs)
+{
+	struct dma_map_ops *ops = get_dma_ops(dev);
+	BUG_ON(!ops);
+	return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
+}
 
-#define dma_free_writecombine(dev,size,cpu_addr,handle) \
-	dma_free_coherent(dev,size,cpu_addr,handle)
+static inline void *dma_alloc_writecombine(struct device *dev, size_t size,
+				       dma_addr_t *dma_handle, gfp_t flag)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_alloc_attrs(dev, size, dma_handle, flag, &attrs);
+}
 
-int dma_mmap_writecombine(struct device *, struct vm_area_struct *,
-		void *, dma_addr_t, size_t);
+static inline void dma_free_writecombine(struct device *dev, size_t size,
+				     void *cpu_addr, dma_addr_t dma_handle)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs);
+}
 
+static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
+		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+	DEFINE_DMA_ATTRS(attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
+	return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs);
+}
 
 /*
  * For SA-1111, IXP425, and ADI systems  the dma-mapping functions are "magic"
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7c62e60..253e0b4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -116,6 +116,9 @@ static int arm_dma_set_mask(struct device *dev, u64 dma_mask)
 }
 
 struct dma_map_ops dma_ops = {
+	.alloc			= arm_dma_alloc,
+	.free			= arm_dma_free,
+	.mmap			= arm_dma_mmap,
 	.map_page		= arm_dma_map_page,
 	.unmap_page		= arm_dma_unmap_page,
 	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
@@ -433,39 +436,36 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+		    gfp_t gfp, struct dma_attrs *attrs)
 {
+	pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			pgprot_writecombine(pgprot_kernel) :
+			pgprot_dmacoherent(pgprot_kernel);
 	void *memory;
 
 	if (dma_alloc_from_coherent(dev, size, handle, &memory))
 		return memory;
 
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_dmacoherent(pgprot_kernel));
+	return __dma_alloc(dev, size, handle, gfp, prot);
 }
-EXPORT_SYMBOL(dma_alloc_coherent);
 
 /*
- * Allocate a writecombining region, in much the same way as
- * dma_alloc_coherent above.
+ * Create userspace mapping for the DMA-coherent memory.
  */
-void *
-dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
-{
-	return __dma_alloc(dev, size, handle, gfp,
-			   pgprot_writecombine(pgprot_kernel));
-}
-EXPORT_SYMBOL(dma_alloc_writecombine);
-
-static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size)
+int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		 struct dma_attrs *attrs)
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
 	unsigned long user_size, kern_size;
 	struct arm_vmregion *c;
 
+	vma->vm_page_prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(vma->vm_page_prot) :
+			    pgprot_dmacoherent(vma->vm_page_prot);
+
 	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 
 	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
@@ -487,27 +487,12 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 	return ret;
 }
 
-int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
-		      void *cpu_addr, dma_addr_t dma_addr, size_t size)
-{
-	vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
-}
-EXPORT_SYMBOL(dma_mmap_coherent);
-
-int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
-			  void *cpu_addr, dma_addr_t dma_addr, size_t size)
-{
-	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
-	return dma_mmap(dev, vma, cpu_addr, dma_addr, size);
-}
-EXPORT_SYMBOL(dma_mmap_writecombine);
-
 /*
  * free a page as defined by the above mapping.
  * Must not be called with IRQs disabled.
  */
-void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
+void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
+		  dma_addr_t handle, struct dma_attrs *attrs)
 {
 	WARN_ON(irqs_disabled());
 
@@ -521,7 +506,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 
 	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
 }
-EXPORT_SYMBOL(dma_free_coherent);
 
 static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	size_t size, enum dma_data_direction dir,
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index 71ad34e..ada61e1 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -13,6 +13,7 @@
 enum dma_attr {
 	DMA_ATTR_WRITE_BARRIER,
 	DMA_ATTR_WEAK_ORDERING,
+	DMA_ATTR_WRITE_COMBINE,
 	DMA_ATTR_MAX,
 };
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 201+ messages in thread
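For illustration, a minimal sketch (not part of the series) of how a
driver would exercise the wrappers introduced above. "my_probe",
"my_mmap" and MY_BUF_SIZE are invented for this example; the dma_*
calls are the ones defined in the patch:

#include <linux/dma-mapping.h>

#define MY_BUF_SIZE	(64 * 1024)

static void *cpu_addr;
static dma_addr_t dma_handle;

static int my_probe(struct device *dev)
{
	/* reaches arm_dma_alloc() with DMA_ATTR_WRITE_COMBINE set */
	cpu_addr = dma_alloc_writecombine(dev, MY_BUF_SIZE,
					  &dma_handle, GFP_KERNEL);
	return cpu_addr ? 0 : -ENOMEM;
}

static int my_mmap(struct device *dev, struct vm_area_struct *vma)
{
	/* reaches arm_dma_mmap() with the same attribute */
	return dma_mmap_writecombine(dev, vma, cpu_addr,
				     dma_handle, MY_BUF_SIZE);
}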


* Re: [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-20  8:35     ` Michal Nazarewicz
  0 siblings, 0 replies; 201+ messages in thread
From: Michal Nazarewicz @ 2011-06-20  8:35 UTC (permalink / raw)
  To: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch, Marek Szyprowski
  Cc: Kyungmin Park, Arnd Bergmann, Joerg Roedel, Russell King - ARM Linux

On Mon, 20 Jun 2011 09:50:06 +0200, Marek Szyprowski  
<m.szyprowski@samsung.com> wrote:
> +static inline void dma_sync_single_for_cpu(struct device *dev,

I wouldn't really put inline here or in the function below.

> +		dma_addr_t handle, size_t size, enum dma_data_direction dir)
> +{
> +	BUG_ON(!valid_dma_direction(dir));
> +
> +	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
> +
> +	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
> +		return;
> +
> +	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);

I know it is just copy'n'paste but how about:

	if (dmabounce_sync_for_cpu(dev, handle, size, dir))
		__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);

> +}
> +
> +static inline void dma_sync_single_for_device(struct device *dev,
> +		dma_addr_t handle, size_t size, enum dma_data_direction dir)
> +{
> +	BUG_ON(!valid_dma_direction(dir));
> +
> +	debug_dma_sync_single_for_device(dev, handle, size, dir);
> +
> +	if (!dmabounce_sync_for_device(dev, handle, size, dir))
> +		return;
> +
> +	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);

Same as above.

> +}
> +
>  /**
>   * dma_sync_single_range_for_cpu
>   * @dev: valid struct device pointer, or NULL for ISA and EISA-like  
> devices

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
@ 2011-06-20 10:46       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20 10:46 UTC (permalink / raw)
  To: 'Michal Nazarewicz',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch
  Cc: 'Kyungmin Park', 'Arnd Bergmann',
	'Joerg Roedel', 'Russell King - ARM Linux',
	Marek Szyprowski

Hello,

On Monday, June 20, 2011 10:36 AM Michal Nazarewicz wrote:

> On Mon, 20 Jun 2011 09:50:06 +0200, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > +static inline void dma_sync_single_for_cpu(struct device *dev,
> 
> I wouldn't really put inline here or in the function below.
> 
> > +		dma_addr_t handle, size_t size, enum dma_data_direction dir)
> > +{
> > +	BUG_ON(!valid_dma_direction(dir));
> > +
> > +	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
> > +
> > +	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
> > +		return;
> > +
> > +	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
> 
> I know it is just copy'n'paste but how about:

This patch is just about moving code between files, and I just wanted
to show what's being changed and how. There is a final cleanup anyway in a
separate patch. All these patches are meant to start a discussion about
how the dma mapping code can be redesigned for further extensions such as
generic iommu support.

> 
> 	if (dmabounce_sync_for_cpu(dev, handle, size, dir))
> 		__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);

The above lines will be removed by the next patches in this series, so I
really see no point in changing this.

(snipped)

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-20 14:33     ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-20 14:33 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Russell King - ARM Linux, Joerg Roedel,
	Arnd Bergmann

Hi.

Great job.

On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> +static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
> +{
> +       dev->archdata.dma_ops = ops;
> +}
> +

Who calls set_dma_ops()?
In the machine initialization part?
What if a device driver does not want to use the arch's dma_map_ops
that the machine init procedure has set?
Even though many archs define their dma_map_ops in the archdata of the
device structure, I don't think it is a good idea for the device
structure to contain a pointer to a dma_map_ops that may not be common
to all devices on a board.

I also think it would be better to attach and detach dma_map_ops
dynamically. Moreover, a mapping is not permanent on our Exynos
platform, because a System MMU may be turned off at runtime.

The DMA API must work together with the IOMMU API to initialize the
IOMMU at runtime.

Regards,
Cho KyongHo.

^ permalink raw reply	[flat|nested] 201+ messages in thread
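To make the question concrete: a hypothetical sketch of the usage the
series appears to assume, with bus or machine init code installing
per-device ops before drivers probe. Only set_dma_ops(), dma_ops and
dmabounce_ops come from the series; my_board_fixup_dev() and
my_board_needs_bounce() are invented:

static void my_board_fixup_dev(struct device *dev)
{
	/* platform-specific test, invented for this example */
	if (my_board_needs_bounce(dev))
		set_dma_ops(dev, &dmabounce_ops);	/* bounce-buffer ops */
	else
		set_dma_ops(dev, &dma_ops);		/* default ARM ops */
}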


* Re: [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-20 14:37     ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-20 14:37 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

Hi.

On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> -extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
> +extern int generic_dma_map_sg(struct device *, struct scatterlist *, int,
>                enum dma_data_direction, struct dma_attrs *attrs);
> -extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
> +extern void generic_dma_unmap_sg(struct device *, struct scatterlist *, int,
>                enum dma_data_direction, struct dma_attrs *attrs);
> -extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
> +extern void generic_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
>                enum dma_data_direction);
> -extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
> +extern void generic_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
>                enum dma_data_direction);
>

I don't understand why you renamed arm_dma_*() to generic_dma_*()
even though their functionality is unchanged
and they are still specific to ARM.
The names make them look generic to the rest of the kernel code.

Regards,
Cho KyongHo.

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-20 14:39     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel

On Mon, Jun 20, 2011 at 09:50:07AM +0200, Marek Szyprowski wrote:
> This patch consolidates dma_map_single and dma_map_page calls. This is
> required to let dma-mapping framework on ARM architecture use common,
> generic dma-mapping helpers.

This breaks DMA API debugging, which requires that dma_map_page() and
dma_unmap_page() are paired separately from dma_map_single() and
dma_unmap_single().

This also breaks dmabounce when used with a highmem-enabled system -
dmabounce refuses the dma_map_page() API but allows the dma_map_single()
API.


^ permalink raw reply	[flat|nested] 201+ messages in thread
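For readers who have not hit it, the pairing rule sketched:
CONFIG_DMA_API_DEBUG records whether a mapping was created through the
"single" or the "page" family and warns when the unmap call comes from
the other one. The function below is illustrative only:

static void pairing_example(struct device *dev, void *buf, size_t len)
{
	dma_addr_t d = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	/* correct: unmapped through the same family that mapped it */
	dma_unmap_single(dev, d, len, DMA_TO_DEVICE);

	/*
	 * If dma_map_single() becomes a thin wrapper around
	 * dma_map_page(), dma-debug records the mapping as a page
	 * mapping and the dma_unmap_single() above no longer matches;
	 * that mismatch is what the consolidation breaks.
	 */
}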


* Re: [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
@ 2011-06-20 14:40     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:40 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel

On Mon, Jun 20, 2011 at 09:50:09AM +0200, Marek Szyprowski wrote:
> This patch converts all dma_sg methods to be generic (independent of the
> current DMA mapping implementation for ARM architecture). All dma sg
> operations are now implemented on top of respective
> dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.

No.  We really don't want to do this.  If we want to move the dsb() out
of the mapping functions (which I have a patch for) to avoid doing a
dsb() on each and every sg segment, then we must not use the generic
stuff.


^ permalink raw reply	[flat|nested] 201+ messages in thread
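For context, and assuming the series' generic_dma_map_sg() follows the
usual pattern, the generic sg path boils down to one ->map_page() call
per segment (error unwinding omitted from this sketch):

int generic_dma_map_sg(struct device *dev, struct scatterlist *sglist,
		       int nents, enum dma_data_direction dir,
		       struct dma_attrs *attrs)
{
	struct dma_map_ops *ops = get_dma_ops(dev);
	struct scatterlist *s;
	int i;

	for_each_sg(sglist, s, nents, i)
		s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
					       s->length, dir, attrs);
	return nents;
}

Since on ARM each mapping call currently ends in a dsb(), an N-segment
list pays N barriers, where a dedicated arm_dma_map_sg() could issue a
single dsb() after the whole loop; that is the cost being objected to.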

* Re: [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
@ 2011-06-20 14:40     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:40 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel

On Mon, Jun 20, 2011 at 09:50:09AM +0200, Marek Szyprowski wrote:
> This patch converts all dma_sg methods to be generic (independent of the
> current DMA mapping implementation for ARM architecture). All dma sg
> operations are now implemented on top of respective
> dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.

No.  We really don't want to do this.  If we want to move the dsb() out
of the mapping functions (which I have a patch for) to avoid doing a
dsb() on each and every sg segment, then we must not use the generic
stuff.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
@ 2011-06-20 14:40     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 20, 2011 at 09:50:09AM +0200, Marek Szyprowski wrote:
> This patch converts all dma_sg methods to be generic (independent of the
> current DMA mapping implementation for ARM architecture). All dma sg
> operations are now implemented on top of respective
> dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.

No.  We really don't want to do this.  If we want to move the dsb() out
of the mapping functions (which I have a patch for) to avoid doing a
dsb() on each and every sg segment, then we must not use the generic
stuff.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-20 14:42     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 14:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Joerg Roedel, Arnd Bergmann

On Mon, Jun 20, 2011 at 09:50:10AM +0200, Marek Szyprowski wrote:
> This patch removes dma bounce hooks from the common dma mapping
> implementation on ARM architecture and creates a separate set of
> dma_map_ops for dma bounce devices.

Why all this additional indirection for no gain?

> @@ -278,7 +278,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
>  		 * We don't need to sync the DMA buffer since
>  		 * it was allocated via the coherent allocators.
>  		 */
> -		__dma_single_cpu_to_dev(ptr, size, dir);
> +		dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
>  	}
>  
>  	return dma_addr;
> @@ -317,7 +317,7 @@ static inline void unmap_single(struct device *dev, dma_addr_t dma_addr,
>  		}
>  		free_safe_buffer(dev->archdata.dmabounce, buf);
>  	} else {
> -		__dma_single_dev_to_cpu(dma_to_virt(dev, dma_addr), size, dir);
> +		dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
>  	}
>  }
>  

^ permalink raw reply	[flat|nested] 201+ messages in thread
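The indirection being questioned, sketched; the exact shape of
get_dma_ops() is assumed here, not quoted from the series. The
bounce-vs-normal decision moves from a hook inside every mapping call
to a single ops pointer chosen when the device is set up:

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
	if (dev && dev->archdata.dma_ops)
		return dev->archdata.dma_ops;	/* e.g. &dmabounce_ops */
	return &dma_ops;			/* default ARM ops */
}

Devices that never bounce then pay one pointer test per call instead
of a dmabounce check inside each map/unmap/sync path.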


* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-20 14:45     ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-20 14:45 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel,
	Russell King - ARM Linux

Hi.

On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:

>  struct dma_map_ops {
> -       void* (*alloc_coherent)(struct device *dev, size_t size,
> -                               dma_addr_t *dma_handle, gfp_t gfp);
> -       void (*free_coherent)(struct device *dev, size_t size,
> -                             void *vaddr, dma_addr_t dma_handle);
> +       void* (*alloc)(struct device *dev, size_t size,
> +                               dma_addr_t *dma_handle, gfp_t gfp,
> +                               struct dma_attrs *attrs);
> +       void (*free)(struct device *dev, size_t size,
> +                             void *vaddr, dma_addr_t dma_handle,
> +                             struct dma_attrs *attrs);
> +       int (*mmap)(struct device *, struct vm_area_struct *,
> +                         void *, dma_addr_t, size_t, struct dma_attrs *attrs);
> +
>        dma_addr_t (*map_page)(struct device *dev, struct page *page,
>                               unsigned long offset, size_t size,
>                               enum dma_data_direction dir,

I still don't agree with your idea of changing alloc_coherent() to alloc().
As I said before, we actually do not need dma_alloc_writecombine() anymore,
because it is no different from dma_alloc_coherent() on ARM.
Most other architectures do not have dma_alloc_writecombine().
If you want dma_alloc_coherent() to allocate a user virtual address,
I believe that is also possible with the mmap() you introduced.

Regards,
Cho KyongHo.

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-20 15:06       ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-06-20 15:06 UTC (permalink / raw)
  To: KyongHo Cho
  Cc: linux-arch, Arnd Bergmann, Joerg Roedel, linaro-mm-sig, linux-mm,
	Kyungmin Park, linux-arm-kernel, Marek Szyprowski

On Mon, Jun 20, 2011 at 11:45:41PM +0900, KyongHo Cho wrote:
> I still don't agree with your idea that change alloc_coherent() with alloc().
> As I said before, we actually do not need dma_alloc_writecombine() anymore
> because it is not different from dma_alloc_coherent() in ARM.

Wrong - there is a difference.  For pre-ARMv6 CPUs, it returns memory
with different attributes from DMA coherent memory.

And we're not going to sweep away pre-ARMv6 CPUs any time soon.  So
you can't ignore dma_alloc_writecombine() which must remain to sanely
support framebuffers.

^ permalink raw reply	[flat|nested] 201+ messages in thread
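A sketch of why framebuffer drivers depend on the distinction; the fb
driver below is invented, while the allocator names and the pre-ARMv6
memory attributes are as described above:

static int my_fb_alloc(struct fb_info *info, struct device *dev,
		       size_t fb_size)
{
	dma_addr_t dma;

	/*
	 * On pre-ARMv6, dma_alloc_coherent() returns uncached,
	 * unbuffered memory; dma_alloc_writecombine() returns uncached
	 * but buffered memory, so CPU stores can merge into bursts.
	 * For software rendering into a scanout buffer the difference
	 * is significant.
	 */
	info->screen_base = dma_alloc_writecombine(dev, fb_size, &dma,
						   GFP_KERNEL);
	if (!info->screen_base)
		return -ENOMEM;

	info->fix.smem_start = dma;
	info->fix.smem_len = fb_size;
	return 0;
}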


* Re: [Linaro-mm-sig] [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-20 15:06       ` Russell King - ARM Linux
@ 2011-06-20 15:14         ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-20 15:14 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: linux-arch, Arnd Bergmann, Joerg Roedel, linaro-mm-sig, linux-mm,
	Kyungmin Park, linux-arm-kernel

On Tue, Jun 21, 2011 at 12:06 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> Wrong - there is a difference.  For pre-ARMv6 CPUs, it returns memory
> with different attributes from DMA coherent memory.
>
> And we're not going to sweep away pre-ARMv6 CPUs any time soon.  So
> you can't ignore dma_alloc_writecombine() which must remain to sanely
> support framebuffers.
>
OK. Thanks.

Then, I think we can implement dma_alloc_writecombine() away from dma_map_ops.
IMHO, those devices that use dma_alloc_writecombine() are enough with
the default dma_map_ops.
Removing a member from dma_map_ops is too heavy work.

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
  2011-06-20 14:39     ` Russell King - ARM Linux
@ 2011-06-20 15:15       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20 15:15 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Arnd Bergmann',
	'Joerg Roedel',
	Marek Szyprowski

Hello,

On Monday, June 20, 2011 4:39 PM Russell King - ARM Linux wrote:

> On Mon, Jun 20, 2011 at 09:50:07AM +0200, Marek Szyprowski wrote:
> > This patch consolidates dma_map_single and dma_map_page calls. This is
> > required to let dma-mapping framework on ARM architecture use common,
> > generic dma-mapping helpers.
> 
> This breaks DMA API debugging, which requires that dma_map_page and
> dma_unmap_page are paired separately from dma_map_single and
> dma_unmap_single().

Ok, right. This can be fixed by creating appropriate static inline functions
in dma-mapping.h and moving the dma_debug_* calls there. These functions will
later be removed by switching to dma_map_ops and the
include/asm-generic/dma-mapping-common.h inlines, which do all the dma_debug_*
calls correctly anyway.
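
A minimal sketch of such a wrapper (assuming __dma_map_page() stands for the
internal ARM mapping helper; debug_dma_map_page() takes a final map_single
flag, so the debug code can still tell the two APIs apart):

	static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
						size_t size, enum dma_data_direction dir)
	{
		dma_addr_t addr;

		addr = __dma_map_page(dev, virt_to_page(cpu_addr),
				      offset_in_page(cpu_addr), size, dir);
		/* report a map_single event, not a map_page one */
		debug_dma_map_page(dev, virt_to_page(cpu_addr),
				   offset_in_page(cpu_addr), size, dir, addr, true);
		return addr;
	}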

> This also breaks dmabounce when used with a highmem-enabled system -
> dmabounce refuses the dma_map_page() API but allows the dma_map_single()
> API.

I'm really not sure how this change will break the dma bounce code.

Does it mean that it is allowed to call dma_map_single() on a kmapped highmem
page?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops
  2011-06-20 14:40     ` Russell King - ARM Linux
@ 2011-06-20 15:23       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20 15:23 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Arnd Bergmann',
	'Joerg Roedel',
	Marek Szyprowski

Hello,

On Monday, June 20, 2011 4:40 PM Russell King - ARM Linux wrote:

> On Mon, Jun 20, 2011 at 09:50:09AM +0200, Marek Szyprowski wrote:
> > This patch converts all dma_sg methods to be generic (independent of the
> > current DMA mapping implementation for ARM architecture). All dma sg
> > operations are now implemented on top of respective
> > dma_map_page/dma_sync_single_for* operations from dma_map_ops structure.
> 
> No.  We really don't want to do this.

I assume you want to keep the current design for performance reasons?

It's really not a problem for me. I can change my patches to keep the
arm_dma_*_sg_* functions and create some stubs for the dmabounce version.

> If we want to move the dsb() out of the mapping functions (which I
> have a patch for) to avoid doing a dsb() on each and every sg segment,
> then we must not use the generic stuff.

Ok, specialized (optimized) versions of the dma_*_sg_* operations are
definitely better. The current version just calls the respective dma_single_*
operations in a loop, so in my patches I decided to create a generic version
just to simplify the code.
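
For reference, the loop-based generic version amounts to something like this
sketch (the function name is illustrative and error unwinding is elided):

	static int generic_dma_map_sg(struct device *dev, struct scatterlist *sg,
				      int nents, enum dma_data_direction dir,
				      struct dma_attrs *attrs)
	{
		struct dma_map_ops *ops = get_dma_ops(dev);
		struct scatterlist *s;
		int i;

		/* map each segment with the per-page callback; note that a
		 * dsb() inside map_page is then issued once per segment */
		for_each_sg(sg, s, nents, i) {
			s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
						       s->length, dir, attrs);
			if (dma_mapping_error(dev, s->dma_address))
				return 0;
		}
		return nents;
	}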

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-20 15:31       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-20 15:31 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Arnd Bergmann',
	Marek Szyprowski

Hello,

On Monday, June 20, 2011 4:43 PM Russell King - ARM Linux wrote:

> On Mon, Jun 20, 2011 at 09:50:10AM +0200, Marek Szyprowski wrote:
> > This patch removes dma bounce hooks from the common dma mapping
> > implementation on ARM architecture and creates a separate set of
> > dma_map_ops for dma bounce devices.
> 
> Why all this additional indirection for no gain?

I did it to really separate the dmabounce code and let it be completely
independent of the particular internal functions of the main generic
dma-mapping code.

dmabounce is just one possible dma-mapping implementation, and it is really
convenient to have it enclosed behind a common interface (dma_map_ops) rather
than spread around and hardcoded behind some #ifdefs in the generic ARM
dma-mapping code.

There will also be other dma-mapping implementations in the future - I'm
thinking mainly of some IOMMU-capable versions.

In terms of speed I really doubt that these changes have any impact on system
performance, but they significantly improve the code readability (see the next
patch with the cleanup of dma-mapping.c).
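
The separation boils down to something like this sketch (the dmabounce_*
function names are illustrative, not the exact symbols from the patch):

	static struct dma_map_ops dmabounce_ops = {
		.map_page		= dmabounce_map_page,
		.unmap_page		= dmabounce_unmap_page,
		.sync_single_for_cpu	= dmabounce_sync_for_cpu,
		.sync_single_for_device	= dmabounce_sync_for_device,
		/* the remaining callbacks can reuse the generic ARM versions */
	};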

> > @@ -278,7 +278,7 @@ static inline dma_addr_t map_single(struct device
> *dev, void *ptr, size_t size,
> >  		 * We don't need to sync the DMA buffer since
> >  		 * it was allocated via the coherent allocators.
> >  		 */
> > -		__dma_single_cpu_to_dev(ptr, size, dir);
> > +		dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
> >  	}
> >
> >  	return dma_addr;
> > @@ -317,7 +317,7 @@ static inline void unmap_single(struct device *dev,
> dma_addr_t dma_addr,
> >  		}
> >  		free_safe_buffer(dev->archdata.dmabounce, buf);
> >  	} else {
> > -		__dma_single_dev_to_cpu(dma_to_virt(dev, dma_addr), size, dir);
> > +		dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir);
> >  	}
> >  }

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-20 14:45     ` KyongHo Cho
@ 2011-06-21 11:23       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-21 11:23 UTC (permalink / raw)
  To: 'KyongHo Cho'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Arnd Bergmann',
	'Joerg Roedel', 'Russell King - ARM Linux'

Hello,

On Monday, June 20, 2011 4:46 PM KyongHo Cho wrote:

> On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> 
> >  struct dma_map_ops {
> > -       void* (*alloc_coherent)(struct device *dev, size_t size,
> > -                               dma_addr_t *dma_handle, gfp_t gfp);
> > -       void (*free_coherent)(struct device *dev, size_t size,
> > -                             void *vaddr, dma_addr_t dma_handle);
> > +       void* (*alloc)(struct device *dev, size_t size,
> > +                               dma_addr_t *dma_handle, gfp_t gfp,
> > +                               struct dma_attrs *attrs);
> > +       void (*free)(struct device *dev, size_t size,
> > +                             void *vaddr, dma_addr_t dma_handle,
> > +                             struct dma_attrs *attrs);
> > +       int (*mmap)(struct device *, struct vm_area_struct *,
> > +                         void *, dma_addr_t, size_t, struct dma_attrs
> *attrs);
> > +
> >        dma_addr_t (*map_page)(struct device *dev, struct page *page,
> >                               unsigned long offset, size_t size,
> >                               enum dma_data_direction dir,
> 
> I still don't agree with your idea of replacing alloc_coherent() with
> alloc().

> As I said before, we actually do not need dma_alloc_writecombine() anymore
> because it is not different from dma_alloc_coherent() in ARM.

You already got a reply that dropping dma_alloc_writecombine() is no
go on ARM.

> Most of other architectures do not have dma_alloc_writecombine().

That's not a problem. Once we agree on dma_alloc_attrs(), the drivers can be
changed to use the DMA_ATTR_WRITE_COMBINE attribute. If the platform doesn't
support the attribute, it is just ignored. That's the whole point of the
attributes extension. Once a driver is converted to dma_alloc_attrs(), it can
be used without any changes both on platforms that support some specific
attributes and on ones that don't implement support for any of them.
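
For example, a converted driver would request write-combined memory like this
(a sketch against the proposed DMA_ATTR_WRITE_COMBINE attribute; a platform
that ignores the attribute simply returns ordinary coherent memory):

	struct dma_attrs attrs;
	dma_addr_t dma_handle;
	void *vaddr;

	init_dma_attrs(&attrs);
	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
	vaddr = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL, &attrs);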

> If you want dma_alloc_coherent() to allocate a user virtual address,
> I believe that it is also available with the mmap() you introduced.

Allocation is a separate operation from mapping to userspace. The mmap
operation should just map the buffer (represented by a cookie of type
dma_addr_t) to user address space.

Note that some drivers (framebuffer drivers, for example) also need to have
both types of mapping - one for user space and one for kernel virtual space.
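
Following the same pattern as dma_alloc_attrs(), a hypothetical
dma_mmap_attrs() wrapper would simply dispatch to the new mmap callback:

	static inline int dma_mmap_attrs(struct device *dev,
					 struct vm_area_struct *vma, void *cpu_addr,
					 dma_addr_t dma_addr, size_t size,
					 struct dma_attrs *attrs)
	{
		struct dma_map_ops *ops = get_dma_ops(dev);
		BUG_ON(!ops || !ops->mmap);
		/* map the buffer identified by (cpu_addr, dma_addr) into vma */
		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
	}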

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [Linaro-mm-sig] [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-06-21 11:47       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-21 11:47 UTC (permalink / raw)
  To: 'KyongHo Cho'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Russell King - ARM Linux',
	'Joerg Roedel', 'Arnd Bergmann'

Hello,

On Monday, June 20, 2011 4:33 PM KyongHo Cho wrote:

> On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > +static inline void set_dma_ops(struct device *dev, struct dma_map_ops
> *ops)
> > +{
> > +       dev->archdata.dma_ops = ops;
> > +}
> > +
> 
> Who calls set_dma_ops()?
> In the mach. initialization part?

Yes, some board, machine or device bus initialization code is supposed to
call this function. Just 'git grep set_dma_ops' and you will see. In my
patch series one of the clients of the set_dma_ops() function is the dmabounce
framework (it is called in the dmabounce_register_dev() function).
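
Roughly like this sketch (the body of the real dmabounce_register_dev() is
heavily abbreviated, and dmabounce_ops is the illustrative ops structure from
the earlier mail):

	int dmabounce_register_dev(struct device *dev,
				   unsigned long small_buffer_size,
				   unsigned long large_buffer_size)
	{
		/* ... set up the small/large bounce buffer pools ... */

		/* from now on this device uses the bounce implementation */
		set_dma_ops(dev, &dmabounce_ops);
		return 0;
	}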

> What if a device driver does not want to use arch's dma_map_ops
> when machine init procedure set a dma_map_ops?

Could you elaborate on this case? The whole point of the dma-mapping framework
is to hide the implementation of DMA mapping operations from the driver.
The driver should never fiddle with dma map ops directly.

> Even though many archs define their dma_map_ops in the archdata of the
> device structure, I think it is not a good idea that the device structure
> contains a pointer to dma_map_ops that may not be common to all devices
> on a board.

It is up to the board/bus startup code to set dma ops correctly.

> I also think that it is better to attach and to detach dma_map_ops
> dynamically.

What's the point of such operations? Why do you want to change the dma
mapping methods at runtime?

> Moreover, a mapping is not permanent on our Exynos platform
> because a System MMU may be turned off at runtime.

This is theoretically possible. The System MMU (Samsung IOMMU
controller) driver can change dma_map_ops back to NULL on removal, moving the
client device back to the generic ARM dma-mapping implementation.
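
In code terms the detach path would be as simple as this (illustrative):

	/* on System MMU removal: revert the device to the generic ARM ops */
	set_dma_ops(dev, NULL);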

> The DMA API must come with the IOMMU API to initialize the IOMMU at runtime.

I don't understand what the problem is here.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-21 11:23       ` Marek Szyprowski
@ 2011-06-22  0:00         ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-22  0:00 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arch, Russell King - ARM Linux, Arnd Bergmann,
	Joerg Roedel, linaro-mm-sig, linux-mm, Kyungmin Park,
	linux-arm-kernel

2011/6/21 Marek Szyprowski <m.szyprowski@samsung.com>:
>
> You already got a reply that dropping dma_alloc_writecombine() is no
> go on ARM.
>
Yes.

> That's not a problem. Once we agree on dma_alloc_attrs(), the drivers
> can be changed to use DMA_ATTR_WRITE_COMBINE attribute. If the platform
> doesn't support the attribute, it is just ignored. That's the whole
> point of the attributes extension. Once a driver is converted to
> dma_alloc_attrs(), it can be used without any changes either on platforms
> that supports some specific attributes or the one that doesn't implement
> support for any of them.
>
I was just worried that removing an existing construct is not a simple job.
You introduced 3 attributes: DMA_ATTR_WRITE_COMBINE, ***COHERENT and
***NO_KERNEL_VADDR.

As I replied earlier, I also indicated that a write-combining option for
dma_alloc_*() is required. But that does not mean dma_map_ops must support it.
I think dma_alloc_writecombine() can be implemented independently of
dma_map_ops.
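
On ARM that standalone implementation is essentially a thin wrapper over the
internal allocator, something like this sketch (__dma_alloc() names the
internal helper):

	void *dma_alloc_writecombine(struct device *dev, size_t size,
				     dma_addr_t *handle, gfp_t gfp)
	{
		/* same allocation path as coherent memory, but with a
		 * write-combined kernel mapping */
		pgprot_t prot = pgprot_writecombine(pgprot_kernel);

		return __dma_alloc(dev, size, handle, gfp, prot);
	}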

> Allocation is a separate operation from mapping to userspace. Mmap
> operation should just map the buffer (represented by a cookie of type
> dma_addr_t) to user address space.
>
> Note that some drivers (like framebuffer drivers for example) also
> needs to have both types of mapping - one for user space and one for
> kernel virtual space.

I meant that I think DMA_ATTR_NOKERNELVADDR is not required because
you also introduced the mmap callback.

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-20  7:50 ` Marek Szyprowski
@ 2011-06-22  4:53   ` Subash Patel
  0 siblings, 0 replies; 201+ messages in thread
From: Subash Patel @ 2011-06-22  4:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Russell King - ARM Linux, Joerg Roedel,
	Arnd Bergmann

Hi Marek,

On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
> Hello,
>
> This patch series is a continuation of my works on implementing generic
> IOMMU support in DMA mapping framework for ARM architecture. Now I
> focused on the DMA mapping framework itself. It turned out that adding
> support for common dma_map_ops structure was not that hard as I initally
> thought. After some modification most of the code fits really well to
> the generic dma_map_ops methods.
>
> The only change required to dma_map_ops is a new alloc function. During
> the discussion on Linaro Memory Management meeting in Budapest we got
> the idea that we can have only one alloc/free/mmap function with
> additional attributes argument. This way all different kinds of
> architecture specific buffer mappings can be hidden behind the
> attributes without the need of creating several versions of dma_alloc_
> function. I also noticed that the dma_alloc_noncoherent() function can
> be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
> Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
> will just ignore such attribute.
>
> Another good use case for alloc methods with attributes is the
> possibility to allocate buffer without a valid kernel mapping. There are
> a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
> buffers to user space. Such drivers don't touch the buffer data at all.
> For such buffers we can avoid the creation of a mapping in kernel
> virtual address space, saving precious vmalloc area. Such buffers might
> be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.

Are you trying to say here that the buffer would be allocated in user
space, and we just use it to map it to the device in the DMA+IOMMU
framework?

>
> All the changes introduced in this patch series are intended to prepare
> a good ground for upcoming generic IOMMU integration to DMA mapping
> framework on ARM architecture.
>
> For more information about proof-of-concept IOMMU implementation in DMA
> mapping framework, please refer to my previous set of patches:
> http://www.spinics.net/lists/linux-mm/msg19856.html
>
> I've tried to split the redesign into a set of single-step changes for
> easier review and understanding. If there is anything that needs further
> clarification, please don't hesitate to ask.
>
> The patches are prepared on top of Linux Kernel v3.0-rc3.
>
> The proposed changes have been tested on Samsung Exynos4 platform. I've
> also tested dmabounce code (by manually registering support for DMA
> bounce for some of the devices available on my board), although my
> hardware have no such strict requirements. Would be great if one could
> test my patches on different ARM architectures to check if I didn't
> break anything.
>
> Best regards

Regards,
Subash
SISO-SLG

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH 8/8] ARM: dma-mapping: use alloc, mmap, free from dma_ops
  2011-06-20  7:50   ` Marek Szyprowski
@ 2011-06-22  6:53     ` KyongHo Cho
  0 siblings, 0 replies; 201+ messages in thread
From: KyongHo Cho @ 2011-06-22  6:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Russell King - ARM Linux, Joerg Roedel,
	Arnd Bergmann

Hi.

On Mon, Jun 20, 2011 at 4:50 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:

> -extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t);
> +extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> +                          gfp_t gfp, struct dma_attrs *attrs);
> +
> +#define dma_alloc_coherent(d,s,h,f) dma_alloc_attrs(d,s,h,f,NULL)
> +
> +static inline void *dma_alloc_attrs(struct device *dev, size_t size,
> +                                      dma_addr_t *dma_handle, gfp_t flag,
> +                                      struct dma_attrs *attrs)
> +{
> +       struct dma_map_ops *ops = get_dma_ops(dev);
> +       void *cpu_addr;
> +       BUG_ON(!ops);
> +
> +       cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
> +       debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
> +       return cpu_addr;
> +}
>

Apart from the necessity of alloc_attrs, I would like the callback dispatch
to check whether a function pointer is NULL. Suppose that a system wants to
use the default ARM implementation of dma_alloc_*() while it uses its own
implementations of dma_map_*().


With your suggestion, we have only one option:

	void *my_alloc(...)
	{
		return dma_alloc_coherent(NULL, ...);
	}

	struct dma_map_ops ops = {
		.alloc_coherent = &my_alloc,
		...
	};

I think the following method is simpler:

	struct dma_map_ops ops = {
		.alloc_coherent = NULL,
		...
	};
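
A sketch of the NULL-check dispatch being suggested, assuming arm_dma_alloc()
stays available as the default implementation:

	static inline void *dma_alloc_attrs(struct device *dev, size_t size,
					    dma_addr_t *dma_handle, gfp_t flag,
					    struct dma_attrs *attrs)
	{
		struct dma_map_ops *ops = get_dma_ops(dev);
		void *cpu_addr;

		if (ops && ops->alloc)
			cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
		else	/* fall back to the default ARM implementation */
			cpu_addr = arm_dma_alloc(dev, size, dma_handle, flag, attrs);

		debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
		return cpu_addr;
	}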

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
@ 2011-06-22  6:59     ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-22  6:59 UTC (permalink / raw)
  To: 'Subash Patel'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Russell King - ARM Linux',
	'Joerg Roedel', 'Arnd Bergmann',
	Marek Szyprowski

Hello,

On Wednesday, June 22, 2011 6:53 AM Subash Patel wrote:

> On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
> > Hello,
> >
> > This patch series is a continuation of my works on implementing generic
> > IOMMU support in DMA mapping framework for ARM architecture. Now I
> > focused on the DMA mapping framework itself. It turned out that adding
> > support for common dma_map_ops structure was not that hard as I initally
> > thought. After some modification most of the code fits really well to
> > the generic dma_map_ops methods.
> >
> > The only change required to dma_map_ops is a new alloc function. During
> > the discussion on Linaro Memory Management meeting in Budapest we got
> > the idea that we can have only one alloc/free/mmap function with
> > additional attributes argument. This way all different kinds of
> > architecture specific buffer mappings can be hidden behind the
> > attributes without the need of creating several versions of dma_alloc_
> > function. I also noticed that the dma_alloc_noncoherent() function can
> > be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
> > Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
> > will just ignore such attribute.
> >
> > Another good use case for alloc methods with attributes is the
> > possibility to allocate buffer without a valid kernel mapping. There are
> > a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
> > buffers to user space. Such drivers don't touch the buffer data at all.
> > For such buffers we can avoid the creation of a mapping in kernel
> > virtual address space, saving precious vmalloc area. Such buffers might
> > be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.
> 
> Are you trying to say here, that the buffer would be allocated in the
> user space, and we just use it to map it to the device in DMA+IOMMU
> framework?

Nope. I proposed an extension which would allow you to allocate a buffer
without creating the kernel mapping for it. Right now dma_alloc_coherent()
performs 3 operations:
1. allocates memory for the buffer
2. creates a coherent kernel mapping for the buffer
3. translates the physical buffer address to a DMA address that can be used
by the hardware.

dma_mmap_coherent() makes an additional mapping for the buffer in the user
process's virtual address space.

I want to make step 2 in dma_alloc_coherent() optional to save virtual
address space: it is a really limited resource. I really want to avoid
wasting it on mapping 128MiB buffers just to create a full-HD processing
hardware pipeline where no driver will use the kernel mapping at all.
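
For illustration, an allocation that skips step 2 could look roughly like
this (a sketch only: dma_alloc_attrs() and DMA_ATTRIB_NO_KERNEL_MAPPING are
the interfaces proposed in this series, not anything already merged):

	struct dma_attrs attrs;
	dma_addr_t dma_handle;
	void *cpu_addr;

	init_dma_attrs(&attrs);
	dma_set_attr(DMA_ATTRIB_NO_KERNEL_MAPPING, &attrs);

	/* no kernel virtual mapping is created; the buffer is reached
	 * through dma_handle (device) or dma_mmap_* (user space) only */
	cpu_addr = dma_alloc_attrs(dev, SZ_128M, &dma_handle,
				   GFP_KERNEL, &attrs);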

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-22  6:59     ` Marek Szyprowski
@ 2011-06-22  8:53       ` Subash Patel
  -1 siblings, 0 replies; 201+ messages in thread
From: Subash Patel @ 2011-06-22  8:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Russell King - ARM Linux',
	'Joerg Roedel', 'Arnd Bergmann'



On 06/22/2011 12:29 PM, Marek Szyprowski wrote:
> Hello,
>
> On Wednesday, June 22, 2011 6:53 AM Subash Patel wrote:
>
>> On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
>>> Hello,
>>>
>>> This patch series is a continuation of my works on implementing generic
>>> IOMMU support in DMA mapping framework for ARM architecture. Now I
>>> focused on the DMA mapping framework itself. It turned out that adding
>>> support for common dma_map_ops structure was not that hard as I initally
>>> thought. After some modification most of the code fits really well to
>>> the generic dma_map_ops methods.
>>>
>>> The only change required to dma_map_ops is a new alloc function. During
>>> the discussion on Linaro Memory Management meeting in Budapest we got
>>> the idea that we can have only one alloc/free/mmap function with
>>> additional attributes argument. This way all different kinds of
>>> architecture specific buffer mappings can be hidden behind the
>>> attributes without the need of creating several versions of dma_alloc_
>>> function. I also noticed that the dma_alloc_noncoherent() function can
>>> be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
>>> Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
>>> will just ignore such attribute.
>>>
>>> Another good use case for alloc methods with attributes is the
>>> possibility to allocate buffer without a valid kernel mapping. There are
>>> a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
>>> buffers to user space. Such drivers don't touch the buffer data at all.
>>> For such buffers we can avoid the creation of a mapping in kernel
>>> virtual address space, saving precious vmalloc area. Such buffers might
>>> be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.
>>
>> Are you trying to say here, that the buffer would be allocated in the
>> user space, and we just use it to map it to the device in DMA+IOMMU
>> framework?
>
> Nope. I proposed an extension which would allow you to allocate a buffer
> without creating the kernel mapping for it. Right now dma_alloc_coherent()
> performs 3 operations:
> 1. allocates memory for the buffer
> 2. creates coherent kernel mapping for the buffer
> 3. translates physical buffer address to DMA address that can be used by
> the hardware.
>
> dma_mmap_coherent makes additional mapping for the buffer in user process
> virtual address space.
>
> I want make the step 2 in dma_alloc_coherent() optional to save virtual
> address space: it is really limited resource. I really want to avoid
> wasting it for mapping 128MiB buffers just to create full-HD processing
> hardware pipeline, where no drivers will use kernel mapping at all.
>

I think by (2) above, you are referring to
__dma_alloc_remap()->arm_vmregion_alloc() allocating the kernel virtual
address for the driver's use. That makes sense now.

I have a query along similar lines, but related to user virtual address
space. Is it feasible to extend these DMA interfaces (and the IOMMU) to map
a user-allocated buffer into the hardware?

> Best regards

Regards,
Subash
SISO-SLG

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-22  8:53       ` Subash Patel
@ 2011-06-22  9:27         ` Marek Szyprowski
  -1 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-22  9:27 UTC (permalink / raw)
  To: 'Subash Patel'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Russell King - ARM Linux',
	'Joerg Roedel', 'Arnd Bergmann',
	Marek Szyprowski

Hello,

On Wednesday, June 22, 2011 10:53 AM Subash Patel wrote:

> On 06/22/2011 12:29 PM, Marek Szyprowski wrote:
> > Hello,
> >
> > On Wednesday, June 22, 2011 6:53 AM Subash Patel wrote:
> >
> >> On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
> >>> Hello,
> >>>
> >>> This patch series is a continuation of my works on implementing generic
> >>> IOMMU support in DMA mapping framework for ARM architecture. Now I
> >>> focused on the DMA mapping framework itself. It turned out that adding
> >>> support for common dma_map_ops structure was not that hard as I
> initally
> >>> thought. After some modification most of the code fits really well to
> >>> the generic dma_map_ops methods.
> >>>
> >>> The only change required to dma_map_ops is a new alloc function. During
> >>> the discussion on Linaro Memory Management meeting in Budapest we got
> >>> the idea that we can have only one alloc/free/mmap function with
> >>> additional attributes argument. This way all different kinds of
> >>> architecture specific buffer mappings can be hidden behind the
> >>> attributes without the need of creating several versions of dma_alloc_
> >>> function. I also noticed that the dma_alloc_noncoherent() function can
> >>> be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
> >>> Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
> >>> will just ignore such attribute.
> >>>
> >>> Another good use case for alloc methods with attributes is the
> >>> possibility to allocate buffer without a valid kernel mapping. There
> are
> >>> a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
> >>> buffers to user space. Such drivers don't touch the buffer data at all.
> >>> For such buffers we can avoid the creation of a mapping in kernel
> >>> virtual address space, saving precious vmalloc area. Such buffers might
> >>> be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.
> >>
> >> Are you trying to say here, that the buffer would be allocated in the
> >> user space, and we just use it to map it to the device in DMA+IOMMU
> >> framework?
> >
> > Nope. I proposed an extension which would allow you to allocate a buffer
> > without creating the kernel mapping for it. Right now
> dma_alloc_coherent()
> > performs 3 operations:
> > 1. allocates memory for the buffer
> > 2. creates coherent kernel mapping for the buffer
> > 3. translates physical buffer address to DMA address that can be used by
> > the hardware.
> >
> > dma_mmap_coherent makes additional mapping for the buffer in user process
> > virtual address space.
> >
> > I want make the step 2 in dma_alloc_coherent() optional to save virtual
> > address space: it is really limited resource. I really want to avoid
> > wasting it for mapping 128MiB buffers just to create full-HD processing
> > hardware pipeline, where no drivers will use kernel mapping at all.
> >
> 
> I think by (2) above, you are referring to
> __dma_alloc_remap()->arm_vmregion_alloc() to allocate the kernel virtual
> address for the drivers use. That makes sense now.

Well, this is the particular implementation used on ARM. Other
architectures might implement it differently; that's why I used a generic
description and didn't point to any particular function.

> I have a query in similar lines, but related to user virtual address
> space. Is it feasible to extend these DMA interfaces(and IOMMU), to map
> a user allocated buffer into the hardware?

This can be done with the current API, although it may not look so
straightforward. You just need to create a scatterlist of the user pages
(these can be gathered with the get_user_pages() function) and use the
dma_map_sg() function. If the dma-mapping implementation supports an IOMMU,
it can map all these pages into a single contiguous buffer in device (DMA)
address space.

Some additional 'magic' might be required to get access to pages that are
mapped with a pure PFN (VM_PFNMAP flag), but imho it still can be done.

I will try to implement this feature in the videobuf2-dma-contig allocator
together with the next version of my patches for dma-mapping & IOMMU.
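
Roughly, the flow described above could look like this (a sketch only,
against 3.0-era APIs; error handling, page release and the VM_PFNMAP case
are left out, and uaddr, size and dev are assumed to come from the caller):

	int i, mapped, nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	struct page **pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
	struct scatterlist *sg = kcalloc(nr_pages, sizeof(*sg), GFP_KERNEL);

	down_read(&current->mm->mmap_sem);
	nr_pages = get_user_pages(current, current->mm, uaddr & PAGE_MASK,
				  nr_pages, 1, 0, pages, NULL);
	up_read(&current->mm->mmap_sem);

	sg_init_table(sg, nr_pages);
	for (i = 0; i < nr_pages; i++)
		sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0);

	/* an IOMMU-aware implementation may merge all entries into one
	 * contiguous region in device (DMA) address space */
	mapped = dma_map_sg(dev, sg, nr_pages, DMA_BIDIRECTIONAL);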

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-22  9:27         ` Marek Szyprowski
@ 2011-06-22 16:00           ` Jordan Crouse
  -1 siblings, 0 replies; 201+ messages in thread
From: Jordan Crouse @ 2011-06-22 16:00 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Subash Patel',
	linux-arch, 'Russell King - ARM Linux',
	'Arnd Bergmann', 'Joerg Roedel',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	linux-arm-kernel

On 06/22/2011 03:27 AM, Marek Szyprowski wrote:
> Hello,
>
> On Wednesday, June 22, 2011 10:53 AM Subash Patel wrote:
>
>> On 06/22/2011 12:29 PM, Marek Szyprowski wrote:
>>> Hello,
>>>
>>> On Wednesday, June 22, 2011 6:53 AM Subash Patel wrote:
>>>
>>>> On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
>>>>> Hello,
>>>>>
>>>>> This patch series is a continuation of my works on implementing generic
>>>>> IOMMU support in DMA mapping framework for ARM architecture. Now I
>>>>> focused on the DMA mapping framework itself. It turned out that adding
>>>>> support for common dma_map_ops structure was not that hard as I
>> initally
>>>>> thought. After some modification most of the code fits really well to
>>>>> the generic dma_map_ops methods.
>>>>>
>>>>> The only change required to dma_map_ops is a new alloc function. During
>>>>> the discussion on Linaro Memory Management meeting in Budapest we got
>>>>> the idea that we can have only one alloc/free/mmap function with
>>>>> additional attributes argument. This way all different kinds of
>>>>> architecture specific buffer mappings can be hidden behind the
>>>>> attributes without the need of creating several versions of dma_alloc_
>>>>> function. I also noticed that the dma_alloc_noncoherent() function can
>>>>> be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
>>>>> Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
>>>>> will just ignore such attribute.
>>>>>
>>>>> Another good use case for alloc methods with attributes is the
>>>>> possibility to allocate buffer without a valid kernel mapping. There
>> are
>>>>> a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
>>>>> buffers to user space. Such drivers don't touch the buffer data at all.
>>>>> For such buffers we can avoid the creation of a mapping in kernel
>>>>> virtual address space, saving precious vmalloc area. Such buffers might
>>>>> be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.
>>>>
>>>> Are you trying to say here, that the buffer would be allocated in the
>>>> user space, and we just use it to map it to the device in DMA+IOMMU
>>>> framework?
>>>
>>> Nope. I proposed an extension which would allow you to allocate a buffer
>>> without creating the kernel mapping for it. Right now
>> dma_alloc_coherent()
>>> performs 3 operations:
>>> 1. allocates memory for the buffer
>>> 2. creates coherent kernel mapping for the buffer
>>> 3. translates physical buffer address to DMA address that can be used by
>>> the hardware.
>>>
>>> dma_mmap_coherent makes additional mapping for the buffer in user process
>>> virtual address space.
>>>
>>> I want make the step 2 in dma_alloc_coherent() optional to save virtual
>>> address space: it is really limited resource. I really want to avoid
>>> wasting it for mapping 128MiB buffers just to create full-HD processing
>>> hardware pipeline, where no drivers will use kernel mapping at all.
>>>
>>
>> I think by (2) above, you are referring to
>> __dma_alloc_remap()->arm_vmregion_alloc() to allocate the kernel virtual
>> address for the drivers use. That makes sense now.
>
> Well, this is particular implementation which is used on ARM. Other
> architectures might implement it differently, that's why I used generic
> description and didn't point to any particular function.
>
>> I have a query in similar lines, but related to user virtual address
>> space. Is it feasible to extend these DMA interfaces(and IOMMU), to map
>> a user allocated buffer into the hardware?
>
> This can be done with the current API, although it may not look so
> straightforward. You just need to create a scatter list of user pages
> (these can be gathered with get_user_pages function) and use dma_map_sg()
> function. If the dma-mapping support iommu, it can map all these pages
> into a single contiguous buffer on device (DMA) address space.
>
> Some additional 'magic' might be required to get access to pages that are
> mapped with pure PFN (VM_PFNMAP flag), but imho it still can be done.
>
> I will try to implement this feature in videobuf2-dma-config allocator
> together with the next version of my patches for dma-mapping&iommu.

With luck, DMA_ATTRIB_NO_KERNEL_MAPPING should remove any lingering
arguments for trying to map user pages. Given that our ultimate goal here
is buffer sharing, user-allocated pages have limited value and appeal. If
anything, I vote that this be a far lower priority than the rest of the
wins you have here.

Jordan

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
@ 2011-06-23 13:09             ` Subash Patel
  0 siblings, 0 replies; 201+ messages in thread
From: Subash Patel @ 2011-06-23 13:09 UTC (permalink / raw)
  To: Jordan Crouse, Marek Szyprowski
  Cc: linux-arch, linaro-mm-sig, linux-mm, linux-arm-kernel

Hi Jordan,

On 06/22/2011 09:30 PM, Jordan Crouse wrote:
> On 06/22/2011 03:27 AM, Marek Szyprowski wrote:
>> Hello,
>>
>> On Wednesday, June 22, 2011 10:53 AM Subash Patel wrote:
>>
>>> On 06/22/2011 12:29 PM, Marek Szyprowski wrote:
>>>> Hello,
>>>>
>>>> On Wednesday, June 22, 2011 6:53 AM Subash Patel wrote:
>>>>
>>>>> On 06/20/2011 01:20 PM, Marek Szyprowski wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch series is a continuation of my works on implementing
>>>>>> generic
>>>>>> IOMMU support in DMA mapping framework for ARM architecture. Now I
>>>>>> focused on the DMA mapping framework itself. It turned out that
>>>>>> adding
>>>>>> support for common dma_map_ops structure was not that hard as I
>>> initally
>>>>>> thought. After some modification most of the code fits really well to
>>>>>> the generic dma_map_ops methods.
>>>>>>
>>>>>> The only change required to dma_map_ops is a new alloc function.
>>>>>> During
>>>>>> the discussion on Linaro Memory Management meeting in Budapest we got
>>>>>> the idea that we can have only one alloc/free/mmap function with
>>>>>> additional attributes argument. This way all different kinds of
>>>>>> architecture specific buffer mappings can be hidden behind the
>>>>>> attributes without the need of creating several versions of
>>>>>> dma_alloc_
>>>>>> function. I also noticed that the dma_alloc_noncoherent() function
>>>>>> can
>>>>>> be also implemented this way with DMA_ATTRIB_NON_COHERENT attribute.
>>>>>> Systems that just defines dma_alloc_noncoherent as dma_alloc_coherent
>>>>>> will just ignore such attribute.
>>>>>>
>>>>>> Another good use case for alloc methods with attributes is the
>>>>>> possibility to allocate buffer without a valid kernel mapping. There
>>> are
>>>>>> a number of drivers (mainly V4L2 and ALSA) that only exports the DMA
>>>>>> buffers to user space. Such drivers don't touch the buffer data at
>>>>>> all.
>>>>>> For such buffers we can avoid the creation of a mapping in kernel
>>>>>> virtual address space, saving precious vmalloc area. Such buffers
>>>>>> might
>>>>>> be allocated once a new attribute DMA_ATTRIB_NO_KERNEL_MAPPING.
>>>>>
>>>>> Are you trying to say here, that the buffer would be allocated in the
>>>>> user space, and we just use it to map it to the device in DMA+IOMMU
>>>>> framework?
>>>>
>>>> Nope. I proposed an extension which would allow you to allocate a
>>>> buffer
>>>> without creating the kernel mapping for it. Right now
>>> dma_alloc_coherent()
>>>> performs 3 operations:
>>>> 1. allocates memory for the buffer
>>>> 2. creates coherent kernel mapping for the buffer
>>>> 3. translates physical buffer address to DMA address that can be
>>>> used by
>>>> the hardware.
>>>>
>>>> dma_mmap_coherent makes additional mapping for the buffer in user
>>>> process
>>>> virtual address space.
>>>>
>>>> I want make the step 2 in dma_alloc_coherent() optional to save virtual
>>>> address space: it is really limited resource. I really want to avoid
>>>> wasting it for mapping 128MiB buffers just to create full-HD processing
>>>> hardware pipeline, where no drivers will use kernel mapping at all.
>>>>
>>>
>>> I think by (2) above, you are referring to
>>> __dma_alloc_remap()->arm_vmregion_alloc() to allocate the kernel virtual
>>> address for the drivers use. That makes sense now.
>>
>> Well, this is particular implementation which is used on ARM. Other
>> architectures might implement it differently, that's why I used generic
>> description and didn't point to any particular function.
>>
>>> I have a query in similar lines, but related to user virtual address
>>> space. Is it feasible to extend these DMA interfaces(and IOMMU), to map
>>> a user allocated buffer into the hardware?
>>
>> This can be done with the current API, although it may not look so
>> straightforward. You just need to create a scatter list of the user pages
>> (these can be gathered with the get_user_pages() function) and use the
>> dma_map_sg() function. If the dma-mapping implementation supports an
>> IOMMU, it can map all these pages into a single contiguous buffer in the
>> device (DMA) address space.
>>
>> Some additional 'magic' might be required to get access to pages that are
>> mapped with a pure PFN (VM_PFNMAP flag), but imho it can still be done.
>>
>> I will try to implement this feature in the videobuf2-dma-contig allocator
>> together with the next version of my patches for dma-mapping&iommu.
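A rough sketch of the flow described above, using only interfaces that
existed at the time; error handling, partial pin-downs, and the VM_PFNMAP
'magic' are deliberately elided, and the user address is assumed to be
page-aligned:

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

static int map_user_buffer(struct device *dev, unsigned long uaddr,
			   size_t size, struct page **pages,
			   struct scatterlist *sg)
{
	int i, nr = PAGE_ALIGN(size) >> PAGE_SHIFT;

	/* Pin the user pages so they cannot go away underneath us. */
	down_read(&current->mm->mmap_sem);
	nr = get_user_pages(current, current->mm, uaddr, nr,
			    1 /* write */, 0 /* !force */, pages, NULL);
	up_read(&current->mm->mmap_sem);
	if (nr <= 0)
		return -EFAULT;

	/* Build a scatter list and hand it to the DMA layer; with an
	 * IOMMU-backed implementation this may come back as a single
	 * contiguous DMA range (returned nents == 1). */
	sg_init_table(sg, nr);
	for (i = 0; i < nr; i++)
		sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0);

	return dma_map_sg(dev, sg, nr, DMA_BIDIRECTIONAL);
}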
>
> With luck DMA_ATTRIB_NO_KERNEL_MAPPING should remove any lingering
> arguments
> for trying to map user pages. Given that our ultimate goal here is buffer
> sharing, user allocated pages have limited value and appeal. If anything, I
> vote that this be a far lower priority compared to the rest of the win you
> have here.
>

We have some rare cases where requirements like the above also apply,
so we need the flexibility to map user-allocated buffers to devices as
well.

Please refer to my email 
(http://lists.linaro.org/pipermail/linaro-mm-sig/2011-June/000273.html)

Regards,
Subash

> Jordan
>


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-23 13:09             ` Subash Patel
  (?)
@ 2011-06-23 16:24               ` Michael K. Edwards
  -1 siblings, 0 replies; 201+ messages in thread
From: Michael K. Edwards @ 2011-06-23 16:24 UTC (permalink / raw)
  To: Subash Patel
  Cc: Jordan Crouse, Marek Szyprowski, linux-arch, linaro-mm-sig,
	linux-arm-kernel, linux-mm

On Thu, Jun 23, 2011 at 6:09 AM, Subash Patel <subashrp@gmail.com> wrote:

> We have some rare cases where requirements like the above also apply, so
> we need the flexibility to map user-allocated buffers to devices as well.

Not so rare, I think.  When using the OpenGL back end, Qt routinely
allocates buffers to hold image assets (e.g., decompressed JPEGs and
the glyph cache) and then uses them as textures.  That is a problem if
there's a GPU that doesn't participate in the cache coherency protocol
(one which we can reliably trigger on our embedded platform).

The best workaround we have been able to come up with is for Qt's
allocator API, which already has a "flags" parameter, to grow an
"allocate for use as texture" flag, which makes the allocation come
from a separate pool backed by a write-combining uncacheable mapping.
Then we can grovel our way through the highest-frequency use cases,
restructuring the code that writes these assets to use the approved
write-combining tricks.

In the very near future, some of these assets are likely to come from
other hardware blocks, such as a hardware JPEG decoder (Subash's use
case), a V4L2 capture device, or a OpenMAX H.264 decoder.  Those may
add orthogonal allocation requirements, such as page alignment or
allocation from tightly coupled memory.  The only entity that knows
what buffers might be passed where is the userland application (or
potentially a userland media framework, like StageFright or
GStreamer).

So the solution that I'd like to see is for none of these drivers to
do their own allocation of buffers that aren't for strictly internal
use.  Instead, the userland application should ask each component for
a "buffer attributes" structure, and merge the attributes of the
components that may touch a given buffer in order to get the
allocation attributes for that buffer (or for the hugepage from which
it will carve out many like it).

The userland would ask the kernel to do the appropriate allocation --
ideally by passing in the merged allocation attributes and getting
back a file descriptor, which can be passed around to other processes
(over local domain sockets) and mmap'ed.  The buffers themselves would
have to be registered with each driver that uses them; i. e., the
driver's buffer allocation API is replaced with a buffer registration
API.  If the driver doesn't like the attributes of the mapping from
which the buffer was allocated, the registration fails.

I will try to get around to producing some code that does this soon,
at least for the Qt/GPU texture asset allocation/registration use
case.
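
To make the attribute-merging idea concrete, here is a purely illustrative
userland sketch; none of these names exist anywhere, they only mirror the
proposal above:

struct buf_attrs gpu_a, dec_a, merged;

/* Each component reports what it needs from a buffer... */
gpu_get_buffer_attrs(gpu, &gpu_a);	/* e.g. write-combining mapping */
jpeg_get_buffer_attrs(dec, &dec_a);	/* e.g. page-aligned, in TCM    */

/* ...and the application merges them, failing on contradictions. */
merge_buffer_attrs(&merged, &gpu_a, &dec_a);

/* The kernel allocates and hands back a file descriptor that can be
 * passed over local domain sockets and mmap'ed. */
int fd = buffer_alloc_fd(&merged, size);
void *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

/* Registration replaces per-driver allocation: a driver that cannot
 * live with the mapping's attributes rejects the buffer here. */
if (gpu_register_buffer(gpu, fd) < 0)
	/* attribute mismatch, fall back or bail out */;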

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-23 16:24               ` Michael K. Edwards
  (?)
@ 2011-06-23 22:09                 ` Michael K. Edwards
  -1 siblings, 0 replies; 201+ messages in thread
From: Michael K. Edwards @ 2011-06-23 22:09 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Subash Patel, Jordan Crouse, Marek Szyprowski, linux-arch,
	linaro-mm-sig, linux-arm-kernel, linux-mm

Jonathan -

I'm inviting you to this conversation (and to linaro-mm-sig, if you'd
care to participate!), because I'd really like your commentary on what
it takes to make write-combining fully effective on various ARMv7
implementations.

The current threads:
  http://lists.linaro.org/pipermail/linaro-mm-sig/2011-June/000334.html
  http://lists.linaro.org/pipermail/linaro-mm-sig/2011-June/000263.html

Archive link for a related discussion:
  http://lists.linaro.org/pipermail/linaro-mm-sig/2011-April/000003.html

Getting full write-combining performance on Intel architectures
involves a somewhat delicate dance:
  http://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers/

And I expect something similar to be necessary in order to avoid the
read-modify-write penalty for write-combining buffers on ARMv7.  (NEON
store-multiple operations can fill an entire 64-byte entry in the
victim buffer in one opcode; I don't know whether this is enough to
stop the L3 memory system from reading the data before clobbering it.)
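
For concreteness, a sketch of the kind of copy loop in question, using
GCC NEON intrinsics; it assumes dst points into a write-combining
mapping, is 64-byte aligned, and that n is a multiple of 64. Whether
back-to-back 16-byte stores (or a single store-multiple) actually
suppress the read-allocate is exactly the open question:

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

static void wc_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
	size_t i;

	/* Fill one full 64-byte write-combine entry per iteration. */
	for (i = 0; i < n; i += 64) {
		uint8x16_t a = vld1q_u8(src + i);
		uint8x16_t b = vld1q_u8(src + i + 16);
		uint8x16_t c = vld1q_u8(src + i + 32);
		uint8x16_t d = vld1q_u8(src + i + 48);

		vst1q_u8(dst + i,      a);
		vst1q_u8(dst + i + 16, b);
		vst1q_u8(dst + i + 32, c);
		vst1q_u8(dst + i + 48, d);
	}
}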

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [Linaro-mm-sig] [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-22  0:00         ` KyongHo Cho
  (?)
@ 2011-06-24  7:20           ` Marek Szyprowski
  -1 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-24  7:20 UTC (permalink / raw)
  To: 'KyongHo Cho'
  Cc: linux-arch, 'Russell King - ARM Linux',
	'Arnd Bergmann', 'Joerg Roedel',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	linux-arm-kernel, Marek Szyprowski

Hello,

On Wednesday, June 22, 2011 2:01 AM KyongHo Cho wrote:

> 2011/6/21 Marek Szyprowski <m.szyprowski@samsung.com>:
> >
> > You already got a reply that dropping dma_alloc_writecombine() is no
> > go on ARM.
> >
> Yes.
> 
> > That's not a problem. Once we agree on dma_alloc_attrs(), the drivers
> > can be changed to use DMA_ATTR_WRITE_COMBINE attribute. If the platform
> > doesn't support the attribute, it is just ignored. That's the whole
> > point of the attributes extension. Once a driver is converted to
> > dma_alloc_attrs(), it can be used without any changes either on platforms
> > that supports some specific attributes or the one that doesn't implement
> > support for any of them.
> >
> I was just worried that removing an existing construct is not a simple job.
> You introduced 3 attributes: DMA_ATTR_WRITE_COMBINE, ***COHERENT and
> ***NO_KERNEL_VADDR

COHERENT is the default behavior when no attribute is provided. I also
don't want to remove the existing calls to dma_alloc_coherent() and
dma_alloc_writecombine() from the drivers. This can be done later, once
the dma_alloc_attrs() API stabilizes.
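
To illustrate the conversion path (a sketch; dma_alloc_attrs() is the
interface proposed in patch 7/8, and the attribute name follows this
series):

	/* today, ARM-only: */
	buf = dma_alloc_writecombine(dev, size, &dma, GFP_KERNEL);

	/* after conversion, portable; platforms without write-combine
	 * support simply ignore the attribute: */
	struct dma_attrs attrs;

	init_dma_attrs(&attrs);
	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
	buf = dma_alloc_attrs(dev, size, &dma, GFP_KERNEL, &attrs);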

> I replied earlier, I also indicated that write combining option for
> dma_alloc_*() is required.
> But it does not mean dma_map_ops must support that.
> I think dma_alloc_writecombine() can be implemented regardless of
> dma_map_ops.

The need for dma_alloc_attrs() was discussed at the Memory Management
Summit at the Linaro meeting in Budapest. It simplifies the API and
preserves full backward compatibility for existing drivers.
 
> > Allocation is a separate operation from mapping to userspace. Mmap
> > operation should just map the buffer (represented by a cookie of type
> > dma_addr_t) to user address space.
> >
> > Note that some drivers (like framebuffer drivers for example) also
> > needs to have both types of mapping - one for user space and one for
> > kernel virtual space.
> 
> I meant that I think DMA_ATTR_NOKERNELVADDR is not required because
> you also introduced mmap callback.

I've already said that the mmap callback is unrelated to whether the
buffer has been allocated with or without a kernel virtual space
mapping. I really don't get what you mean here.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-06-24  8:39         ` 'Joerg Roedel'
  0 siblings, 0 replies; 201+ messages in thread
From: 'Joerg Roedel' @ 2011-06-24  8:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'KyongHo Cho',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Russell King - ARM Linux',
	'Arnd Bergmann'

On Tue, Jun 21, 2011 at 01:47:03PM +0200, Marek Szyprowski wrote:
> > I also think that it is better to attach and to detach dma_map_ops
> > dynamically.
> 
> What's the point of such operations? Why do you want to change dma
> mapping methods in runtime?

That is dangerous. You have to make sure that there are no mappings
granted to the device driver before changing the dma_ops of a device
at runtime. Otherwise existing mappings for the device may disappear and
confuse the driver and the device.

Joerg
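
A sketch of the guard this warning implies; dev_dma_mappings() is
hypothetical, as nothing in the series actually tracks live mappings per
device:

static int swap_dma_ops(struct device *dev, struct dma_map_ops *new_ops)
{
	/* Refuse while any mapping created by the old ops is live;
	 * otherwise those mappings would silently dangle. */
	if (dev_dma_mappings(dev) != 0)
		return -EBUSY;

	dev->archdata.dma_ops = new_ops;
	return 0;
}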


^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-20  7:50 ` Marek Szyprowski
  (?)
@ 2011-06-24  9:18   ` Joerg Roedel
  -1 siblings, 0 replies; 201+ messages in thread
From: Joerg Roedel @ 2011-06-24  9:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Russell King - ARM Linux

Hi Marek,

On Mon, Jun 20, 2011 at 09:50:05AM +0200, Marek Szyprowski wrote:
> This patch series is a continuation of my work on implementing generic
> IOMMU support in DMA mapping framework for ARM architecture. Now I
> focused on the DMA mapping framework itself. It turned out that adding
> support for common dma_map_ops structure was not that hard as I initially
> thought. After some modification most of the code fits really well into
> the generic dma_map_ops methods.

I appreciate your progress on this generic dma_ops implementation. But
for now it looks very ARM-specific. Do you have plans to extend it to
non-ARM iommu-api implementations too?

	Joerg

^ permalink raw reply	[flat|nested] 201+ messages in thread


* RE: [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-24  9:18   ` Joerg Roedel
  (?)
@ 2011-06-24 14:26     ` Marek Szyprowski
  -1 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-24 14:26 UTC (permalink / raw)
  To: 'Joerg Roedel'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Arnd Bergmann',
	'Russell King - ARM Linux',
	Marek Szyprowski

Hello,

On Friday, June 24, 2011 11:18 AM Joerg Roedel wrote:

> On Mon, Jun 20, 2011 at 09:50:05AM +0200, Marek Szyprowski wrote:
> > This patch series is a continuation of my work on implementing generic
> > IOMMU support in DMA mapping framework for ARM architecture. Now I
> > focused on the DMA mapping framework itself. It turned out that adding
> > support for common dma_map_ops structure was not that hard as I initially
> > thought. After some modification most of the code fits really well into
> > the generic dma_map_ops methods.
> 
> I appreciate your progress on this generic dma_ops implementation. But
> for now it looks very ARM specific. Do you have plans to extend it to
> non-ARM iommu-api implementations too?

This work is just a first step toward creating an environment for real
iommu & dma-mapping integration. It is much easier to work on IOMMU
integration once the dma-mapping operations can be easily changed for
particular devices; dma_map_ops gives exactly that flexibility. It is also
a de-facto standard interface on other architectures, so building the
iommu implementation on top of dma_map_ops was really desirable.

Of course my patches will be ARM-centric, but I hope to isolate the
ARM-specific parts from the generic ones, which can then be easily reused
on other platforms.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
@ 2011-06-24 15:20             ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:20 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: Marek Szyprowski, 'Subash Patel',
	linux-arch, 'Russell King - ARM Linux',
	'Joerg Roedel',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	linux-arm-kernel

On Wednesday 22 June 2011, Jordan Crouse wrote:
> >> I have a query in similar lines, but related to user virtual address
> >> space. Is it feasible to extend these DMA interfaces(and IOMMU), to map
> >> a user allocated buffer into the hardware?
> >
> > This can be done with the current API, although it may not look so
> > straightforward. You just need to create a scatter list of user pages
> > (these can be gathered with get_user_pages function) and use dma_map_sg()
> > function. If the dma-mapping support iommu, it can map all these pages
> > into a single contiguous buffer on device (DMA) address space.
> >
> > Some additional 'magic' might be required to get access to pages that are
> > mapped with pure PFN (VM_PFNMAP flag), but imho it still can be done.
> >
> > I will try to implement this feature in videobuf2-dma-config allocator
> > together with the next version of my patches for dma-mapping&iommu.
> 
> With luck DMA_ATTRIB_NO_KERNEL_MAPPING should remove any lingering arguments
> for trying to map user pages. Given that our ultimate goal here is buffer
> sharing, user allocated pages have limited value and appeal. If anything, I
> vote that this be a far lower priority compared to the rest of the win you
> have here.

I agree. Mapping user-allocated buffers is extremely hard to get right
when there are extra constraints. If it doesn't already work for some
driver, I wouldn't put too much effort into making it work for more
special cases.

	Arnd


^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-24 15:24         ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:24 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel'

On Monday 20 June 2011, Marek Szyprowski wrote:
> > This also breaks dmabounce when used with a highmem-enabled system -
> > dmabounce refuses the dma_map_page() API but allows the dma_map_single()
> > API.
> 
> I'm really not sure how this change will break the dma bounce code. 
> 
> Does it mean that it is allowed to call dma_map_single() on a kmapped
> highmem page?

dma_map_single() on a kmapped page already doesn't work; the argument
needs to be inside the linear mapping for virt_to_page() to work.

	Arnd
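
For reference, patch 2/8 reduces dma_map_single() to roughly the
following wrapper (a simplified sketch), which is why the address must
come from the linear mapping: virt_to_page() is meaningless for vmalloc
or kmap addresses.

static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
					size_t size,
					enum dma_data_direction dir)
{
	/* Only linear-mapping addresses have a valid struct page. */
	BUG_ON(!virt_addr_valid(cpu_addr));

	return dma_map_page(dev, virt_to_page(cpu_addr),
			    offset_in_page(cpu_addr), size, dir);
}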


^ permalink raw reply	[flat|nested] 201+ messages in thread


* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-06-24 15:36     ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:36 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arch, Russell King - ARM Linux, Joerg Roedel,
	linaro-mm-sig, linux-mm, Kyungmin Park, linux-arm-kernel

On Monday 20 June 2011, Marek Szyprowski wrote:
> This patch modifies the dma-mapping implementation on the ARM
> architecture to use the common dma_map_ops structure and the
> asm-generic/dma-mapping-common.h helpers.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

This is a good idea in general, but I have a few concerns about details:

First of all, should we only allow using dma_map_ops on ARM, or do we
also want to support a case where these are all inlined as before?

I suppose for the majority of the cases, the overhead of the indirect
function call is near-zero, compared to the overhead of the cache
management operation, so it would only make a difference for coherent
systems without an IOMMU. Do we care about micro-optimizing those?

> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 799669d..f4e4968 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -10,6 +10,27 @@
>  #include <asm-generic/dma-coherent.h>
>  #include <asm/memory.h>
>  
> +extern struct dma_map_ops dma_ops;
> +
> +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> +{
> +	if (dev->archdata.dma_ops)
> +		return dev->archdata.dma_ops;
> +	return &dma_ops;
> +}

I would not name the global structure just 'dma_ops', the identifier could
too easily conflict with a local variable in some driver. How about
arm_dma_ops or linear_dma_ops instead?
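
The rename itself is mechanical - a sketch only, reusing the
archdata.dma_ops hook quoted above:

	extern struct dma_map_ops arm_dma_ops;

	static inline struct dma_map_ops *get_dma_ops(struct device *dev)
	{
		/* per-device ops (dmabounce, iommu, ...) take precedence */
		if (dev->archdata.dma_ops)
			return dev->archdata.dma_ops;
		/* fall back to the default linear ops */
		return &arm_dma_ops;
	}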

>  /*
>   * The scatter list versions of the above methods.
>   */
> -extern int dma_map_sg(struct device *, struct scatterlist *, int,
> -		enum dma_data_direction);
> -extern void dma_unmap_sg(struct device *, struct scatterlist *, int,
> +extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
> +		enum dma_data_direction, struct dma_attrs *attrs);
> +extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
> +		enum dma_data_direction, struct dma_attrs *attrs);
> +extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
>  		enum dma_data_direction);
> -extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
> +extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
>  		enum dma_data_direction);
> -extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
> -		enum dma_data_direction);
> -

You should not need to make these symbols visible in the header file any
more unless they are used outside of the main file later.
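
For instance (a sketch, assuming no external users remain), the helpers
could become file-local in arch/arm/mm/dma-mapping.c and be reachable
only through the ops table:

	static int arm_dma_map_sg(struct device *dev, struct scatterlist *sg,
				  int nents, enum dma_data_direction dir,
				  struct dma_attrs *attrs);

	struct dma_map_ops arm_dma_ops = {
		/* other methods omitted from this sketch */
		.map_sg	= arm_dma_map_sg,
	};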

	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-24 15:47         ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:47 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel'

On Monday 20 June 2011, Marek Szyprowski wrote:
> On Monday, June 20, 2011 4:43 PM Russell King - ARM Linux wrote:
> 
> > On Mon, Jun 20, 2011 at 09:50:10AM +0200, Marek Szyprowski wrote:
> > > This patch removes dma bounce hooks from the common dma mapping
> > > implementation on ARM architecture and creates a separate set of
> > > dma_map_ops for dma bounce devices.
> > 
> > Why all this additional indirection for no gain?
> 
> I did it to really separate the dmabounce code and let it be completely
> independent of the particular internal functions of the main generic
> dma-mapping code.
> 
> dmabounce is just one possible dma-mapping implementation, and it is really
> convenient to have it enclosed behind the common interface (dma_map_ops)
> rather than spread around and hardcoded behind some #ifdefs in the generic
> ARM dma-mapping code.
> 
> There will also be other dma-mapping implementations in the future - I am
> thinking mainly of some IOMMU-capable versions.
> 
> In terms of speed I really doubt that these changes have any impact on
> system performance, but they significantly improve the code readability
> (see the next patch, with the cleanup of dma-mapping.c).

Yes. I believe the main effect of splitting out dmabounce into its own
set of operations is improved readability for people that are not
familiar with the existing code (which excludes Russell ;-) ), by
separating the two codepaths and losing various #ifdef.

The simplification becomes more obvious when you look at patch 6, which
removes a lot of the code that becomes redundant after this one.

Still, patches 5 and 6 are certainly not essential, nothing depends on
that and if Russell still doesn't like them, they can easily be dropped.

	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-24 15:51     ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:51 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Joerg Roedel, Russell King - ARM Linux

On Monday 20 June 2011, Marek Szyprowski wrote:
> Introduce new alloc/free/mmap methods that take an attributes argument.
> alloc/free_coherent can be implemented on top of the new alloc/free
> calls with NULL attributes. dma_alloc_non_coherent can be implemented
> using the DMA_ATTR_NONCOHERENT attribute, and dma_alloc_writecombine can
> use a separate DMA_ATTR_WRITECOMBINE attribute. This way the drivers get
> a more generic, platform-independent way of allocating dma memory
> buffers with specific parameters.
> 
> One more attribute can be useful: DMA_ATTR_NOKERNELVADDR. Buffers with
> such an attribute will not have a valid kernel virtual address. They
> might be useful for drivers that only export the DMA buffers to
> userspace (for example V4L2 or ALSA).
> 
> The mmap method is introduced to let drivers create a user-space mapping
> for a DMA buffer in a generic, architecture-independent way.
> 
> TODO: update all dma_map_ops clients for all architectures
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

Yes, I think that is good, but the change needs to be done atomically
across all architectures. This should be easy enough as I believe
all other architectures that use dma_map_ops don't even require
dma_alloc_noncoherent but just define it to dma_alloc_coherent
because they have only coherent memory in regular device drivers.
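
A sketch of that common definition under the proposed interface (using
the existing dma_attrs helpers; the attribute name and the dma_alloc_attrs
entry point follow this series and are not final):

	static inline void *dma_alloc_noncoherent(struct device *dev,
						  size_t size,
						  dma_addr_t *handle, gfp_t gfp)
	{
		DEFINE_DMA_ATTRS(attrs);

		/* ask the attrs-aware allocator for a non-coherent buffer */
		dma_set_attr(DMA_ATTR_NON_COHERENT, &attrs);
		return dma_alloc_attrs(dev, size, handle, gfp, &attrs);
	}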

On a related note, do you plan to make the CMA work use this
transparently, or do you want to have a DMA_ATTR_LARGE or
DMA_ATTR_CONTIGUOUS for CMA?

	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-24 15:53     ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 15:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Joerg Roedel, Russell King - ARM Linux

On Monday 20 June 2011, Marek Szyprowski wrote:
> The mmap method is introduced to let drivers create a user-space mapping
> for a DMA buffer in a generic, architecture-independent way.

One more thing: please split out the mmap change into a separate patch.
I sense that there might be some objections to that, and it's better to
let people know about it early on than having them complain later, once
it has already been merged.
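
For reference, the mmap hook in question is a thin dispatch through
dma_map_ops - a sketch, with the signature assumed from this series:

	static inline int dma_mmap_attrs(struct device *dev,
					 struct vm_area_struct *vma,
					 void *cpu_addr, dma_addr_t dma_addr,
					 size_t size, struct dma_attrs *attrs)
	{
		struct dma_map_ops *ops = get_dma_ops(dev);

		/* let the architecture map the buffer into the vma */
		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
	}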

	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-24 15:51     ` Arnd Bergmann
@ 2011-06-24 16:15       ` James Bottomley
  -1 siblings, 0 replies; 201+ messages in thread
From: James Bottomley @ 2011-06-24 16:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-arm-kernel, linaro-mm-sig, linux-mm,
	linux-arch, Kyungmin Park, Joerg Roedel,
	Russell King - ARM Linux

On Fri, 2011-06-24 at 17:51 +0200, Arnd Bergmann wrote:
> On Monday 20 June 2011, Marek Szyprowski wrote:
> > Introduce new alloc/free/mmap methods that take an attributes argument.
> > alloc/free_coherent can be implemented on top of the new alloc/free
> > calls with NULL attributes. dma_alloc_non_coherent can be implemented
> > using the DMA_ATTR_NONCOHERENT attribute, and dma_alloc_writecombine can
> > use a separate DMA_ATTR_WRITECOMBINE attribute. This way the drivers get
> > a more generic, platform-independent way of allocating dma memory
> > buffers with specific parameters.
> > 
> > One more attribute can be useful: DMA_ATTR_NOKERNELVADDR. Buffers with
> > such an attribute will not have a valid kernel virtual address. They
> > might be useful for drivers that only export the DMA buffers to
> > userspace (for example V4L2 or ALSA).
> > 
> > The mmap method is introduced to let drivers create a user-space mapping
> > for a DMA buffer in a generic, architecture-independent way.
> > 
> > TODO: update all dma_map_ops clients for all architectures
> > 
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> Yes, I think that is good, but the change needs to be done atomically
> across all architectures. This should be easy enough as I believe
> all other architectures that use dma_map_ops don't even require
> dma_alloc_noncoherent

This statement is definitely not true of parisc, and also, I believe,
not true of sh, so that would have to figure in the conversion work too.

James


>  but just define it to dma_alloc_coherent
> because they have only coherent memory in regular device drivers.
> 
> On a related note, do you plan to make the CMA work use this
> transparently, or do you want to have a DMA_ATTR_LARGE or
> DMA_ATTR_CONTIGUOUS for CMA?
> 
> 	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
  2011-06-24 16:15       ` James Bottomley
@ 2011-06-24 16:23         ` Arnd Bergmann
  -1 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-24 16:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Marek Szyprowski, linux-arm-kernel, linaro-mm-sig, linux-mm,
	linux-arch, Kyungmin Park, Joerg Roedel,
	Russell King - ARM Linux

On Friday 24 June 2011, James Bottomley wrote:
> On Fri, 2011-06-24 at 17:51 +0200, Arnd Bergmann wrote:
> > Yes, I think that is good, but the change needs to be done atomically
> > across all architectures. This should be easy enough as I believe
> > all other architectures that use dma_map_ops don't even require
> > dma_alloc_noncoherent
> 
> This statement is definitely not true of parisc, and also, I believe,
> not true of sh, so that would have to figure in the conversion work too.

As far as I can tell, parisc uses its own hppa_dma_ops, not
dma_map_ops, and arch/sh/include/asm/dma-mapping.h contains an
unconditional

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)

If you want to change parisc to use dma_map_ops then I would suggest
adding another attribute for alloc_noncoherent.

	Arnd

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
@ 2011-06-25  5:23                   ` Jonathan Morton
  0 siblings, 0 replies; 201+ messages in thread
From: Jonathan Morton @ 2011-06-25  5:23 UTC (permalink / raw)
  To: M.K.Edwards
  Cc: Subash Patel, Jordan Crouse, Marek Szyprowski, linux-arch,
	linaro-mm-sig, linux-arm-kernel, linux-mm

On 24 June 2011 01:09, Michael K. Edwards <m.k.edwards@gmail.com> wrote:
> Jonathan -
>
> I'm inviting you to this conversation (and to linaro-mm-sig, if you'd
> care to participate!), because I'd really like your commentary on what
> it takes to make write-combining fully effective on various ARMv7
> implementations.

Thanks for the invite.  I'm not fully conversant with the kernel-level
intricacy, but I do know what application code sees, and I have a
fairly good idea of what happens at the SDRAM pinout.

> Getting full write-combining performance on Intel architectures
> involves a somewhat delicate dance:
>  http://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers/

At a high level, that looks like a pretty effective technique (and
well explained) that works cross-platform with detail changes.
However, that describes *read* combining on an uncached area, rather
than write combining.

Write combining is easy - in my experience it Just Works on ARMv7 SoCs
in general.  In practice, I've found that you can write pretty much
anything to uncached memory and the write-combiner will deal with it
fairly intelligently.  Small writes sufficiently close together in
time and contiguous in area will be combined reliably.  This assumes
that the region is marked write-combinable, which should always be the
case for plain SDRAM.  So memcpy() *to* an uncached zone works okay,
even with an alignment mismatch.
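
Concretely, a hypothetical userland example (not from any driver): a plain
ascending store loop like this is exactly the pattern the combiner merges
into full-width bursts:

	#include <stddef.h>
	#include <stdint.h>

	/* sequential, contiguous stores to a write-combining mapping
	 * (e.g. an mmap'ed framebuffer) coalesce into burst writes */
	static void fill_wc(volatile uint32_t *fb, uint32_t pixel, size_t n)
	{
		size_t i;

		for (i = 0; i < n; i++)
			fb[i] = pixel;
	}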

Read combining is much harder as soon as you turn off the cache, which
defeats all of the nice auto-prefetching mechanisms that tend to be
built into modern caches.  Even the newest ARM SoCs disable any and
all speculative behaviour for uncached reads - it is then not possible
to set up a "streaming read" even explicitly (even though you could
reasonably express such using PLD).

There will typically be a full cache-miss latency per instruction (20+
cycles on A8), even if the addresses are exactly sequential.  If they
are not sequential, or if the memory controller does a read or write
to somewhere else in the meantime, you also get the CAS or RAS
latencies of about 25ns each, which hurt badly (CAS and RAS have not
sped up appreciably, in real terms, since PC133 a decade ago - sense
amps are not so good at fulfilling Moore's Law).  So on a 1GHz
Cortex-A8, you can spend 80 clock cycles waiting for a memory load to
complete - that's about 20 for the memory system to figure out it's
uncacheable and cause the CPU core to replay the instruction twice, 50
waiting for the SDRAM chip to spin up, and another 10 as a fudge
factor and to allow the data to percolate up.

This situation is sufficiently common that I assume (and I tell my
colleagues to assume) that this is the case.  If a vendor were to turn
off write-combining for a memory area, I would complain very loudly to
them once I discovered it.  So far, though, I can only wish that they
would sort out the memory hierarchy to make framebuffer & video reads
better.

I *have* found one vendor who appears to put GPU command buffers in
cached memory, but this necessitates a manual cache cleaning exercise
every time the command buffer is flushed.  This is a substantial
overhead too, but is perhaps easier to optimise.

IMO this whole problem is a hardware design fault.  It's SDRAM
directly wired to the chip; there's nothing going on that the memory
controller doesn't know about.  So why isn't the last cache level part
of / attached to the memory controller, so that it can be used
transparently by all relevant bus masters?  It is, BTW, not only ARM
that gets this wrong, but in the Wintel world there is so much
horsepower to spare that few people notice.

> And I expect something similar to be necessary in order to avoid the
> read-modify-write penalty for write-combining buffers on ARMv7.  (NEON
> store-multiple operations can fill an entire 64-byte entry in the
> victim buffer in one opcode; I don't know whether this is enough to
> stop the L3 memory system from reading the data before clobbering it.)

Ah, now you are talking about store misses to cached memory.

Why 64 bytes?  VLD1 does 32 bytes (4x64b) and VLDM can do 128 bytes
(16x64b).  The latter is, I think, bigger than Intel's fill buffers.
Each of these have exactly equivalent store variants.

The manual for the Cortex-A8 states that store misses in the L1 cache
are sent to the L2 cache; the L2 cache then has validity bits for
every quadword (ie. four validity domains per line), so a 16-byte
store (if aligned) is sufficient to avoid read traffic.  I assume that
the A9 and A5 are at least as sophisticated, not sure about
Snapdragon.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-25  5:23                   ` Jonathan Morton
@ 2011-06-25  9:55                     ` Michael K. Edwards
  -1 siblings, 0 replies; 201+ messages in thread
From: Michael K. Edwards @ 2011-06-25  9:55 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Subash Patel, Jordan Crouse, Marek Szyprowski, linux-arch,
	linaro-mm-sig, linux-arm-kernel, linux-mm

Thanks, Jonathan!  I agree with you that it is fundamentally a
hardware design defect that on-chip devices like GPUs and video
capture/encode/decode/display blocks do not participate in the cache
coherency protocol, and thus the buffers passed to, from, and among
them all have to be mapped uncacheable.  Not, unfortunately, something
likely to change soon, with one or two possible exceptions that I've
heard rumors about ...

With regard to the use of NEON for data moves, I have appended a
snippet of a conversation from the BeagleBoard list that veered off
into a related direction.  (My response is lightly edited, since I
made some stupid errors in the original.)  While this is somewhat
off-topic from Marek's patch set, I think it's relevant to the
question of whether "user-allocated" buffers are an important design
consideration for his otherwise DMA-centric API.  (And more to the
point, buffers allocated suitably for one or more on-chip devices, and
also mapped as uncacheable to userland.)

Perhaps you could correct misapprehensions and fill in gaps?  And
comment on what's likely to be different on other ARMv7-A
implementations?  And then I'll confine myself to review of Marek's
patches, on this thread anyway.  ;-)



On Jun 24, 4:50 am, Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:
> 2011/6/24 Måns Rullgård <m...@mansr.com>:
>
> > "Edwards, Michael" <m.k.edwa...@gmail.com> writes:
> >> and do have dedicated lanes to memory for the NEON unit
>
> > No core released to date, including the A15, has dedicated memory lanes
> > for NEON.  All the Cortex-A* cores have a common load/store unit for all
> > types of instructions.  Some can do multiple concurrent accesses, but
> > that's orthogonal to this discussion.
>
> Probably he wanted to say that NEON unit from Cortex-A8 can load/store
> 128 bits of data per cycle when accessing L1 cache *memory*, while
> ordinary ARM load/store instructions can't handle more than 64 bits
> per cycle there. This makes sense in the context of this discussion
> because loading data to NEON/VFP registers directly without dragging
> it through ARM registers is not a bad idea.

That's close to what I meant.  The load/store path *to main memory* is
indeed shared.  But within the cache hierarchy, at least on the
Cortex-A8, ARM and NEON take separate paths.  And that's a good thing,
because the ARM stalls on an L1 miss, and it would be rather bad if it
had to wait for a big NEON transfer to complete before it could fill
from L2.  Moreover, the only way to get "streaming" performance
(back-to-back AXI burst transactions) on uncacheable regions is by
using the NEON.  That's almost impossible to determine from the TRM,
but it's there:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/ch09s04s03.html
.  Compare against the LDM/STM section of
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/ch09s01s02.html
.

On the A8, the NEON bypasses the L1 cache, and has a dedicated lane
(probably the wrong word, sorry) into the L2 cache -- or for
uncacheable mappings, *past* the L2 per se to its AXI scheduler.  See
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/ch08s02s02.html
.  In addition, NEON load/store operations can be issued in parallel
with integer code, and there can be as many as 12 NEON reads
outstanding in L2 -- vs. the maximum of 4 total cache line refills and
evictions.  So if you are moving data around without doing non-SIMD
operations on it, and without branching based on its contents, you can
do so without polluting L1 cache, or contending with L1 misses that
hit in L2.

There will be some contention between NEON-side loads and ARM-side L2
misses, but even that is negligible if you issue a preload early
enough (which you should do anyway for fetches that you suspect will
miss L2, because the compiler schedules loads based on the assumption
of an L1 hit; an L1 miss stalls the ARM side until it's satisfied).
Preloads do not have any effect if you miss in TLB, and they don't
force premature evictions from L1 cache (they only load as far as L2).
And the contention on the write side is negligible thanks to the
write allocation mechanism, except insofar as you may approach
saturation of the AXI interface due to the total rate of L2
evictions/linefills and cache-bypassing traffic -- in which case,
congratulations!  Your code is well tuned and operates at the maximum
rate that the path to main memory permits.
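
The shape of that pattern, as a hypothetical sketch (GCC's
__builtin_prefetch emits PLD on ARMv7; the prefetch distance and
stride here are illustrative, not tuned):

#include <stddef.h>
#include <stdint.h>

/* Walk a large array, preloading a few cache lines ahead so the data
 * is already in L2 by the time the ARM-side load issues.  PLD never
 * faults, so running past the end of the array is harmless. */
uint32_t sum_words(const uint32_t *p, size_t n)
{
        uint32_t s = 0;
        for (size_t i = 0; i < n; i++) {
                if ((i & 15) == 0)                      /* once per 64-byte line */
                        __builtin_prefetch(&p[i + 64]); /* ~4 lines ahead */
                s += p[i];
        }
        return s;
}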

If you are fetching data from an uncacheable region, using the NEON to
trampoline into a cacheable region should be a *huge* win.  Remember,
an L1 miss stalls the ARM side, and the only way to get data into L1
is to fetch and miss.  If you want it to hit in L2, you have to use
the NEON to put it there, by fetching up to 128 bytes at a go from the
uncacheable region (e.g., VLDM r1,{d16-d31}) and storing it to a
cacheable buffer (i.e., only as far as L2, since you write it again
and again without an eviction).  You want to limit fetches from the
ARM side to cacheable regions; otherwise every LDM is a round-trip to
AXI.

The store story is similar.  You want the equivalent of the x86's
magic "fill buffers" -- which avoid the read-modify-write penalty when
writing whole cache lines' worth of data through uncacheable
write-combining mappings, but only if you use cache-bypassing SSE2
writes.  To get it, you need to write from the ARM to cacheable
memory, then load that data to NEON registers and store from there.
That pushes up to two whole cache lines' worth of data at a time down
to the L2 controller, which queues the write without blocking the
NEON.  (This is the only way to get an AXI burst longer than 4 128-bit
transactions without using the preload engine.)
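
Both directions come down to the same loop.  A hypothetical sketch in
GCC inline assembly (assumes -mfpu=neon; whether 'src' or 'dst' is the
uncached mapping depends on the direction):

#include <stddef.h>

/* Copy 128 bytes per iteration through the NEON register file.  Run
 * uncached->cached, it turns reads into long AXI bursts; run
 * cached->uncached, it pushes two whole lines at a time to the L2's
 * AXI scheduler without a read-modify-write. */
static void neon_copy(const char *src, char *dst, size_t bytes)
{
        while (bytes >= 128) {
                asm volatile(
                        "vldm   %0!, {d16-d31}\n\t"
                        "vstm   %1!, {d16-d31}"
                        : "+r"(src), "+r"(dst)
                        :
                        : "d16", "d17", "d18", "d19", "d20", "d21",
                          "d22", "d23", "d24", "d25", "d26", "d27",
                          "d28", "d29", "d30", "d31", "memory");
                bytes -= 128;
        }
}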

One more nice thing about doing your bulk data transfers this way,
instead of monkeying with some DMA unit (which you probably can't do
in userland anyway), is that there are no explicit cache operations to
deal with.  You don't have to worry about data stalling in L1, because
the NEON loads do peek data *from* L1 even though they don't load data
*to* L1.  (Not unless you turn on the L1NEON bit in the Auxiliary
Control Register, which you don't want to do unless you have no L2
cache, in which case you have a whole different set of problems.)

The Cortex-A9 is a whole different animal, with out-of-order issue on
the ARM side and two automatic prefetch mechanisms (based on detection
of miss patterns at L1 and, in MPCore only, at L2).  It also has a far
less detailed TRM, so I can't begin to analyze its memory hierarchy.
Given that the L2 cache has been hived off to an external unit, and
the penalty for transfers between the ARM and NEON units has been
greatly decreased, I would guess that the NEON goes through the L1
just like the ARM.  That changes the game a little -- the NEON
transfers to/from cacheable memory can now cause eviction of the ARM's
working set from L1 -- but in practice that's probably a wash.  The
basic premise (that you want to do your noncacheable transactions in
big bursts, feasible only from the NEON side) still holds.

> >> -- the compiler can tighten up the execution of rather a lot of code
> >> by trampolining structure fetches and stores through the NEON.
>
> > Do you have any numbers to back this up?  I don't see how going through
> > NEON registers would be faster than direct LDM/STM on any core.
>
> My understanding is that it's exactly the other way around. Using
> hardfp allows to avoid going through ARM registers for floating point
> data, which otherwise might be needed for the sole purpose of
> fulfilling ABI requirements in some cases. You are going a bit
> overboard trying to argue with absolutely everything what Edwards has
> posted :)

Not just for floating point data, but for SIMD integer data as well,
or really anything you want -- as long as you frame it as a
"Homogeneous Aggregate of containerized vectors".  That's an extra 64
bytes of structure that you can pass in, and let the callee decide
whether and when to spill a copy to a cache-line-aligned buffer (so
that it can then fetch the lot to the ARM L1 -- which might as well be
registers, as far as memory latency is concerned -- in one L1 miss).
Or you can do actual float/SIMD operations with the data, and return a
healthy chunk in registers, without ever touching memory.  (To be
precise, per the AAPCS, you can pass in one 64-byte chunk as a
"Homogeneous Aggregate with a Base Type of 128-bit containerized
vectors with four Elements", and return a similar chunk in the same
registers, with either the same or different contents.)
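
Spelled out as a hypothetical C fragment (GCC NEON intrinsics; the
struct and function names are invented for illustration):

#include <arm_neon.h>

/* A "Homogeneous Aggregate with a Base Type of 128-bit containerized
 * vectors with four Elements": under the VFP (hardfp) variant of the
 * AAPCS it is passed in q0-q3 and returned in q0-q3, so neither the
 * argument nor the result need ever touch memory. */
struct qblock {
        uint32x4_t v[4];        /* 64 bytes */
};

struct qblock process(struct qblock in)
{
        struct qblock out;
        for (int i = 0; i < 4; i++)
                out.v[i] = vaddq_u32(in.v[i], in.v[i]); /* placeholder op */
        return out;
}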

The point is not really to have "more registers"; the integer
"registers" are just names anyway, and the L1 cache is almost as
close.  Nor is it to pass floating point values to and from public
function calls cheaply; that's worth almost nothing on system scale.
Even in code that uses no floating point or SIMD whatever, there are
potentially big gains from:

  * preserving an additional 64 bytes of VFP/NEON state across
functions that don't need big operands or return values, if you are
willing to alter their function signatures to do so (at zero run-time
cost, if you're systematic about it); or alternately:

  * postponing the transfer of up to 64 bytes of operands from the
VFP/NEON bank to the integer side, allowing more time for pending NEON
operations (especially structure loads) to complete;

  * omitting the transfer from NEON to ARM entirely, if the operands
turn out to be unneeded (or simply written elsewhere in memory without
needing to be touched by the ARM);

  * returning up to 64 bytes of results in the VFP/NEON register bank,
possibly from an address that missed in L2, without stalling to wait
for a pending load to complete;

  * and, if you really do have to move those operands to the ARM,
doing so explicitly and efficiently (by spilling the whole block to a
cache-line-aligned buffer in L2, fetching it back into L1 with a
single load, and filling the delay with some other useful work)
instead of in the worst way possible (by transferring them from VFP to
ARM registers, 4 bytes at a time, before entering the function).

> As for NEON vs. LDM/STM. There are indeed no reasons why for example
> NEON memcpy should be faster than LDM/STM for the large memory buffers
> which do not fit caches. But still this is the case for OMAP3, along
> with some of other memory performance related WTF questions.

I hope I've clarified this a bit above.  But don't take my word for
it; these techniques are almost exactly the same as those described in
Intel's cheat sheet at
http://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers/
, except that there is no need for the equivalent of "fill buffers" /
"write combining buffers" because VLDM/VSTM can move 128 bytes at a
time.  (It's probable that the right micro-optimization is to work in
64-byte chunks and pipeline more deeply; I haven't benchmarked yet.)
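
That deeper-pipelined variant might look like this (hypothetical,
continuing the copy-loop sketch above, so the same caveats and
includes apply): split the register file in half so the second
64-byte load can issue while the first store drains:

static void neon_copy_pipelined(const char *src, char *dst, size_t bytes)
{
        while (bytes >= 128) {
                asm volatile(
                        "vldm   %0!, {d16-d23}\n\t"     /* first 64-byte chunk */
                        "vldm   %0!, {d24-d31}\n\t"     /* second chunk in flight */
                        "vstm   %1!, {d16-d23}\n\t"
                        "vstm   %1!, {d24-d31}"
                        : "+r"(src), "+r"(dst)
                        :
                        : "d16", "d17", "d18", "d19", "d20", "d21",
                          "d22", "d23", "d24", "d25", "d26", "d27",
                          "d28", "d29", "d30", "d31", "memory");
                bytes -= 128;
        }
}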

> >> If, that is, it can schedule them appropriately to account for
> >> latencies to and from memory as well as the (reduced but non-zero)
> >> latency of VFP<->ARM transfers.
>
> > The out of order issue on A9 and later makes most such tricks unnecessary.
>
> VFP/NEON unit from A9 is still in-order.

True but mostly irrelevant.  If your code is at all tight, and your
working set doesn't fit into L2 cache, all the mere arithmetic
pipelines should be stalled most of the time.  The name of the game is
to race as quickly as possible from one fetch from an uncacheable /
unpredictable address to the next that depends on it, and to get as
high an interleave among such fetch chains as possible.  If your
working set isn't larger than L2 cache, why bother thinking about
performance at all?  Your algorithm could be O(n^3) and coded by
banana slugs, and it would still get the job done.

Cheers,
- Michael

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign
  2011-06-25  9:55                     ` Michael K. Edwards
@ 2011-06-26  0:06                       ` Jonathan Morton
  0 siblings, 0 replies; 201+ messages in thread
From: Jonathan Morton @ 2011-06-26  0:06 UTC (permalink / raw)
  To: M.K.Edwards
  Cc: Subash Patel, Jordan Crouse, Marek Szyprowski, linux-arch,
	linaro-mm-sig, linux-arm-kernel, linux-mm

On 25 June 2011 12:55, Michael K. Edwards <m.k.edwards@gmail.com> wrote:
> With regard to the use of NEON for data moves, I have appended a
> snippet of a conversation from the BeagleBoard list that veered off
> into a related direction.  (My response is lightly edited, since I
> made some stupid errors in the original.)  While this is somewhat
> off-topic from Marek's patch set, I think it's relevant to the
> question of whether "user-allocated" buffers are an important design
> consideration for his otherwise DMA-centric API.  (And more to the
> point, buffers allocated suitably for one or more on-chip devices, and
> also mapped as uncacheable to userland.)

As far as userspace is concerned, dealing with the memory hierarchy's
quirks is already pretty much a black art, and that's *before* you
start presenting it with uncached buffers.  The best rule of thumb
userspace can follow is to keep things in cache if they can, and use
the biggest memory-move instructions (and prefetching if available) if
they can't.  Everything else they have to rely on the hardware to
optimise for them.  Indeed, when working in C, you barely even get
*that* level of control (optimised copy routines have been known to
use double simply because it is reliably 64 bits that can be loaded
and stored efficiently), and most other languages are worse.

Small wonder that userspace code that knows it sometimes has to work
with uncached buffers - such as Pixman - relies heavily on handwritten
SIMD assembler.

Video decoders are a particularly fun case, because the correct
solution is actually to DMA the output buffer to the GPU (or, better,
to map one onto the other so that zero-copy semantics result) so that
the CPU doesn't have to touch it.  But then you have to find a common
format that both VPU and GPU support, and you have to have a free DMA
channel and a way to use it.  Frankly though, this is a solution from
the 20th century (remember MPEG2 decoders sitting beside the SVGA
card?).

We *have* had to occasionally deal with hardware where no such common
format could be found, although often this has been due to inadequate
documentation or driver support (a familiar refrain).  In one case I
wrote a NEON NV12-to-RGB32 conversion routine which read directly from
the video buffer and wrote directly into a texture buffer, both of
which were of course uncached.  This halved the CPU consumption of the
video playback applet, but prefixing it with a routine which copied
the video buffer into cached memory (using 32-byte VLD1 instead of
16-byte versions) halved it again.  Profiling showed that the vast
majority of the time was spent in the prefix copy loop.  No doubt if
further savings had been required, I'd have tried using VLDM in the
copy loop.  (There weren't enough registers to widen the load stage of
the conversion routine itself.)
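
The inner loop of such a prefix copy looks something like this sketch
(hypothetical, reconstructed from the description above; assumes GCC
inline NEON assembly and -mfpu=neon):

#include <stddef.h>
#include <stdint.h>

/* Pull the uncached video buffer into cacheable memory 32 bytes at a
 * time with VLD1/VST1 pairs, so the conversion routine afterwards
 * reads from cache instead of stalling on uncached loads. */
static void prefix_copy(const uint8_t *uncached_src, uint8_t *cached_dst,
                        size_t bytes)
{
        while (bytes >= 32) {
                asm volatile(
                        "vld1.8 {d0-d3}, [%0]!\n\t"
                        "vst1.8 {d0-d3}, [%1]!"
                        : "+r"(uncached_src), "+r"(cached_dst)
                        :
                        : "d0", "d1", "d2", "d3", "memory");
                bytes -= 32;
        }
}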

The takeaway from this is that if your code has to read from uncached
memory at all, that will undoubtedly dominate its performance.  A
read-modify-write cycle is at least as bad (because the memory has to
go through at least one CAS latency and a write-to-read turnaround
before the next read can be serviced).  A pure write is, however, no
problem.

On cached memory, the L2 cache of most modern (even ARM) CPUs has an
auto-prefetcher which will help out with sequential transfers.  This
should get somewhere reasonably close to optimal performance.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-06-27 12:18       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 12:18 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux'

Hello,

On Friday, June 24, 2011 5:37 PM Arnd Bergmann wrote:

> On Monday 20 June 2011, Marek Szyprowski wrote:
> > This patch modifies dma-mapping implementation on ARM architecture to
> > use common dma_map_ops structure and asm-generic/dma-mapping-common.h
> > helpers.
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> This is a good idea in general, but I have a few concerns about details:
> 
> First of all, should we only allow using dma_map_ops on ARM, or do we
> also want to support a case where these are all inlined as before?

I really wonder if it is possible to have a clean implementation of
the dma_map_ops-based and the linear, inlined dma-mapping frameworks
together.  Theoretically it should be possible, but it would end with
a lot of #ifdef hackery which is really hard to follow and understand
for anyone but the authors.

> I suppose for the majority of the cases, the overhead of the indirect
> function call is near-zero, compared to the overhead of the cache
> management operation, so it would only make a difference for coherent
> systems without an IOMMU. Do we care about micro-optimizing those?

Even in the coherent case, the overhead caused by the additional
function call should have a really negligible impact on driver
performance.

> > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > index 799669d..f4e4968 100644
> > --- a/arch/arm/include/asm/dma-mapping.h
> > +++ b/arch/arm/include/asm/dma-mapping.h
> > @@ -10,6 +10,27 @@
> >  #include <asm-generic/dma-coherent.h>
> >  #include <asm/memory.h>
> >
> > +extern struct dma_map_ops dma_ops;
> > +
> > +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> > +{
> > +	if (dev->archdata.dma_ops)
> > +		return dev->archdata.dma_ops;
> > +	return &dma_ops;
> > +}
> 
> I would not name the global structure just 'dma_ops', the identifier could
> too easily conflict with a local variable in some driver. How about
> arm_dma_ops or linear_dma_ops instead?

I'm fine with both of them. Even arm_linear_dma_ops makes some sense.

> >  /*
> >   * The scatter list versions of the above methods.
> >   */
> > -extern int dma_map_sg(struct device *, struct scatterlist *, int,
> > -		enum dma_data_direction);
> > -extern void dma_unmap_sg(struct device *, struct scatterlist *, int,
> > +extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
> > +		enum dma_data_direction, struct dma_attrs *attrs);
> > +extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
> > +		enum dma_data_direction, struct dma_attrs *attrs);
> > +extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
> >  		enum dma_data_direction);
> > -extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
> > +extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
> >  		enum dma_data_direction);
> > -extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
> > -		enum dma_data_direction);
> > -
> 
> You should not need to make these symbols visible in the header file any
> more unless they are used outside of the main file later.

They are used by the dma bounce code once it is converted to the
dma_map_ops framework.
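
To illustrate how that fits together, a hypothetical sketch (not from
the patches; 'dmabounce_ops' is an assumed name):

#include <linux/device.h>
#include <linux/dma-mapping.h>

/* Platform code that needs bouncing for a particular device points
 * its archdata at a dedicated ops structure; every other device falls
 * through to the default ops via get_dma_ops() as quoted above. */
extern struct dma_map_ops dmabounce_ops;	/* assumed name */

static void setup_dmabounce_for(struct device *dev)
{
        dev->archdata.dma_ops = &dmabounce_ops;
}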

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-27 12:23       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 12:23 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-arch, 'Russell King - ARM Linux',
	'Joerg Roedel',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	linux-arm-kernel

Hello,

On Friday, June 24, 2011 5:52 PM Arnd Bergmann wrote:

> On Monday 20 June 2011, Marek Szyprowski wrote:
> > Introduce new alloc/free/mmap methods that take an attributes argument.
> > alloc/free_coherent can be implemented on top of the new alloc/free
> > calls with NULL attributes. dma_alloc_non_coherent can be implemented
> > using a DMA_ATTR_NONCOHERENT attribute, dma_alloc_writecombine can also
> > use a separate DMA_ATTR_WRITECOMBINE attribute. This way the drivers will
> > get a more generic, platform independent way of allocating dma memory
> > buffers with specific parameters.
> >
> > One more attribute can be useful: DMA_ATTR_NOKERNELVADDR. Buffers with
> > such an attribute will not have a valid kernel virtual address. They
> > might be useful for drivers that only export the DMA buffers to userspace
> > (like for example V4L2 or ALSA).
> >
> > The mmap method is introduced to let the drivers create a user space
> > mapping for a DMA buffer in a generic, architecture independent way.
> >
> > TODO: update all dma_map_ops clients for all architectures
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> Yes, I think that is good, but the change needs to be done atomically
> across all architectures.

Yes, I'm aware of this and I will include such changes in the next version
of my patches.

> This should be easy enough as I believe
> all other architectures that use dma_map_ops don't even require
> dma_alloc_noncoherent but just define it to dma_alloc_coherent
> because they have only coherent memory in regular device drivers.

Right, this should be quite simple. I will also add a DMA_ATTR_NON_COHERENT
attribute for implementing the dma_alloc_noncoherent() call.
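
A minimal sketch of how that could look, assuming the ops->alloc
prototype introduced in this series and the proposed attribute name:

static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
		dma_addr_t *dma_handle, gfp_t gfp)
{
	struct dma_map_ops *ops = get_dma_ops(dev);
	DEFINE_DMA_ATTRS(attrs);

	/* DMA_ATTR_NON_COHERENT is the attribute proposed above */
	dma_set_attr(DMA_ATTR_NON_COHERENT, &attrs);
	return ops->alloc(dev, size, dma_handle, gfp, &attrs);
}

Architectures that treat noncoherent allocations as coherent would simply
ignore the attribute in their alloc implementation.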

> On a related note, do you plan to make the CMA work use this
> transparently, or do you want to have a DMA_ATTR_LARGE or
> DMA_ATTR_CONTIGUOUS for CMA?

IMHO it will be better to hide the CMA from the drivers. Memory allocated
from CMA doesn't really differ from the one allocated by dma_alloc_coherent()
(which internally uses alloc_pages()), so I really see no reason for adding
an additional attribute for it.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-06-27 13:19         ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-27 13:19 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux',
	Lennert Buytenhek

On Monday 27 June 2011, Marek Szyprowski wrote:

> On Friday, June 24, 2011 5:37 PM Arnd Bergmann wrote:
> 
> > On Monday 20 June 2011, Marek Szyprowski wrote:
> > > This patch modifies dma-mapping implementation on ARM architecture to
> > > use common dma_map_ops structure and asm-generic/dma-mapping-common.h
> > > helpers.
> > >
> > > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > 
> > This is a good idea in general, but I have a few concerns about details:
> > 
> > First of all, should we only allow using dma_map_ops on ARM, or do we
> > also want to support a case where these are all inlined as before?
> 
> I really wonder if it is possible to have a clean implementation of the
> dma_map_ops-based and the linear inlined dma-mapping frameworks together.
> Theoretically it should be possible, but it will end up with a lot of
> #ifdef hackery which is really hard to follow and understand for
> anyone but the authors.

Right. It's probably not worth it unless there is a significant overhead
in terms of code size or performance in the coherent linear case.

> > I suppose for the majority of the cases, the overhead of the indirect
> > function call is near-zero, compared to the overhead of the cache
> > management operation, so it would only make a difference for coherent
> > systems without an IOMMU. Do we care about micro-optimizing those?
> 
> Even in the coherent case, the overhead caused by the additional function
> call should have a really negligible impact on driver performance.

What about object code size? I guess since ixp23xx is the only platform
that announces itself as coherent, we probably don't need to worry about
it too much either. Lennert?

On everything else, we only replace a direct function call with an
indirect one.
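
For reference, the kind of thin wrapper that asm-generic/dma-mapping-common.h
provides looks roughly like this (a sketch; debug and kmemcheck hooks
omitted):

static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
		size_t offset, size_t size, enum dma_data_direction dir)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!valid_dma_direction(dir));
	/* the indirect call is the only extra cost over the inline version */
	return ops->map_page(dev, page, offset, size, dir, NULL);
}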

> > > diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> > > index 799669d..f4e4968 100644
> > > --- a/arch/arm/include/asm/dma-mapping.h
> > > +++ b/arch/arm/include/asm/dma-mapping.h
> > > @@ -10,6 +10,27 @@
> > >  #include <asm-generic/dma-coherent.h>
> > >  #include <asm/memory.h>
> > >
> > > +extern struct dma_map_ops dma_ops;
> > > +
> > > +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> > > +{
> > > +	if (dev->archdata.dma_ops)
> > > +		return dev->archdata.dma_ops;
> > > +	return &dma_ops;
> > > +}
> > 
> > I would not name the global structure just 'dma_ops', the identifier could
> > too easily conflict with a local variable in some driver. How about
> > arm_dma_ops or linear_dma_ops instead?
> 
> I'm fine with both of them. Even arm_linear_dma_ops makes some sense.

Ok, just pick one then if nobody has a strong opinion either way.

> > You should not need to make these symbols visible in the header file any
> > more unless they are used outside of the main file later.
> 
> They are used by the dma bounce code once converted to dma_map_ops framework.

Ok, I see.

	Arnd


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-27 13:22         ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-27 13:22 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux'

On Monday 27 June 2011, Marek Szyprowski wrote:
> > On a related note, do you plan to make the CMA work use this
> > transparently, or do you want to have a DMA_ATTR_LARGE or
> > DMA_ATTR_CONTIGUOUS for CMA?
> 
> IMHO it will be better to hide the CMA from the drivers. Memory allocated
> from CMA doesn't really differ from the one allocated by dma_alloc_coherent()
> (which internally uses alloc_pages()), so I really see no reason for adding
> an additional attribute for it.

Ok, fair enough. On a semi-related topic, IIRC we still need to make sure
that dma_alloc_coherent() pages are unmapped from the linear mapping. I hope
this is independent of both CMA and this patch.

	Arnd


^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-27 13:30           ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 13:30 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux'

Hello,

On Monday, June 27, 2011 3:22 PM Arnd Bergmann wrote:

> On Monday 27 June 2011, Marek Szyprowski wrote:
> > > On a related note, do you plan to make the CMA work use this
> > > transparently, or do you want to have a DMA_ATTR_LARGE or
> > > DMA_ATTR_CONTIGUOUS for CMA?
> >
> > IMHO it will be better to hide the CMA from the drivers. Memory allocated
> > from CMA doesn't really differ from the one allocated by dma_alloc_coherent()
> > (which internally uses alloc_pages()), so I really see no reason for adding
> > an additional attribute for it.
> 
> Ok, fair enough. On a semi-related topic, IIRC we still need to make sure
> that dma_alloc_coherent() pages are unmapped from the linear mapping. I hope
> this is independent of both CMA and this patch.

Right, that's one more big item that is still on the TODO list.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure
@ 2011-06-27 14:20           ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 14:20 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel'

Hello,

On Friday, June 24, 2011 5:47 PM Arnd Bergmann wrote:

> On Monday 20 June 2011, Marek Szyprowski wrote:
> > On Monday, June 20, 2011 4:43 PM Russell King - ARM Linux wrote:
> >
> > > On Mon, Jun 20, 2011 at 09:50:10AM +0200, Marek Szyprowski wrote:
> > > > This patch removes dma bounce hooks from the common dma mapping
> > > > implementation on ARM architecture and creates a separate set of
> > > > dma_map_ops for dma bounce devices.
> > >
> > > Why all this additional indirection for no gain?
> >
> > I did it to really separate the dmabounce code and let it be completely
> > independent of the particular internal functions of the main generic
> > dma-mapping code.
> >
> > dmabounce is just one possible dma-mapping implementation and it is really
> > convenient to have it enclosed behind a common interface (dma_map_ops)
> > rather than having it spread around and hardcoded behind some #ifdefs in
> > the generic ARM dma-mapping code.
> >
> > There will also be other dma-mapping implementations in the future - I'm
> > thinking mainly of some IOMMU-capable versions.
> >
> > In terms of speed I really doubt that these changes have any impact on the
> > system performance, but they significantly improve the code readability
> > (see the next patch with the cleanup of dma-mapping.c).
> 
> Yes. I believe the main effect of splitting out dmabounce into its own
> set of operations is improved readability for people that are not
> familiar with the existing code (which excludes Russell ;-) ), by
> separating the two codepaths and losing various #ifdef.
> 
> The simplification becomes more obvious when you look at patch 6, which
> removes a lot of the code that becomes redundant after this one.

This separation might also help in the future with code consolidation across
different architectures. It was suggested that the ARM DMA bounce code has a
lot in common with the SWIOTLB (if I'm right) dma-mapping implementation.

The separation will also help in integrating the IOMMU based dma-mapping,
which will probably again share some code with the linear dma-mapping code.
Having these 3 implementations mixed together would not help code
readability.
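
With the split, opting a device into bouncing boils down to installing
the alternative ops on it - a sketch (the helper and ops names are
illustrative, not the final patch):

static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
{
	dev->archdata.dma_ops = ops;
}

static int dmabounce_register_dev_sketch(struct device *dev)
{
	/* bounce pool setup omitted; the key step is switching the ops */
	set_dma_ops(dev, &dmabounce_ops);
	return 0;
}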

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-27 14:29           ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 14:29 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	Marek Szyprowski

Hello,

On Friday, June 24, 2011 5:24 PM Arnd Bergmann wrote:

> On Monday 20 June 2011, Marek Szyprowski wrote:
> > > This also breaks dmabounce when used with a highmem-enabled system -
> > > dmabounce refuses the dma_map_page() API but allows the dma_map_single()
> > > API.
> >
> > I'm really not sure how this change will break the dma bounce code.
> >
> > Does it mean that it is allowed to call dma_map_single() on a kmapped
> > HIGHMEM page?
> 
> dma_map_single on a kmapped page already doesn't work, the argument needs
> to be inside of the linear mapping in order for virt_to_page to work.

Then I got really confused.

Documentation/DMA-mapping.txt says that dma_map_single() can be used only
with the kernel linear mapping, while dma_map_page() can also be called on
HIGHMEM pages.
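
In driver terms the documented distinction looks like this (a sketch with
illustrative names; error handling and unmapping omitted):

static void map_example(struct device *dev, size_t size)
{
	/* lowmem buffer inside the linear mapping: dma_map_single() is fine */
	void *buf = kmalloc(size, GFP_KERNEL);
	dma_addr_t h1 = dma_map_single(dev, buf, size, DMA_TO_DEVICE);

	/* a HIGHMEM page may have no kernel mapping at all, so only
	 * dma_map_page() is valid for it */
	struct page *page = alloc_page(GFP_HIGHUSER);
	dma_addr_t h2 = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
}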

Now, let's look at the code in arch/arm/common/dmabounce.c:

dma_addr_t __dma_map_page(struct device *dev, struct page *page,
                unsigned long offset, size_t size, enum dma_data_direction dir)
{
        dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
                __func__, page, offset, size, dir);

        BUG_ON(!valid_dma_direction(dir));

        if (PageHighMem(page)) {
                dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
                             "is not supported\n");
                return ~0;
        }

        return map_single(dev, page_address(page) + offset, size, dir);
}
EXPORT_SYMBOL(__dma_map_page);

Am I right that something is mixed up here? I really don't get why there is
a highmem check in the dma_map_page implementation. dma_map_single doesn't
perform such a check and works with kmapped highmem pages...

Russell also pointed out that my patch broke dma bounce with highmem enabled.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs
@ 2011-06-27 14:41       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 14:41 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux'

Hello,

On Friday, June 24, 2011 5:54 PM Arnd Bergmann wrote:

> On Monday 20 June 2011, Marek Szyprowski wrote:
> > The mmap method is introduced to let the drivers create a user space
> > mapping for a DMA buffer in a generic, architecture independent way.
> 
> One more thing: please split out the mmap change into a separate patch.

Ok, no problem.

> I sense that there might be some objections to that, and it's better
> to let people know about it early on than having them complain
> later when that has already been merged.

Ok, I will also prepare a much more detailed description for the mmap patch.
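
A rough sketch of the direction for the ARM version (names and details
are illustrative and assume a dma_to_pfn() helper; not the final patch):

static int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
		void *cpu_addr, dma_addr_t dma_addr, size_t size,
		struct dma_attrs *attrs)
{
	unsigned long pfn = dma_to_pfn(dev, dma_addr);

	/* coherent buffers must not be mapped cacheable into user space */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}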

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-27 14:53             ` Arnd Bergmann
  0 siblings, 0 replies; 201+ messages in thread
From: Arnd Bergmann @ 2011-06-27 14:53 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel'

On Monday 27 June 2011, Marek Szyprowski wrote:
> On Friday, June 24, 2011 5:24 PM Arnd Bergmann wrote:
> 
> > On Monday 20 June 2011, Marek Szyprowski wrote:
> > > > This also breaks dmabounce when used with a highmem-enabled system -
> > > > dmabounce refuses the dma_map_page() API but allows the dma_map_single()
> > > > API.
> > >
> > > I'm really not sure how this change will break the dma bounce code.
> > >
> > > Does it mean that it is allowed to call dma_map_single() on a kmapped
> > > HIGHMEM page?
> > 
> > dma_map_single on a kmapped page already doesn't work, the argument needs
> > to be inside of the linear mapping in order for virt_to_page to work.
> 
> Then I got really confused.
> 
> Documentation/DMA-mapping.txt says that dma_map_single() can be used only
> with the kernel linear mapping, while dma_map_page() can also be called on
> HIGHMEM pages.

Right, this is true in general.

> Now, lets go to arch/arm/common/dmabounce.c code:
> 
> dma_addr_t __dma_map_page(struct device *dev, struct page *page,
>                 unsigned long offset, size_t size, enum dma_data_direction dir)
> {
>         dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
>                 __func__, page, offset, size, dir);
> 
>         BUG_ON(!valid_dma_direction(dir));
> 
>         if (PageHighMem(page)) {
>                 dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
>                              "is not supported\n");
>                 return ~0;
>         }
> 
>         return map_single(dev, page_address(page) + offset, size, dir);
> }
> EXPORT_SYMBOL(__dma_map_page);
>
> Am I right that something is mixed up here? I really don't get why there is
> a highmem check in the dma_map_page implementation. dma_map_single doesn't
> perform such a check and works with kmapped highmem pages...
>
> Russell also pointed out that my patch broke dma bounce with highmem enabled.

The version of __dma_map_page that you cited is the one used with dmabounce
enabled; when CONFIG_DMABOUNCE is disabled, the following version is used:

static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
             unsigned long offset, size_t size, enum dma_data_direction dir)
{
        __dma_page_cpu_to_dev(page, offset, size, dir);
        return pfn_to_dma(dev, page_to_pfn(page)) + offset;
}

This does not have the check, because the kernel does not need to touch
the kernel mapping in that case.

If you pass a kmapped page into dma_map_single, it should also not
work because of the BUG_ON in ___dma_single_cpu_to_dev -- it warns
you that you would end up flushing the cache for the wrong page (if any).
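
For reference, this is also why implementing dma_map_single() on top of
dma_map_page(), as patch 2/8 does, is safe for every address that
dma_map_single() legitimately accepts - roughly (a sketch, not the exact
patch):

static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
		size_t size, enum dma_data_direction dir)
{
	/* only lowmem addresses inside the linear mapping are valid here */
	BUG_ON(!virt_addr_valid(cpu_addr));
	return dma_map_page(dev, virt_to_page(cpu_addr),
			    offset_in_page(cpu_addr), size, dir);
}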

	Arnd


^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page
@ 2011-06-27 15:06               ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-06-27 15:06 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: 'Russell King - ARM Linux',
	linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Joerg Roedel'

Hello,

On Monday, June 27, 2011 4:54 PM Arnd Bergmann wrote:

> On Monday 27 June 2011, Marek Szyprowski wrote:
> > On Friday, June 24, 2011 5:24 PM Arnd Bergmann wrote:
> >
> > > On Monday 20 June 2011, Marek Szyprowski wrote:
> > > > > This also breaks dmabounce when used with a highmem-enabled system -
> > > > > dmabounce refuses the dma_map_page() API but allows the dma_map_single()
> > > > > API.
> > > >
> > > > I'm really not sure how this change will break the dma bounce code.
> > > >
> > > > Does it mean that it is allowed to call dma_map_single() on a kmapped
> > > > HIGHMEM page?
> > >
> > > dma_map_single on a kmapped page already doesn't work; the argument needs
> > > to be inside the linear mapping in order for virt_to_page to work.
> >
> > Then I got really confused.
> >
> > Documentation/DMA-mapping.txt says that dma_map_single() can be used only
> > with the kernel linear mapping, while dma_map_page() can also be called on
> > HIGHMEM pages.
> 
> Right, this is true in general.

OK, so I see no reason not to implement dma_map_single() on top of
dma_map_page(), as has been done in asm-generic/dma-mapping-common.h.
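
For reference, the generic helper there looks roughly like this (a
simplified sketch of asm-generic/dma-mapping-common.h; the kmemcheck and
debug hooks are omitted):

	static inline dma_addr_t dma_map_single_attrs(struct device *dev,
			void *ptr, size_t size, enum dma_data_direction dir,
			struct dma_attrs *attrs)
	{
		struct dma_map_ops *ops = get_dma_ops(dev);

		BUG_ON(!valid_dma_direction(dir));
		/* split the linear-map address into page + offset */
		return ops->map_page(dev, virt_to_page(ptr),
				     (unsigned long)ptr & ~PAGE_MASK,
				     size, dir, attrs);
	}

The virt_to_page()/offset split is exactly why the argument must live in
the kernel linear mapping.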
 
> > Now, let's go to the arch/arm/common/dmabounce.c code:
> >
> > dma_addr_t __dma_map_page(struct device *dev, struct page *page,
> >                 unsigned long offset, size_t size, enum dma_data_direction dir)
> > {
> >         dev_dbg(dev, "%s(page=%p,off=%#lx,size=%zx,dir=%x)\n",
> >                 __func__, page, offset, size, dir);
> >
> >         BUG_ON(!valid_dma_direction(dir));
> >
> >         if (PageHighMem(page)) {
> >                 dev_err(dev, "DMA buffer bouncing of HIGHMEM pages "
> >                              "is not supported\n");
> >                 return ~0;
> >         }
> >
> >         return map_single(dev, page_address(page) + offset, size, dir);
> > }
> > EXPORT_SYMBOL(__dma_map_page);
> >
> > Am I right that something is mixed up here? I really don't get why there is
> > a highmem check in the dma_map_page implementation. dma_map_single doesn't
> > perform such a check and works with kmapped highmem pages...
> >
> > Russell also pointed out that my patch broke dma bounce with highmem enabled.
> 
> The version of __dma_map_page that you cited is the one used with dmabounce
> enabled; when CONFIG_DMABOUNCE is disabled, the following version is used:
> 
> static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page,
>              unsigned long offset, size_t size, enum dma_data_direction dir)
> {
>         __dma_page_cpu_to_dev(page, offset, size, dir);
>         return pfn_to_dma(dev, page_to_pfn(page)) + offset;
> }
> 
> This does not have the check, because the kernel does not need to touch
> the kernel mapping in that case.
> 
> If you pass a kmapped page into dma_map_single, it should also not
> work because of the BUG_ON in ___dma_single_cpu_to_dev -- it warns
> you that you would end up flushing the cache for the wrong page (if any).

Yes, I know that the flow is different when dma bounce is not used. The
non-dma-bounce version will still work correctly after my patch.

However, I still don't get how my patch broke the dma bounce code with
HIGHMEM, as pointed out by Russell...

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
@ 2011-07-03 15:28     ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-07-03 15:28 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	Kyungmin Park, Arnd Bergmann, Joerg Roedel

On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> This patch removes the need for offset parameter in dma bounce
> functions. This is required to let dma-mapping framework on ARM
> architecture use common, generic dma-mapping helpers.

I really don't like this.  Really as in hate.  Why?  I've said in the past
that the whole idea of getting rid of the sub-range functions is idiotic.

If you have to deal with cache coherence, what you _really_ want is an
API which tells you the size of the original buffer and the section of
that buffer which you want to handle - because the edges of the buffer
need special handling.

Let's say that you have a buffer which is 256 bytes long, misaligned to
half a cache line.  Let's first look at the sequence for the whole buffer:

1. You map it for DMA from the device.  This means you write back the
   first and last cache lines to preserve any data shared in the
   overlapping cache line.  The remainder you can just invalidate.

2. You want to access the buffer, so you use the sync_for_cpu function.
   If your CPU doesn't do any speculative prefetching, then you don't
   need to do anything.  If you do, you have to invalidate the buffer,
   but you must preserve the overlapping cache lines which again must
   be written back.

3. You transfer ownership back to the device using sync_for_device.
   As you may have caused cache lines to be read in, again you need to
   invalidate, and the overlapping cache lines must be written back.
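
The three steps above correspond to the usual driver-side sequence with
the streaming API (a sketch; dev, buf and SIZE are placeholders):

	dma_addr_t handle;

	/* 1. hand the buffer to the device */
	handle = dma_map_single(dev, buf, SIZE, DMA_FROM_DEVICE);

	/* 2. take ownership back so the CPU can read the data */
	dma_sync_single_for_cpu(dev, handle, SIZE, DMA_FROM_DEVICE);

	/* 3. return ownership to the device for the next transfer */
	dma_sync_single_for_device(dev, handle, SIZE, DMA_FROM_DEVICE);

	/* and when completely done: */
	dma_unmap_single(dev, handle, SIZE, DMA_FROM_DEVICE);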

Now, if you ask for a sub-section of the buffer to be sync'd, you can
actually eliminate those writebacks which are potentially troublesome,
and which could corrupt neighbouring data.

If you get rid of the sub-buffer functions and start using the whole
buffer functions for that purpose, you no longer know whether the
partial cache lines are part of the buffer or not, so you have to write
those back every time too.
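
To make that concrete, here is an illustrative range-aware sync (a
sketch only: clean_line() and inv_range() are hypothetical primitives,
and 32-byte cache lines are assumed):

	#define CACHE_LINE_SIZE	32

	static void sync_range_for_device(unsigned long buf, size_t buf_size,
					  unsigned long off, size_t len)
	{
		unsigned long start = buf + off;
		unsigned long first = start & ~(CACHE_LINE_SIZE - 1);
		unsigned long last  = (start + len - 1) & ~(CACHE_LINE_SIZE - 1);

		/*
		 * An edge line needs a writeback only if it also holds
		 * data from outside the buffer; knowing buf and buf_size
		 * is what makes that decision possible.
		 */
		if (first < buf)
			clean_line(first);
		if (last + CACHE_LINE_SIZE > buf + buf_size)
			clean_line(last);

		/* lines wholly inside the buffer: invalidate only */
		inv_range(first, last + CACHE_LINE_SIZE);
	}

With only the whole-buffer functions, buf and buf_size degenerate to the
synced range itself, so both edge lines get written back on every call.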

So far, we haven't had any reports of corruption of this type (maybe
folk using the sync functions are rare on ARM - thankfully) but getting
rid of the range sync functions means that solving this becomes a lot
more difficult because we've lost the information to make the decision.

So I've always believed - and continue to do so - that those who want
to get rid of the range sync functions are misguided and are storing up
problems for the future.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-07 12:09           ` Lennert Buytenhek
  0 siblings, 0 replies; 201+ messages in thread
From: Lennert Buytenhek @ 2011-07-07 12:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-arm-kernel, linaro-mm-sig, linux-mm,
	linux-arch, 'Kyungmin Park', 'Joerg Roedel',
	'Russell King - ARM Linux'

On Mon, Jun 27, 2011 at 03:19:43PM +0200, Arnd Bergmann wrote:

> > > I suppose for the majority of the cases, the overhead of the indirect
> > > function call is near-zero, compared to the overhead of the cache
> > > management operation, so it would only make a difference for coherent
> > > systems without an IOMMU. Do we care about micro-optimizing those?

FWIW, when I was hacking on ARM access point routing performance some
time ago, turning the L1/L2 cache maintenance operations into inline
functions (inlined into the ethernet driver) gave me a significant and
measurable performance boost.

Such things can remain product-specific hacks, though.


> > Even in the coherent case, the overhead caused by the additional function
> > call should have a really negligible impact on driver performance.
> 
> What about object code size? I guess since ixp23xx is the only platform
> that announces itself as coherent, we probably don't need to worry about
> it too much either. Lennert?

I don't think so.  ixp23xx isn't a very popular platform anymore either,
having been discontinued some time ago.


thanks,
Lennert

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-07 12:38             ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-07-07 12:38 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: Arnd Bergmann, Marek Szyprowski, linux-arm-kernel, linaro-mm-sig,
	linux-mm, linux-arch, 'Kyungmin Park',
	'Joerg Roedel'

On Thu, Jul 07, 2011 at 02:09:18PM +0200, Lennert Buytenhek wrote:
> On Mon, Jun 27, 2011 at 03:19:43PM +0200, Arnd Bergmann wrote:
> 
> > > > I suppose for the majority of the cases, the overhead of the indirect
> > > > function call is near-zero, compared to the overhead of the cache
> > > > management operation, so it would only make a difference for coherent
> > > > systems without an IOMMU. Do we care about micro-optimizing those?
> 
> FWIW, when I was hacking on ARM access point routing performance some
> time ago, turning the L1/L2 cache maintenance operations into inline
> functions (inlined into the ethernet driver) gave me a significant and
> measurable performance boost.

On what architecture?  Can you show what you did to gain that?

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-15  0:10               ` Lennert Buytenhek
  0 siblings, 0 replies; 201+ messages in thread
From: Lennert Buytenhek @ 2011-07-15  0:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Marek Szyprowski, linux-arm-kernel, linaro-mm-sig,
	linux-mm, linux-arch, 'Kyungmin Park',
	'Joerg Roedel'

On Thu, Jul 07, 2011 at 01:38:25PM +0100, Russell King - ARM Linux wrote:

> > > > > I suppose for the majority of the cases, the overhead of the indirect
> > > > > function call is near-zero, compared to the overhead of the cache
> > > > > management operation, so it would only make a difference for coherent
> > > > > systems without an IOMMU. Do we care about micro-optimizing those?
> > 
> > FWIW, when I was hacking on ARM access point routing performance some
> > time ago, turning the L1/L2 cache maintenance operations into inline
> > functions (inlined into the ethernet driver) gave me a significant and
> > measurable performance boost.
> 
> On what architecture?  Can you show what you did to gain that?

Patch is attached below.  It's an ugly product-specific hack, not
suitable for upstreaming in this form, etc etc, but IIRC it gave me
a ~5% improvement on packet routing.



From 4e9ab8b1e5fd3a5d7abb3253b653a2990b377f97 Mon Sep 17 00:00:00 2001
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 9 Apr 2009 02:28:54 +0200
Subject: [PATCH] Inline dma_cache_maint()

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
---
 arch/arm/include/asm/cacheflush.h  |  174 ++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/dma-mapping.h |   24 +++++-
 arch/arm/mm/Kconfig                |    1 -
 arch/arm/mm/cache-feroceon-l2.c    |   10 ++-
 arch/arm/mm/dma-mapping.c          |   35 -------
 5 files changed, 205 insertions(+), 39 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 6cbd8fd..7cc28eb 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -228,9 +228,105 @@ extern struct cpu_cache_fns cpu_cache;
  * is visible to DMA, or data written by DMA to system memory is
  * visible to the CPU.
  */
+#if defined(CONFIG_ARCH_KIRKWOOD) || defined(CONFIG_ARCH_MV78XX0)
+#define CACHE_LINE_SIZE		32
+
+static inline void l1d_flush_mva(unsigned long addr)
+{
+	__asm__("mcr p15, 0, %0, c7, c14, 1" : : "r" (addr));
+}
+
+static inline void l1d_inv_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c14, 0\n\t"
+		"mcr p15, 5, %1, c15, c14, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l1d_clean_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c13, 0\n\t"
+		"mcr p15, 5, %1, c15, c13, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l1d_flush_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c15, 0\n\t"
+		"mcr p15, 5, %1, c15, c15, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void dmac_inv_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	/*
+	 * Clean and invalidate partial first cache line.
+	 */
+	if (start & (CACHE_LINE_SIZE - 1)) {
+		l1d_flush_mva(start & ~(CACHE_LINE_SIZE - 1));
+		start = (start | (CACHE_LINE_SIZE - 1)) + 1;
+	}
+
+	/*
+	 * Clean and invalidate partial last cache line.
+	 */
+	if (start < end && end & (CACHE_LINE_SIZE - 1)) {
+		l1d_flush_mva(end & ~(CACHE_LINE_SIZE - 1));
+		end &= ~(CACHE_LINE_SIZE - 1);
+	}
+
+	/*
+	 * Invalidate all full cache lines between 'start' and 'end'.
+	 */
+	if (start < end)
+		l1d_inv_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void dmac_clean_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	l1d_clean_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void dmac_flush_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	l1d_flush_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+#else
 #define dmac_inv_range			cpu_cache.dma_inv_range
 #define dmac_clean_range		cpu_cache.dma_clean_range
 #define dmac_flush_range		cpu_cache.dma_flush_range
+#endif
 
 #else
 
@@ -286,12 +382,90 @@ static inline void outer_flush_range(unsigned long start, unsigned long end)
 
 #else
 
+#if defined(CONFIG_ARCH_KIRKWOOD) || defined(CONFIG_ARCH_MV78XX0)
+static inline void l2_clean_pa_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 1, %0, c15, c9, 4\n\t"
+		"mcr p15, 1, %1, c15, c9, 5"
+		: : "r" (__phys_to_virt(start)), "r" (__phys_to_virt(end)));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l2_clean_inv_pa(unsigned long addr)
+{
+	__asm__("mcr p15, 1, %0, c15, c10, 3" : : "r" (addr));
+}
+
+static inline void l2_inv_pa_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 1, %0, c15, c11, 4\n\t"
+		"mcr p15, 1, %1, c15, c11, 5"
+		: : "r" (__phys_to_virt(start)), "r" (__phys_to_virt(end)));
+	raw_local_irq_restore(flags);
+}
+
+static inline void outer_inv_range(unsigned long start, unsigned long end)
+{
+	/*
+	 * Clean and invalidate partial first cache line.
+	 */
+	if (start & (CACHE_LINE_SIZE - 1)) {
+		l2_clean_inv_pa(start & ~(CACHE_LINE_SIZE - 1));
+		start = (start | (CACHE_LINE_SIZE - 1)) + 1;
+	}
+
+	/*
+	 * Clean and invalidate partial last cache line.
+	 */
+	if (start < end && end & (CACHE_LINE_SIZE - 1)) {
+		l2_clean_inv_pa(end & ~(CACHE_LINE_SIZE - 1));
+		end &= ~(CACHE_LINE_SIZE - 1);
+	}
+
+	/*
+	 * Invalidate all full cache lines between 'start' and 'end'.
+	 */
+	if (start < end)
+		l2_inv_pa_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void outer_clean_range(unsigned long start, unsigned long end)
+{
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	if (start != end)
+		l2_clean_pa_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void outer_flush_range(unsigned long start, unsigned long end)
+{
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	if (start != end) {
+		l2_clean_pa_range(start, end - CACHE_LINE_SIZE);
+		l2_inv_pa_range(start, end - CACHE_LINE_SIZE);
+	}
+
+	dsb();
+}
+#else
 static inline void outer_inv_range(unsigned long start, unsigned long end)
 { }
 static inline void outer_clean_range(unsigned long start, unsigned long end)
 { }
 static inline void outer_flush_range(unsigned long start, unsigned long end)
 { }
+#endif
 
 #endif
 
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 22cb14e..10b517c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -6,6 +6,7 @@
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
 
+#include <asm/cacheflush.h>
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
 
@@ -56,7 +57,28 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
  * platforms with CONFIG_DMABOUNCE.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-extern void dma_cache_maint(const void *kaddr, size_t size, int rw);
+static inline void
+dma_cache_maint(const void *start, size_t size, int direction)
+{
+//	BUG_ON(!virt_addr_valid(start) || !virt_addr_valid(end - 1));
+
+	switch (direction) {
+	case DMA_FROM_DEVICE:		/* invalidate only */
+		dmac_inv_range(start, start + size);
+		outer_inv_range(__pa(start), __pa(start) + size);
+		break;
+	case DMA_TO_DEVICE:		/* writeback only */
+		dmac_clean_range(start, start + size);
+		outer_clean_range(__pa(start), __pa(start) + size);
+		break;
+	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
+		dmac_flush_range(start, start + size);
+		outer_flush_range(__pa(start), __pa(start) + size);
+		break;
+//	default:
+//		BUG();
+	}
+}
 
 /*
  * Return whether the given device DMA address mask can be supported
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index d490f37..3e4c526 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -690,7 +690,6 @@ config CACHE_FEROCEON_L2
 	bool "Enable the Feroceon L2 cache controller"
 	depends on ARCH_KIRKWOOD || ARCH_MV78XX0
 	default y
-	select OUTER_CACHE
 	help
 	  This option enables the Feroceon L2 cache controller.
 
diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
index 355c2a1..f84e34f 100644
--- a/arch/arm/mm/cache-feroceon-l2.c
+++ b/arch/arm/mm/cache-feroceon-l2.c
@@ -17,6 +17,9 @@
 #include <plat/cache-feroceon-l2.h>
 
 
+static int l2_wt_override;
+
+#if 0
 /*
  * Low-level cache maintenance operations.
  *
@@ -94,12 +97,14 @@ static inline void l2_inv_pa_range(unsigned long start, unsigned long end)
 {
 	l2_inv_mva_range(__phys_to_virt(start), __phys_to_virt(end));
 }
+#endif
 
 static inline void l2_inv_all(void)
 {
 	__asm__("mcr p15, 1, %0, c15, c11, 0" : : "r" (0));
 }
 
+#if 0
 /*
  * Linux primitives.
  *
@@ -110,8 +115,6 @@ static inline void l2_inv_all(void)
 #define CACHE_LINE_SIZE		32
 #define MAX_RANGE_SIZE		1024
 
-static int l2_wt_override;
-
 static unsigned long calc_range_end(unsigned long start, unsigned long end)
 {
 	unsigned long range_end;
@@ -204,6 +207,7 @@ static void feroceon_l2_flush_range(unsigned long start, unsigned long end)
 
 	dsb();
 }
+#endif
 
 
 /*
@@ -318,9 +322,11 @@ void __init feroceon_l2_init(int __l2_wt_override)
 
 	disable_l2_prefetch();
 
+#if 0
 	outer_cache.inv_range = feroceon_l2_inv_range;
 	outer_cache.clean_range = feroceon_l2_clean_range;
 	outer_cache.flush_range = feroceon_l2_flush_range;
+#endif
 
 	enable_l2();
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1ef561..d866150 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -482,41 +482,6 @@ static int __init consistent_init(void)
 
 core_initcall(consistent_init);
 
-/*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly, as it will break
- * platforms with CONFIG_DMABOUNCE.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
-void dma_cache_maint(const void *start, size_t size, int direction)
-{
-	void (*inner_op)(const void *, const void *);
-	void (*outer_op)(unsigned long, unsigned long);
-
-	BUG_ON(!virt_addr_valid(start) || !virt_addr_valid(start + size - 1));
-
-	switch (direction) {
-	case DMA_FROM_DEVICE:		/* invalidate only */
-		inner_op = dmac_inv_range;
-		outer_op = outer_inv_range;
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		inner_op = dmac_clean_range;
-		outer_op = outer_clean_range;
-		break;
-	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		inner_op = dmac_flush_range;
-		outer_op = outer_flush_range;
-		break;
-	default:
-		BUG();
-	}
-
-	inner_op(start, start + size);
-	outer_op(__pa(start), __pa(start) + size);
-}
-EXPORT_SYMBOL(dma_cache_maint);
-
 /**
  * dma_map_sg - map a set of SG buffers for streaming mode DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-15  0:10               ` Lennert Buytenhek
  0 siblings, 0 replies; 201+ messages in thread
From: Lennert Buytenhek @ 2011-07-15  0:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Marek Szyprowski, linux-arm-kernel, linaro-mm-sig,
	linux-mm, linux-arch, 'Kyungmin Park',
	'Joerg Roedel'

On Thu, Jul 07, 2011 at 01:38:25PM +0100, Russell King - ARM Linux wrote:

> > > > > I suppose for the majority of the cases, the overhead of the indirect
> > > > > function call is near-zero, compared to the overhead of the cache
> > > > > management operation, so it would only make a difference for coherent
> > > > > systems without an IOMMU. Do we care about micro-optimizing those?
> > 
> > FWIW, when I was hacking on ARM access point routing performance some
> > time ago, turning the L1/L2 cache maintenance operations into inline
> > functions (inlined into the ethernet driver) gave me a significant and
> > measurable performance boost.
> 
> On what architecture?  Can you show what you did to gain that?

Patch is attached below.  It's an ugly product-specific hack, not
suitable for upstreaming in this form, etc etc, but IIRC it gave me
a ~5% improvement on packet routing.



>From 4e9ab8b1e5fd3a5d7abb3253b653a2990b377f97 Mon Sep 17 00:00:00 2001
From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Thu, 9 Apr 2009 02:28:54 +0200
Subject: [PATCH] Inline dma_cache_maint()

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
---
 arch/arm/include/asm/cacheflush.h  |  174 ++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/dma-mapping.h |   24 +++++-
 arch/arm/mm/Kconfig                |    1 -
 arch/arm/mm/cache-feroceon-l2.c    |   10 ++-
 arch/arm/mm/dma-mapping.c          |   35 -------
 5 files changed, 205 insertions(+), 39 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 6cbd8fd..7cc28eb 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -228,9 +228,105 @@ extern struct cpu_cache_fns cpu_cache;
  * is visible to DMA, or data written by DMA to system memory is
  * visible to the CPU.
  */
+#if defined(CONFIG_ARCH_KIRKWOOD) || defined(CONFIG_ARCH_MV78XX0)
+#define CACHE_LINE_SIZE		32
+
+static inline void l1d_flush_mva(unsigned long addr)
+{
+	__asm__("mcr p15, 0, %0, c7, c14, 1" : : "r" (addr));
+}
+
+static inline void l1d_inv_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c14, 0\n\t"
+		"mcr p15, 5, %1, c15, c14, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l1d_clean_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c13, 0\n\t"
+		"mcr p15, 5, %1, c15, c13, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l1d_flush_mva_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 5, %0, c15, c15, 0\n\t"
+		"mcr p15, 5, %1, c15, c15, 1"
+		: : "r" (start), "r" (end));
+	raw_local_irq_restore(flags);
+}
+
+static inline void dmac_inv_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	/*
+	 * Clean and invalidate partial first cache line.
+	 */
+	if (start & (CACHE_LINE_SIZE - 1)) {
+		l1d_flush_mva(start & ~(CACHE_LINE_SIZE - 1));
+		start = (start | (CACHE_LINE_SIZE - 1)) + 1;
+	}
+
+	/*
+	 * Clean and invalidate partial last cache line.
+	 */
+	if (start < end && end & (CACHE_LINE_SIZE - 1)) {
+		l1d_flush_mva(end & ~(CACHE_LINE_SIZE - 1));
+		end &= ~(CACHE_LINE_SIZE - 1);
+	}
+
+	/*
+	 * Invalidate all full cache lines between 'start' and 'end'.
+	 */
+	if (start < end)
+		l1d_inv_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void dmac_clean_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	l1d_clean_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void dmac_flush_range(const void *_start, const void *_end)
+{
+	unsigned long start = (unsigned long)_start;
+	unsigned long end = (unsigned long)_end;
+
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	l1d_flush_mva_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+#else
 #define dmac_inv_range			cpu_cache.dma_inv_range
 #define dmac_clean_range		cpu_cache.dma_clean_range
 #define dmac_flush_range		cpu_cache.dma_flush_range
+#endif
 
 #else
 
@@ -286,12 +382,90 @@ static inline void outer_flush_range(unsigned long start, unsigned long end)
 
 #else
 
+#if defined(CONFIG_ARCH_KIRKWOOD) || defined(CONFIG_ARCH_MV78XX0)
+static inline void l2_clean_pa_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 1, %0, c15, c9, 4\n\t"
+		"mcr p15, 1, %1, c15, c9, 5"
+		: : "r" (__phys_to_virt(start)), "r" (__phys_to_virt(end)));
+	raw_local_irq_restore(flags);
+}
+
+static inline void l2_clean_inv_pa(unsigned long addr)
+{
+	__asm__("mcr p15, 1, %0, c15, c10, 3" : : "r" (addr));
+}
+
+static inline void l2_inv_pa_range(unsigned long start, unsigned long end)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__asm__("mcr p15, 1, %0, c15, c11, 4\n\t"
+		"mcr p15, 1, %1, c15, c11, 5"
+		: : "r" (__phys_to_virt(start)), "r" (__phys_to_virt(end)));
+	raw_local_irq_restore(flags);
+}
+
+static inline void outer_inv_range(unsigned long start, unsigned long end)
+{
+	/*
+	 * Clean and invalidate partial first cache line.
+	 */
+	if (start & (CACHE_LINE_SIZE - 1)) {
+		l2_clean_inv_pa(start & ~(CACHE_LINE_SIZE - 1));
+		start = (start | (CACHE_LINE_SIZE - 1)) + 1;
+	}
+
+	/*
+	 * Clean and invalidate partial last cache line.
+	 */
+	if (start < end && end & (CACHE_LINE_SIZE - 1)) {
+		l2_clean_inv_pa(end & ~(CACHE_LINE_SIZE - 1));
+		end &= ~(CACHE_LINE_SIZE - 1);
+	}
+
+	/*
+	 * Invalidate all full cache lines between 'start' and 'end'.
+	 */
+	if (start < end)
+		l2_inv_pa_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void outer_clean_range(unsigned long start, unsigned long end)
+{
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	if (start != end)
+		l2_clean_pa_range(start, end - CACHE_LINE_SIZE);
+
+	dsb();
+}
+
+static inline void outer_flush_range(unsigned long start, unsigned long end)
+{
+	start &= ~(CACHE_LINE_SIZE - 1);
+	end = (end + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
+	if (start != end) {
+		l2_clean_pa_range(start, end - CACHE_LINE_SIZE);
+		l2_inv_pa_range(start, end - CACHE_LINE_SIZE);
+	}
+
+	dsb();
+}
+#else
 static inline void outer_inv_range(unsigned long start, unsigned long end)
 { }
 static inline void outer_clean_range(unsigned long start, unsigned long end)
 { }
 static inline void outer_flush_range(unsigned long start, unsigned long end)
 { }
+#endif
 
 #endif
 
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 22cb14e..10b517c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -6,6 +6,7 @@
 #include <linux/mm_types.h>
 #include <linux/scatterlist.h>
 
+#include <asm/cacheflush.h>
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>
 
@@ -56,7 +57,28 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
  * platforms with CONFIG_DMABOUNCE.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-extern void dma_cache_maint(const void *kaddr, size_t size, int rw);
+static inline void
+dma_cache_maint(const void *start, size_t size, int direction)
+{
+//	BUG_ON(!virt_addr_valid(start) || !virt_addr_valid(end - 1));
+
+	switch (direction) {
+	case DMA_FROM_DEVICE:		/* invalidate only */
+		dmac_inv_range(start, start + size);
+		outer_inv_range(__pa(start), __pa(start) + size);
+		break;
+	case DMA_TO_DEVICE:		/* writeback only */
+		dmac_clean_range(start, start + size);
+		outer_clean_range(__pa(start), __pa(start) + size);
+		break;
+	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
+		dmac_flush_range(start, start + size);
+		outer_flush_range(__pa(start), __pa(start) + size);
+		break;
+//	default:
+//		BUG();
+	}
+}
 
 /*
  * Return whether the given device DMA address mask can be supported
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index d490f37..3e4c526 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -690,7 +690,6 @@ config CACHE_FEROCEON_L2
 	bool "Enable the Feroceon L2 cache controller"
 	depends on ARCH_KIRKWOOD || ARCH_MV78XX0
 	default y
-	select OUTER_CACHE
 	help
 	  This option enables the Feroceon L2 cache controller.
 
diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
index 355c2a1..f84e34f 100644
--- a/arch/arm/mm/cache-feroceon-l2.c
+++ b/arch/arm/mm/cache-feroceon-l2.c
@@ -17,6 +17,9 @@
 #include <plat/cache-feroceon-l2.h>
 
 
+static int l2_wt_override;
+
+#if 0
 /*
  * Low-level cache maintenance operations.
  *
@@ -94,12 +97,14 @@ static inline void l2_inv_pa_range(unsigned long start, unsigned long end)
 {
 	l2_inv_mva_range(__phys_to_virt(start), __phys_to_virt(end));
 }
+#endif
 
 static inline void l2_inv_all(void)
 {
 	__asm__("mcr p15, 1, %0, c15, c11, 0" : : "r" (0));
 }
 
+#if 0
 /*
  * Linux primitives.
  *
@@ -110,8 +115,6 @@ static inline void l2_inv_all(void)
 #define CACHE_LINE_SIZE		32
 #define MAX_RANGE_SIZE		1024
 
-static int l2_wt_override;
-
 static unsigned long calc_range_end(unsigned long start, unsigned long end)
 {
 	unsigned long range_end;
@@ -204,6 +207,7 @@ static void feroceon_l2_flush_range(unsigned long start, unsigned long end)
 
 	dsb();
 }
+#endif
 
 
 /*
@@ -318,9 +322,11 @@ void __init feroceon_l2_init(int __l2_wt_override)
 
 	disable_l2_prefetch();
 
+#if 0
 	outer_cache.inv_range = feroceon_l2_inv_range;
 	outer_cache.clean_range = feroceon_l2_clean_range;
 	outer_cache.flush_range = feroceon_l2_flush_range;
+#endif
 
 	enable_l2();
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1ef561..d866150 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -482,41 +482,6 @@ static int __init consistent_init(void)
 
 core_initcall(consistent_init);
 
-/*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly, as it will break
- * platforms with CONFIG_DMABOUNCE.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
-void dma_cache_maint(const void *start, size_t size, int direction)
-{
-	void (*inner_op)(const void *, const void *);
-	void (*outer_op)(unsigned long, unsigned long);
-
-	BUG_ON(!virt_addr_valid(start) || !virt_addr_valid(start + size - 1));
-
-	switch (direction) {
-	case DMA_FROM_DEVICE:		/* invalidate only */
-		inner_op = dmac_inv_range;
-		outer_op = outer_inv_range;
-		break;
-	case DMA_TO_DEVICE:		/* writeback only */
-		inner_op = dmac_clean_range;
-		outer_op = outer_clean_range;
-		break;
-	case DMA_BIDIRECTIONAL:		/* writeback and invalidate */
-		inner_op = dmac_flush_range;
-		outer_op = outer_flush_range;
-		break;
-	default:
-		BUG();
-	}
-
-	inner_op(start, start + size);
-	outer_op(__pa(start), __pa(start) + size);
-}
-EXPORT_SYMBOL(dma_cache_maint);
-
 /**
  * dma_map_sg - map a set of SG buffers for streaming mode DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-15  9:27                 ` Russell King - ARM Linux
  0 siblings, 0 replies; 201+ messages in thread
From: Russell King - ARM Linux @ 2011-07-15  9:27 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: Arnd Bergmann, Marek Szyprowski, linux-arm-kernel, linaro-mm-sig,
	linux-mm, linux-arch, 'Kyungmin Park',
	'Joerg Roedel'

On Fri, Jul 15, 2011 at 02:10:21AM +0200, Lennert Buytenhek wrote:
> On Thu, Jul 07, 2011 at 01:38:25PM +0100, Russell King - ARM Linux wrote:
> 
> > > > > > I suppose for the majority of the cases, the overhead of the indirect
> > > > > > function call is near-zero, compared to the overhead of the cache
> > > > > > management operation, so it would only make a difference for coherent
> > > > > > systems without an IOMMU. Do we care about micro-optimizing those?
> > > 
> > > FWIW, when I was hacking on ARM access point routing performance some
> > > time ago, turning the L1/L2 cache maintenance operations into inline
> > > functions (inlined into the ethernet driver) gave me a significant and
> > > measurable performance boost.
> > 
> > On what architecture?  Can you show what you did to gain that?
> 
> Patch is attached below.  It's an ugly product-specific hack, not
> suitable for upstreaming in this form, etc etc, but IIRC it gave me
> a ~5% improvement on packet routing.

Do you know how much was contributed by each change - L1, L2, moving
dma_cache_maint() inline, removing the virt_addr_valid() check, etc.?

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h
@ 2011-07-15 21:53                   ` Lennert Buytenhek
  0 siblings, 0 replies; 201+ messages in thread
From: Lennert Buytenhek @ 2011-07-15 21:53 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Marek Szyprowski, linux-arm-kernel, linaro-mm-sig,
	linux-mm, linux-arch, 'Kyungmin Park',
	'Joerg Roedel'

On Fri, Jul 15, 2011 at 10:27:17AM +0100, Russell King - ARM Linux wrote:

> > > > > > > I suppose for the majority of the cases, the overhead of the indirect
> > > > > > > function call is near-zero, compared to the overhead of the cache
> > > > > > > management operation, so it would only make a difference for coherent
> > > > > > > systems without an IOMMU. Do we care about micro-optimizing those?
> > > > 
> > > > FWIW, when I was hacking on ARM access point routing performance some
> > > > time ago, turning the L1/L2 cache maintenance operations into inline
> > > > functions (inlined into the ethernet driver) gave me a significant and
> > > > measurable performance boost.
> > > 
> > > On what architecture?  Can you show what you did to gain that?
> > 
> > Patch is attached below.  It's an ugly product-specific hack, not
> > suitable for upstreaming in this form, etc etc, but IIRC it gave me
> > a ~5% improvement on packet routing.
> 
> Do you know how much was contributed by each change - L1, L2, moving
> dma_cache_maint() inline, removing the virt_addr_valid() check, etc.?

Sorry, I'm not sure -- I never tested it at that granularity, and I no
longer have access to the hardware.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
@ 2011-07-26 12:56       ` Marek Szyprowski
  0 siblings, 0 replies; 201+ messages in thread
From: Marek Szyprowski @ 2011-07-26 12:56 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: linux-arm-kernel, linaro-mm-sig, linux-mm, linux-arch,
	'Kyungmin Park', 'Arnd Bergmann',
	'Joerg Roedel',
	Marek Szyprowski

Hello,

On Sunday, July 03, 2011 5:28 PM Russell King wrote:

> On Mon, Jun 20, 2011 at 09:50:06AM +0200, Marek Szyprowski wrote:
> > This patch removes the need for offset parameter in dma bounce
> > functions. This is required to let dma-mapping framework on ARM
> > architecture use common, generic dma-mapping helpers.
> 
> I really don't like this.  Really as in hate.  Why?  I've said in the past
> that the whole idea of getting rid of the sub-range functions is idiotic.
> 
> If you have to deal with cache coherence, what you _really_ want is an
> API which tells you the size of the original buffer and the section of
> that buffer which you want to handle - because the edges of the buffer
> need special handling.
> 
> Let's say that you have a buffer which is 256 bytes long, misaligned to
> half a cache line.  Let's first look at the sequence for whole-buffer:
> 
> 1. You map it for DMA from the device.  This means you write back the
>    first and last cache lines to preserve any data shared in the
>    overlapping cache line.  The remainder you can just invalidate.
> 
> 2. You want to access the buffer, so you use the sync_for_cpu function.
>    If your CPU doesn't do any speculative prefetching, then you don't
>    need to do anything.  If you do, you have to invalidate the buffer,
>    but you must preserve the overlapping cache lines which again must
>    be written back.
> 
> 3. You transfer ownership back to the device using sync_for_device.
>    As you may have caused cache lines to be read in, again you need to
>    invalidate, and the overlapping cache lines must be written back.
> 
> Now, if you ask for a sub-section of the buffer to be sync'd, you can
> actually eliminate those writebacks which are potentially troublesome,
> and which could corrupt neighbouring data.
> 
> If you get rid of the sub-buffer functions and start using the whole
> buffer functions for that purpose, you no longer know whether the
> partial cache lines are part of the buffer or not, so you have to write
> those back every time too.
> 
> So far, we haven't had any reports of corruption of this type (maybe
> folk using the sync functions are rare on ARM - thankfully) but getting
> rid of the range sync functions means that solving this becomes a lot
> more difficult because we've lost the information to make the decision.
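
(To make the edge handling described above concrete -- a minimal sketch,
assuming a 32-byte cache line; l1_writeback_line() and l1_inv_range()
are hypothetical helpers, not the actual kernel primitives:)

	/*
	 * Invalidate [start, start + size) for DMA from the device.
	 * Partial cache lines at either edge are written back first,
	 * preserving any neighbouring data that shares those lines;
	 * only the fully covered interior lines are discarded.
	 */
	static void sketch_inv_range(unsigned long start, size_t size)
	{
		unsigned long end = start + size;

		if (start & 31) {			/* partial first line */
			l1_writeback_line(start & ~31UL);
			start = (start | 31) + 1;
		}
		if (start < end && (end & 31)) {	/* partial last line */
			l1_writeback_line(end & ~31UL);
			end &= ~31UL;
		}
		if (start < end)
			l1_inv_range(start, end);	/* interior lines only */
	}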

Well, right now I haven't heard from anyone who wants to remove
dma_sync_single_range_for_{cpu,device}. All of this concerns the internal
implementation and dma_map_ops, which use the simplified calls; none of
it is exposed to drivers or any public API.

I also see no reason why we lose the information. All drivers are still
required to call dma_map_{single,page} first to acquire a dma address.
This way the DMA mapping subsystem knows precisely that the range from
the returned dma_addr to dma_addr+size is being used for dma operations.
All dma_sync_single_* operations take dma_addr as one of their arguments,
so it is straightforward to check which mapped dma range a particular
sync operation falls into.
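
(A sketch of the kind of bookkeeping this allows -- record the range at
map time and look it up at sync time; struct mapped_range and find_range()
are illustrative names, not the actual dmabounce data structures:)

	struct mapped_range {
		struct list_head node;
		dma_addr_t	 dma_addr;
		size_t		 size;
	};

	/* dma_map_single()/dma_map_page() would add an entry to a
	   per-device list; dma_sync_single_for_cpu(dev, addr, len) can
	   then recover the mapping the sync request falls into: */
	static struct mapped_range *find_range(struct list_head *ranges,
					       dma_addr_t addr, size_t len)
	{
		struct mapped_range *r;

		list_for_each_entry(r, ranges, node)
			if (addr >= r->dma_addr &&
			    addr + len <= r->dma_addr + r->size)
				return r;

		return NULL;
	}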

In my patch I have shown that it is perfectly possible to use the common
dma_map_ops structure on ARM and to unify the dma mapping implementation
a bit with other architectures.

IMHO this is the right way. There is a need for custom dma mapping
implementations (mainly for taking advantage of the iommu controllers
available on newer SoCs), and I would really like to avoid another set of
ifdefs or sequences of "if (iommu_supported())" all over the dma-mapping
code. Even now all this code is hard to understand on a first read (due
to the coherent/non-coherent sub-architectures and the dmabounce code
mixed in).
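
(Roughly, the structure being argued for -- one ops table chosen per
device instead of compile-time ifdefs; a simplified sketch of the
dispatch, not the exact arch/arm code:)

	static inline struct dma_map_ops *get_dma_ops(struct device *dev)
	{
		/* dmabounce, iommu or plain ops, selected per device */
		return dev->archdata.dma_ops ?: &arm_dma_ops;
	}

	static inline dma_addr_t dma_map_page(struct device *dev,
			struct page *page, unsigned long offset,
			size_t size, enum dma_data_direction dir)
	{
		return get_dma_ops(dev)->map_page(dev, page, offset,
						  size, dir, NULL);
	}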

> So I've always believed - and continue to do so - that those who want
> to get rid of the range sync functions are misguided and are storing up
> problems for the future.

I never said that I want to remove these operations from the driver API.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 201+ messages in thread

end of thread, other threads:[~2011-07-26 12:56 UTC | newest]

Thread overview: 201+ messages
2011-06-20  7:50 [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign Marek Szyprowski
2011-06-20  7:50 ` [PATCH 1/8] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops Marek Szyprowski
2011-06-20  8:35   ` Michal Nazarewicz
2011-06-20 10:46     ` Marek Szyprowski
2011-07-03 15:28   ` Russell King - ARM Linux
2011-07-26 12:56     ` Marek Szyprowski
2011-06-20  7:50 ` [PATCH 2/8] ARM: dma-mapping: implement dma_map_single on top of dma_map_page Marek Szyprowski
2011-06-20 14:39   ` Russell King - ARM Linux
2011-06-20 15:15     ` Marek Szyprowski
2011-06-24 15:24       ` Arnd Bergmann
2011-06-27 14:29         ` Marek Szyprowski
2011-06-27 14:53           ` Arnd Bergmann
2011-06-27 15:06             ` Marek Szyprowski
2011-06-20  7:50 ` [PATCH 3/8] ARM: dma-mapping: use asm-generic/dma-mapping-common.h Marek Szyprowski
2011-06-20 14:33   ` [Linaro-mm-sig] " KyongHo Cho
2011-06-21 11:47     ` Marek Szyprowski
2011-06-24  8:39       ` 'Joerg Roedel'
2011-06-24 15:36   ` Arnd Bergmann
2011-06-27 12:18     ` Marek Szyprowski
2011-06-27 13:19       ` Arnd Bergmann
2011-07-07 12:09         ` Lennert Buytenhek
2011-07-07 12:38           ` Russell King - ARM Linux
2011-07-15  0:10             ` Lennert Buytenhek
2011-07-15  9:27               ` Russell King - ARM Linux
2011-07-15 21:53                 ` Lennert Buytenhek
2011-06-20  7:50 ` [PATCH 4/8] ARM: dma-mapping: implement dma sg methods on top of generic dma ops Marek Szyprowski
2011-06-20 14:37   ` KyongHo Cho
2011-06-20 14:40   ` Russell King - ARM Linux
2011-06-20 15:23     ` Marek Szyprowski
2011-06-20  7:50 ` [PATCH 5/8] ARM: dma-mapping: move all dma bounce code to separate dma ops structure Marek Szyprowski
2011-06-20 14:42   ` Russell King - ARM Linux
2011-06-20 15:31     ` Marek Szyprowski
2011-06-24 15:47       ` Arnd Bergmann
2011-06-27 14:20         ` Marek Szyprowski
2011-06-20  7:50 ` [PATCH 6/8] ARM: dma-mapping: remove redundant code and cleanup Marek Szyprowski
2011-06-20  7:50 ` [PATCH 7/8] common: dma-mapping: change alloc/free_coherent method to more generic alloc/free_attrs Marek Szyprowski
2011-06-20 14:45   ` KyongHo Cho
2011-06-20 15:06     ` Russell King - ARM Linux
2011-06-20 15:14       ` [Linaro-mm-sig] " KyongHo Cho
2011-06-21 11:23     ` Marek Szyprowski
2011-06-22  0:00       ` [Linaro-mm-sig] " KyongHo Cho
2011-06-24  7:20         ` Marek Szyprowski
2011-06-24 15:51   ` Arnd Bergmann
2011-06-24 16:15     ` James Bottomley
2011-06-24 16:23       ` Arnd Bergmann
2011-06-27 12:23     ` Marek Szyprowski
2011-06-27 13:22       ` Arnd Bergmann
2011-06-27 13:30         ` Marek Szyprowski
2011-06-24 15:53   ` Arnd Bergmann
2011-06-27 14:41     ` Marek Szyprowski
2011-06-20  7:50 ` [PATCH 8/8] ARM: dma-mapping: use alloc, mmap, free from dma_ops Marek Szyprowski
2011-06-22  6:53   ` [Linaro-mm-sig] " KyongHo Cho
2011-06-22  4:53 ` [Linaro-mm-sig] [PATCH/RFC 0/8] ARM: DMA-mapping framework redesign Subash Patel
2011-06-22  6:59   ` Marek Szyprowski
2011-06-22  8:53     ` Subash Patel
2011-06-22  9:27       ` Marek Szyprowski
2011-06-22 16:00         ` Jordan Crouse
2011-06-23 13:09           ` Subash Patel
2011-06-23 16:24             ` Michael K. Edwards
2011-06-23 22:09               ` Michael K. Edwards
2011-06-25  5:23                 ` Jonathan Morton
2011-06-25  9:55                   ` Michael K. Edwards
2011-06-26  0:06                     ` Jonathan Morton
2011-06-24 15:20           ` Arnd Bergmann
2011-06-24  9:18 ` Joerg Roedel
2011-06-24 14:26   ` Marek Szyprowski
