* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
@ 2016-12-13 13:45 Vladimir Murzin
  2016-12-13 13:45 ` [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU Vladimir Murzin
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

It seems that the addition of cache support for M-class CPUs uncovered a
latent bug in DMA usage. The NOMMU memory model has been treated as
always consistent; however, for R/M classes of CPU, memory can be
covered by an MPU which in turn might configure RAM as Normal,
i.e. bufferable and cacheable. That breaks dma_alloc_coherent() and
friends, since data can now get stuck in caches or be buffered.

This patch set tries to address the issue by providing a region of
memory suitable for consistent DMA operations. Such a region is assumed
to be marked non-cacheable by the MPU. Since Linux has MPU support for
R-class only, and M-class sets up the MPU in the bootloader, the
proposed interface to advertise such memory is the "memdma=size@start"
command line option; to avoid clashing with normal memory (which usually
comes from the dts) it'd be safer to use it together with the "mem="
command line option. Meanwhile, I'm open to suggestions for a better way
of telling Linux about such memory.
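
For example, "mem=62M memdma=2M@0x73e00000" limits the memory seen by
Linux to 62M and reserves a 2M non-cacheable window at 0x73e00000 for
the coherent pool (the exact values are, of course, board specific).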

For configurations without cache support (like Cortex-M3/M4), DMA
operations are forced to be coherent and wired to dma-noop. That
decision is made based on the cacheid global variable. In case the CPU
supports caches and no coherent memory region is given, DMA is
disallowed. Probably some other important checks are missing, so I'm
all ears :)

To make life easier, the NOMMU DMA operations are kept in a separate
compilation unit.
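
From a driver's point of view nothing changes: the usual DMA API is
simply backed by these ops. A minimal sketch of a consumer of both the
coherent and the streaming paths (device and buffer names are
illustrative, not part of this series):

	/* needs <linux/dma-mapping.h> and <linux/sizes.h> */
	static int example_dma_usage(struct device *dev, void *data, size_t len)
	{
		dma_addr_t buf_dma, data_dma;
		void *buf;

		/* coherent buffer: served from the "memdma=" pool on cached noMMU */
		buf = dma_alloc_coherent(dev, SZ_4K, &buf_dma, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;

		/* streaming mapping: cache maintenance is done by the new ops */
		data_dma = dma_map_single(dev, data, len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, data_dma)) {
			dma_free_coherent(dev, SZ_4K, buf, buf_dma);
			return -EIO;
		}

		/* ... point the device at data_dma/buf_dma and wait for it ... */

		dma_unmap_single(dev, data_dma, len, DMA_TO_DEVICE);
		dma_free_coherent(dev, SZ_4K, buf, buf_dma);
		return 0;
	}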

Thanks!

Changelog:

    RFC v1 -> RFC v2
           - s/dmac_unmap_area/dmac_map_area in __dma_page_cpu_to_dev()
	   - removed unrelated changes in nommu.c

Vladimir Murzin (3):
  ARM: NOMMU: introduce dma operations for noMMU
  ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: dma-mapping: remove traces of NOMMU code

 arch/arm/include/asm/dma-mapping.h |    3 +-
 arch/arm/mm/Kconfig                |    2 +-
 arch/arm/mm/Makefile               |    5 +-
 arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
 arch/arm/mm/dma-mapping.c          |   26 +---
 arch/arm/mm/mm.h                   |    3 +
 arch/arm/mm/nommu.c                |    6 +
 7 files changed, 278 insertions(+), 29 deletions(-)
 create mode 100644 arch/arm/mm/dma-mapping-nommu.c

-- 
1.7.9.5

* [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
@ 2016-12-13 13:45 ` Vladimir Murzin
  2017-01-02 15:26   ` Benjamin Gaignard
  2016-12-13 13:45 ` [RFC v2 PATCH 2/3] ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus Vladimir Murzin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

R/M classes of CPU can have memory covered by an MPU which in turn might
configure RAM as Normal, i.e. bufferable and cacheable. That breaks
dma_alloc_coherent() and friends, since data can now get stuck in caches
or be buffered.

This patch introduces a way to specify a region of memory (via the
"memdma=size@start" command line option) suitable for consistent DMA
operations. Such a region is assumed to be marked non-cacheable by the
MPU.

For configurations without cache support (like Cortex-M3/M4), DMA
operations are forced to be coherent and wired to dma-noop. That
decision is made based on the cacheid global variable. In case the CPU
supports caches and no coherent memory region is given, DMA is
disallowed.

Reported-by: Alexandre Torgue <alexandre.torgue@st.com>
Reported-by: Andras Szemzo <sza@esh.hu>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
---
 arch/arm/include/asm/dma-mapping.h |    3 +-
 arch/arm/mm/Makefile               |    5 +-
 arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
 arch/arm/mm/mm.h                   |    3 +
 arch/arm/mm/nommu.c                |    6 +
 5 files changed, 275 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/mm/dma-mapping-nommu.c

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index bf02dbd..559faad 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -20,7 +20,8 @@ static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
 {
 	if (dev && dev->archdata.dma_ops)
 		return dev->archdata.dma_ops;
-	return &arm_dma_ops;
+
+	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops;
 }
 
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 2ac7988..5796357 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -2,9 +2,8 @@
 # Makefile for the linux arm-specific parts of the memory manager.
 #
 
-obj-y				:= dma-mapping.o extable.o fault.o init.o \
-				   iomap.o
-
+obj-y				:= extable.o fault.o init.o iomap.o
+obj-y				+= dma-mapping$(MMUEXT).o
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
 				   mmap.o pgd.o mmu.o pageattr.o
 
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
new file mode 100644
index 0000000..f92d98a
--- /dev/null
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -0,0 +1,262 @@
+/*
+ *  Based on linux/arch/arm/mm/dma-mapping.c
+ *
+ *  Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ *  DMA uncached mapping support.
+ */
+
+#include <linux/export.h>
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <linux/genalloc.h>
+
+#include <asm/cachetype.h>
+#include <asm/cacheflush.h>
+#include <asm/outercache.h>
+
+#include "dma.h"
+
+unsigned long dma_start __initdata;
+unsigned long dma_size __initdata;
+
+static struct gen_pool *dma_pool;
+
+static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
+			    dma_addr_t *dma_handle, gfp_t gfp,
+			    unsigned long attrs)
+{
+	void *ptr;
+
+	if (!dma_pool)
+		return NULL;
+
+	ptr = (void *)gen_pool_alloc(dma_pool, size);
+	if (ptr) {
+		*dma_handle = __pa(ptr);
+		dmac_flush_range(ptr, ptr + size);
+		outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	}
+
+	return ptr;
+}
+
+static void arm_nommu_dma_free(struct device *dev, size_t size,
+			  void *cpu_addr, dma_addr_t dma_addr,
+			  unsigned long attrs)
+{
+	gen_pool_free(dma_pool, (unsigned long)cpu_addr, size);
+}
+
+static void __dma_page_cpu_to_dev(dma_addr_t handle, size_t size,
+			 enum dma_data_direction dir)
+{
+	dmac_map_area(__va(handle), size, dir);
+
+	if (dir == DMA_FROM_DEVICE)
+		outer_inv_range(handle, handle + size);
+	else
+		outer_clean_range(handle, handle + size);
+}
+
+static void __dma_page_dev_to_cpu(dma_addr_t handle, size_t size,
+			 enum dma_data_direction dir)
+{
+	if (dir != DMA_TO_DEVICE) {
+		outer_inv_range(handle, handle + size);
+		dmac_unmap_area(__va(handle), size, dir);
+	}
+}
+
+static dma_addr_t arm_nommu_dma_map_page(struct device *dev, struct page *page,
+				      unsigned long offset, size_t size,
+				      enum dma_data_direction dir,
+				      unsigned long attrs)
+{
+	dma_addr_t handle = page_to_phys(page) + offset;
+
+	__dma_page_cpu_to_dev(handle, size, dir);
+
+	return handle;
+}
+
+static void arm_nommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	__dma_page_dev_to_cpu(handle, size, dir);
+}
+
+
+static int arm_nommu_dma_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+			     enum dma_data_direction dir,
+			     unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	for_each_sg(sgl, sg, nents, i) {
+		sg_dma_address(sg) = sg_phys(sg);
+		sg_dma_len(sg) = sg->length;
+		__dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
+	}
+
+	return nents;
+}
+
+static void arm_nommu_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents,
+		enum dma_data_direction dir, unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		__dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
+}
+
+static void arm_nommu_dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(handle, size, dir);
+}
+
+static void arm_nommu_dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	__dma_page_cpu_to_dev(handle, size, dir);
+}
+
+static void arm_nommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
+				      int nents, enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		__dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
+}
+
+static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
+				   int nents, enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i)
+		__dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
+}
+
+struct dma_map_ops arm_nommu_dma_ops = {
+	.alloc			= arm_nommu_dma_alloc,
+	.free			= arm_nommu_dma_free,
+	.map_page		= arm_nommu_dma_map_page,
+	.unmap_page		= arm_nommu_dma_unmap_page,
+	.map_sg			= arm_nommu_dma_map_sg,
+	.unmap_sg		= arm_nommu_dma_unmap_sg,
+	.sync_single_for_device	= arm_nommu_dma_sync_single_for_device,
+	.sync_single_for_cpu	= arm_nommu_dma_sync_single_for_cpu,
+	.sync_sg_for_device	= arm_nommu_dma_sync_sg_for_device,
+	.sync_sg_for_cpu	= arm_nommu_dma_sync_sg_for_cpu,
+};
+EXPORT_SYMBOL(arm_nommu_dma_ops);
+
+static struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
+{
+	return coherent ? &dma_noop_ops : &arm_nommu_dma_ops;
+}
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			const struct iommu_ops *iommu, bool coherent)
+{
+	struct dma_map_ops *dma_ops;
+
+	/*
+	 * Cache support for v7m is optional, so it can be treated as
+	 * coherent if no cache has been detected.
+	 */
+	dev->archdata.dma_coherent = (cacheid) ? coherent : true;
+
+	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
+
+	set_dma_ops(dev, dma_ops);
+}
+
+void arch_teardown_dma_ops(struct device *dev)
+{
+}
+
+int dma_supported(struct device *dev, u64 mask)
+{
+	if (cacheid && !dma_pool)
+		return 0;
+
+	return 1;
+}
+
+EXPORT_SYMBOL(dma_supported);
+
+#define PREALLOC_DMA_DEBUG_ENTRIES	4096
+
+static int __init dma_debug_do_init(void)
+{
+	dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
+	return 0;
+}
+core_initcall(dma_debug_do_init);
+
+/*
+ * Initialise the coherent pool for DMA allocations.
+ */
+static int  __init dma_pool_init(void)
+{
+	int ret;
+
+	if (cacheid && !dma_size) {
+		pr_warn("DMA: coherent memory region has not been given.\n");
+		return 0;
+	}
+
+	dma_pool = gen_pool_create(PAGE_SHIFT, -1);
+
+	if (!dma_pool)
+		goto out;
+
+	ret = gen_pool_add_virt(dma_pool, (unsigned long)dma_start, (unsigned long)dma_start,
+				dma_size, -1);
+	if (ret)
+		goto destroy_genpool;
+
+	gen_pool_set_algo(dma_pool, gen_pool_first_fit_order_align, NULL);
+
+	pr_info("DMA: coherent memory region 0x%lx - 0x%lx (%lu KiB)\n",
+		dma_start, dma_start + dma_size, dma_size >> 10);
+
+	return 0;
+
+destroy_genpool:
+	gen_pool_destroy(dma_pool);
+	dma_pool = NULL;
+out:
+	pr_err("DMA: failed to allocate coherent memory region\n");
+	return -ENOMEM;
+}
+
+postcore_initcall(dma_pool_init);
+
+/* "memdma=<size>@<address>" parsing. */
+static int __init early_memdma(char *p)
+{
+	if (!p)
+		return -EINVAL;
+
+	dma_size = memparse(p, &p);
+	if (*p == '@')
+		dma_start = memparse(p + 1, &p);
+
+	return 0;
+}
+early_param("memdma", early_memdma);
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index ce727d4..18eb869 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -97,3 +97,6 @@ struct static_vm {
 void dma_contiguous_remap(void);
 
 unsigned long __clear_cr(unsigned long mask);
+
+extern unsigned long dma_start  __initdata;
+extern unsigned long dma_size  __initdata;
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 681cec8..5827e54 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -303,6 +303,12 @@ void __init sanity_check_meminfo(void)
 	end = memblock_end_of_DRAM();
 	high_memory = __va(end - 1) + 1;
 	memblock_set_current_limit(end);
+
+	if (dma_size &&
+	    memblock_overlaps_region(&memblock.memory, dma_start, dma_size)) {
+		pr_crit("DMA: coherent memory region overlaps with main memory.\n");
+		dma_size = 0;
+	}
 }
 
 /*
-- 
1.7.9.5

* [RFC v2 PATCH 2/3] ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
  2016-12-13 13:45 ` [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU Vladimir Murzin
@ 2016-12-13 13:45 ` Vladimir Murzin
  2016-12-13 13:45 ` [RFC v2 PATCH 3/3] ARM: dma-mapping: remove traces of NOMMU code Vladimir Murzin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

Now we have a dedicated non-cacheable region for consistent DMA
operations. However, that region can still be marked as bufferable by
the MPU, so it'd be safer to have barriers by default.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
---
 arch/arm/mm/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6dffbe4..7f0000d 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1023,7 +1023,7 @@ config ARM_L1_CACHE_SHIFT
 
 config ARM_DMA_MEM_BUFFERABLE
 	bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7
-	default y if CPU_V6 || CPU_V6K || CPU_V7
+	default y if CPU_V6 || CPU_V6K || CPU_V7 || CPU_V7M
 	help
 	  Historically, the kernel has used strongly ordered mappings to
 	  provide DMA coherent memory.  With the advent of ARMv7, mapping
-- 
1.7.9.5

* [RFC v2 PATCH 3/3] ARM: dma-mapping: remove traces of NOMMU code
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
  2016-12-13 13:45 ` [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU Vladimir Murzin
  2016-12-13 13:45 ` [RFC v2 PATCH 2/3] ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus Vladimir Murzin
@ 2016-12-13 13:45 ` Vladimir Murzin
  2016-12-13 14:07 ` [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Russell King - ARM Linux
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

DMA operations for the NOMMU case have just been factored out into a
separate compilation unit, so don't keep the dead code.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
---
 arch/arm/mm/dma-mapping.c |   26 ++------------------------
 1 file changed, 2 insertions(+), 24 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ab77100..d8a755b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -344,8 +344,6 @@ static void __dma_free_buffer(struct page *page, size_t size)
 	}
 }
 
-#ifdef CONFIG_MMU
-
 static void *__alloc_from_contiguous(struct device *dev, size_t size,
 				     pgprot_t prot, struct page **ret_page,
 				     const void *caller, bool want_vaddr,
@@ -646,22 +644,6 @@ static inline pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot)
 	return prot;
 }
 
-#define nommu() 0
-
-#else	/* !CONFIG_MMU */
-
-#define nommu() 1
-
-#define __get_dma_pgprot(attrs, prot)				__pgprot(0)
-#define __alloc_remap_buffer(dev, size, gfp, prot, ret, c, wv)	NULL
-#define __alloc_from_pool(size, ret_page)			NULL
-#define __alloc_from_contiguous(dev, size, prot, ret, c, wv, coherent_flag)	NULL
-#define __free_from_pool(cpu_addr, size)			do { } while (0)
-#define __free_from_contiguous(dev, page, cpu_addr, size, wv)	do { } while (0)
-#define __dma_free_remap(cpu_addr, size)			do { } while (0)
-
-#endif	/* CONFIG_MMU */
-
 static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
 				   struct page **ret_page)
 {
@@ -803,7 +785,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 
 	if (cma)
 		buf->allocator = &cma_allocator;
-	else if (nommu() || is_coherent)
+	else if (is_coherent)
 		buf->allocator = &simple_allocator;
 	else if (allowblock)
 		buf->allocator = &remap_allocator;
@@ -852,8 +834,7 @@ static int __arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		 unsigned long attrs)
 {
-	int ret = -ENXIO;
-#ifdef CONFIG_MMU
+	int ret;
 	unsigned long nr_vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	unsigned long pfn = dma_to_pfn(dev, dma_addr);
@@ -868,7 +849,6 @@ static int __arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 				      vma->vm_end - vma->vm_start,
 				      vma->vm_page_prot);
 	}
-#endif	/* CONFIG_MMU */
 
 	return ret;
 }
@@ -887,9 +867,7 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		 void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		 unsigned long attrs)
 {
-#ifdef CONFIG_MMU
 	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-#endif	/* CONFIG_MMU */
 	return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
 
-- 
1.7.9.5

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
                   ` (2 preceding siblings ...)
  2016-12-13 13:45 ` [RFC v2 PATCH 3/3] ARM: dma-mapping: remove traces of NOMMU code Vladimir Murzin
@ 2016-12-13 14:07 ` Russell King - ARM Linux
  2016-12-13 14:14   ` Vladimir Murzin
  2016-12-13 14:33 ` Szemző András
  2016-12-16 14:57 ` Alexandre Torgue
  5 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2016-12-13 14:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
> This patch set is trying to address the issue by providing region of
> memory suitable for consistent DMA operations. It is supposed that such
> region is marked by MPU as non-cacheable. Since we have MPU support in
> Linux for R-class only and M-class setting MPU in bootloader, proposed
> interface to advertise such memory is via "memdma=size at start" command
> line option, to avoid clashing with normal memory (which usually comes
> from dts) it'd be safer to use it together with "mem=" command line
> option. Meanwhile, I'm open to suggestions for the better way telling
> Linux of such memory.

For those nommu systems where the MPU is not used, how do they allocate
DMA memory without setting aside a chunk of memory?

From what I understand of the current nommu code, it would just use
the normal page allocator for DMA memory allocation, so now requiring
everything to fit the "nommu has mpu" case seems like it's going to
break older nommu.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 14:07 ` [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Russell King - ARM Linux
@ 2016-12-13 14:14   ` Vladimir Murzin
  2016-12-13 14:25     ` Robin Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 14:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/12/16 14:07, Russell King - ARM Linux wrote:
> On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
>> This patch set is trying to address the issue by providing region of
>> memory suitable for consistent DMA operations. It is supposed that such
>> region is marked by MPU as non-cacheable. Since we have MPU support in
>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>> interface to advertise such memory is via "memdma=size at start" command
>> line option, to avoid clashing with normal memory (which usually comes
>> from dts) it'd be safer to use it together with "mem=" command line
>> option. Meanwhile, I'm open to suggestions for the better way telling
>> Linux of such memory.
> 
> For those nommu systems where the MPU is not used, how do they allocate
> DMA memory without setting aside a chunk of memory?
> 
>>From what I understand of the current nommu code, it would just use
> the normal page allocator for DMA memory allocation, so now requiring
> everything to fit the "nommu has mpu" case seems like it's going to
> break older nommu.
> 

Probably, it'd be better if we just fall back to dma-noop operations if there
is no DMA region, i.e. assume that the platform is coherent. We still need a
way to tell the user that the absence of such a region can be the reason for
broken DMA.

Cheers
Vladimir

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 14:14   ` Vladimir Murzin
@ 2016-12-13 14:25     ` Robin Murphy
  2016-12-13 15:02       ` Vladimir Murzin
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2016-12-13 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/12/16 14:14, Vladimir Murzin wrote:
> On 13/12/16 14:07, Russell King - ARM Linux wrote:
>> On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
>>> This patch set is trying to address the issue by providing region of
>>> memory suitable for consistent DMA operations. It is supposed that such
>>> region is marked by MPU as non-cacheable. Since we have MPU support in
>>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>>> interface to advertise such memory is via "memdma=size at start" command
>>> line option, to avoid clashing with normal memory (which usually comes
>>> from dts) it'd be safer to use it together with "mem=" command line
>>> option. Meanwhile, I'm open to suggestions for the better way telling
>>> Linux of such memory.
>>
>> For those nommu systems where the MPU is not used, how do they allocate
>> DMA memory without setting aside a chunk of memory?
>>
>> >From what I understand of the current nommu code, it would just use
>> the normal page allocator for DMA memory allocation, so now requiring
>> everything to fit the "nommu has mpu" case seems like it's going to
>> break older nommu.
>>
> 
> Probably, it'd be better if we just fallback to dma-noop operations if there
> is no dma region, i.e. assume that platform is coherent. We still need a way
> to tell user that absence of such region can be reason of broken DMA.

As I mentioned internally, I think it would be worth trying to use CMA
for this, because dma_map_ops are already wired to try that first, and
from what I can see it seems already set up to do precisely this via a
"shared-dma-pool" reserved memory region (see rmem_cma_setup() in
drivers/base/dma-contiguous.c) - mandating that for cached v7-M systems
whilst letting cache-less/non-MPU systems automatically fall back to the
normal page allocator in its absence would seem to solve all 3 cases.

Other than the allocator issue, though, the rest of the refactoring does
look nice.

Robin.

> 
> Cheers
> Vladimir
> 

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
                   ` (3 preceding siblings ...)
  2016-12-13 14:07 ` [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Russell King - ARM Linux
@ 2016-12-13 14:33 ` Szemző András
  2016-12-13 15:04   ` Vladimir Murzin
  2016-12-16 14:57 ` Alexandre Torgue
  5 siblings, 1 reply; 19+ messages in thread
From: Szemző András @ 2016-12-13 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

> On 2016. Dec 13., at 14:45, Vladimir Murzin <vladimir.murzin@arm.com> wrote:
> 
> Hi,
> 
> It seem that addition of cache support for M-class cpus uncovered
> latent bug in DMA usage. NOMMU memory model has been treated as being
> always consistent; however, for R/M classes of cpu memory can be
> covered by MPU which in turn might configure RAM as Normal
> i.e. bufferable and cacheable. It breaks dma_alloc_coherent() and
> friends, since data can stuck in caches now or be buffered.
> 
> This patch set is trying to address the issue by providing region of
> memory suitable for consistent DMA operations. It is supposed that such
> region is marked by MPU as non-cacheable. Since we have MPU support in
> Linux for R-class only and M-class setting MPU in bootloader, proposed
> interface to advertise such memory is via "memdma=size at start" command
> line option, to avoid clashing with normal memory (which usually comes
> from dts) it'd be safer to use it together with "mem=" command line
> option. Meanwhile, I'm open to suggestions for the better way telling
> Linux of such memory.
> 

I have tested these patches on my ATMEL SAME70 armv7m board.

After setting the memory regions and attributes in the bootloader and providing
the required command line parameters, the DMA issues are fixed.

I have tested a USART, SD card, and Ethernet driver with DMA enabled.

Booting Linux on physical CPU 0x0
Linux version 4.9.0 (root@debian) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #4 Mon Dec 12 20:27:21 CET 2016
CPU: ARMv7-M [410fc271] revision 1 (ARMv7M), cr=00000000
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
OF: fdt:Machine model: SAME70-sampione board
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 15748
Kernel command line: console=ttyS1,115200 root=/dev/mmcblk0p2 rw init=/linuxrc rootwait mem=62M memdma=2M@0x73e00000

So you can add my Tested-by.

Thanks for the patches!

Regards,
Andras

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 14:25     ` Robin Murphy
@ 2016-12-13 15:02       ` Vladimir Murzin
  2016-12-13 18:32         ` Robin Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/12/16 14:25, Robin Murphy wrote:
> On 13/12/16 14:14, Vladimir Murzin wrote:
>> On 13/12/16 14:07, Russell King - ARM Linux wrote:
>>> On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
>>>> This patch set is trying to address the issue by providing region of
>>>> memory suitable for consistent DMA operations. It is supposed that such
>>>> region is marked by MPU as non-cacheable. Since we have MPU support in
>>>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>>>> interface to advertise such memory is via "memdma=size at start" command
>>>> line option, to avoid clashing with normal memory (which usually comes
>>>> from dts) it'd be safer to use it together with "mem=" command line
>>>> option. Meanwhile, I'm open to suggestions for the better way telling
>>>> Linux of such memory.
>>>
>>> For those nommu systems where the MPU is not used, how do they allocate
>>> DMA memory without setting aside a chunk of memory?
>>>
>>> >From what I understand of the current nommu code, it would just use
>>> the normal page allocator for DMA memory allocation, so now requiring
>>> everything to fit the "nommu has mpu" case seems like it's going to
>>> break older nommu.
>>>
>>
>> Probably, it'd be better if we just fallback to dma-noop operations if there
>> is no dma region, i.e. assume that platform is coherent. We still need a way
>> to tell user that absence of such region can be reason of broken DMA.
> 
> As I mentioned internally, I think it would be worth trying to use CMA
> for this, because dma_map_ops are already wired to try that first, and
> from what I can see it seems already set up to do precisely this via a
> "shared-dma-pool" reserved memory region (see rmem_cma_setup() in
> drivers/base/dma-contiguous.c) - mandating that for cached v7-M systems
> whilst letting cache-less/non-MPU systems automatically fall back to the
> normal page allocator in its absence would seem to solve all 3 cases.

Unfortunately,

config DMA_CMA
        bool "DMA Contiguous Memory Allocator"
        depends on HAVE_DMA_CONTIGUOUS && CMA
        help
...
config CMA
        bool "Contiguous Memory Allocator"
        depends on HAVE_MEMBLOCK && MMU
	select MIGRATION

and it blows up if I remove the dependency on MMU :(

Another option would be drivers/base/dma-coherent.c, but, IIUC, in that case
memory is reserved exclusively per device, so I doubt that tiny M-class
systems can afford that...


> 
> Other than the allocator issue, though, the rest of the refactoring does
> look nice.

Thanks for going through it!

Cheers
Vladimir

> 
> Robin.
> 
>>
>> Cheers
>> Vladimir
>>
> 
> 

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 14:33 ` Szemző András
@ 2016-12-13 15:04   ` Vladimir Murzin
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-13 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 13/12/16 14:33, Szemző András wrote:
> Hi,
> 
>> On 2016. Dec 13., at 14:45, Vladimir Murzin <vladimir.murzin@arm.com> wrote:
>>
>> Hi,
>>
>> It seem that addition of cache support for M-class cpus uncovered
>> latent bug in DMA usage. NOMMU memory model has been treated as being
>> always consistent; however, for R/M classes of cpu memory can be
>> covered by MPU which in turn might configure RAM as Normal
>> i.e. bufferable and cacheable. It breaks dma_alloc_coherent() and
>> friends, since data can stuck in caches now or be buffered.
>>
>> This patch set is trying to address the issue by providing region of
>> memory suitable for consistent DMA operations. It is supposed that such
>> region is marked by MPU as non-cacheable. Since we have MPU support in
>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>> interface to advertise such memory is via "memdma=size at start" command
>> line option, to avoid clashing with normal memory (which usually comes
>> from dts) it'd be safer to use it together with "mem=" command line
>> option. Meanwhile, I'm open to suggestions for the better way telling
>> Linux of such memory.
>>
> 
> I have tested these patches on my ATMEL SAME70 armv7m board.
> 
> After setting the memory regions and attributes in the bootloader and providing the required 
> command line parameters, the DMA issues fixed.
> 
> I have tested an usart, sdcard, and ethernet driver with DMA enabled.
> 
> Booting Linux on physical CPU 0x0
> Linux version 4.9.0 (root at debian) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #4 Mon Dec 12 20:27:21 CET 201
> 6
> CPU: ARMv7-M [410fc271] revision 1 (ARMv7M), cr=00000000
> CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
> OF: fdt:Machine model: SAME70-sampione board
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 15748
> Kernel command line: console=ttyS1,115200 root=/dev/mmcblk0p2 rw init=/linuxrc rootwait mem=62M memdma=2M at 0x73e00000
> 
> So you can add my Tested-by.
> 
> Thanks for the patches!

Glad to hear it works for you! Thanks for reporting and testing!

Cheers
Vladimir

> 
> Regards,
> Andras
> 
> 
> 

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 15:02       ` Vladimir Murzin
@ 2016-12-13 18:32         ` Robin Murphy
  2016-12-14 10:15           ` Vladimir Murzin
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2016-12-13 18:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/12/16 15:02, Vladimir Murzin wrote:
> On 13/12/16 14:25, Robin Murphy wrote:
>> On 13/12/16 14:14, Vladimir Murzin wrote:
>>> On 13/12/16 14:07, Russell King - ARM Linux wrote:
>>>> On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
>>>>> This patch set is trying to address the issue by providing region of
>>>>> memory suitable for consistent DMA operations. It is supposed that such
>>>>> region is marked by MPU as non-cacheable. Since we have MPU support in
>>>>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>>>>> interface to advertise such memory is via "memdma=size at start" command
>>>>> line option, to avoid clashing with normal memory (which usually comes
>>>>> from dts) it'd be safer to use it together with "mem=" command line
>>>>> option. Meanwhile, I'm open to suggestions for the better way telling
>>>>> Linux of such memory.
>>>>
>>>> For those nommu systems where the MPU is not used, how do they allocate
>>>> DMA memory without setting aside a chunk of memory?
>>>>
>>>> >From what I understand of the current nommu code, it would just use
>>>> the normal page allocator for DMA memory allocation, so now requiring
>>>> everything to fit the "nommu has mpu" case seems like it's going to
>>>> break older nommu.
>>>>
>>>
>>> Probably, it'd be better if we just fallback to dma-noop operations if there
>>> is no dma region, i.e. assume that platform is coherent. We still need a way
>>> to tell user that absence of such region can be reason of broken DMA.
>>
>> As I mentioned internally, I think it would be worth trying to use CMA
>> for this, because dma_map_ops are already wired to try that first, and
>> from what I can see it seems already set up to do precisely this via a
>> "shared-dma-pool" reserved memory region (see rmem_cma_setup() in
>> drivers/base/dma-contiguous.c) - mandating that for cached v7-M systems
>> whilst letting cache-less/non-MPU systems automatically fall back to the
>> normal page allocator in its absence would seem to solve all 3 cases.
> 
> Unfortunately,
> 
> config DMA_CMA
>         bool "DMA Contiguous Memory Allocator"
>         depends on HAVE_DMA_CONTIGUOUS && CMA
>         help
> ...
> config CMA
>         bool "Contiguous Memory Allocator"
>         depends on HAVE_MEMBLOCK && MMU
> 	select MIGRATION
> 
> and it blows up if I remove dependecy on MMU :(

Ah yes, fair enough.

> Another option would be drivers/base/dma-coherent.c, but, IIUC, in this case
> memory is reserved per device exclusively, so I'm in doubt if tiny M-class can
> afford that...

I think as usual I managed to conflate the two - it was actually
dma_alloc_from_coherent() I had in mind when I mentioned dma_map_ops. It
does seem from 7bfa5ab6fa1b that dma-coherent *can* handle multiple
devices per region, so it wouldn't appear to be too hard to implement a
default coherent region (possibly specific to ARM_MPU) for all devices
in a similar manner to the default contiguous region. Either way I do
still think a reserved memory region in the DT is nicer and probably
more robust than the command line parameter.

Robin.

>> Other than the allocator issue, though, the rest of the refactoring does
>> look nice.
> 
> Thanks for going through it!
> 
> Cheers
> Vladimir

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 18:32         ` Robin Murphy
@ 2016-12-14 10:15           ` Vladimir Murzin
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-14 10:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 13/12/16 18:32, Robin Murphy wrote:
> On 13/12/16 15:02, Vladimir Murzin wrote:
>> On 13/12/16 14:25, Robin Murphy wrote:
>>> On 13/12/16 14:14, Vladimir Murzin wrote:
>>>> On 13/12/16 14:07, Russell King - ARM Linux wrote:
>>>>> On Tue, Dec 13, 2016 at 01:45:01PM +0000, Vladimir Murzin wrote:
>>>>>> This patch set is trying to address the issue by providing region of
>>>>>> memory suitable for consistent DMA operations. It is supposed that such
>>>>>> region is marked by MPU as non-cacheable. Since we have MPU support in
>>>>>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>>>>>> interface to advertise such memory is via "memdma=size at start" command
>>>>>> line option, to avoid clashing with normal memory (which usually comes
>>>>>> from dts) it'd be safer to use it together with "mem=" command line
>>>>>> option. Meanwhile, I'm open to suggestions for the better way telling
>>>>>> Linux of such memory.
>>>>>
>>>>> For those nommu systems where the MPU is not used, how do they allocate
>>>>> DMA memory without setting aside a chunk of memory?
>>>>>
>>>>> >From what I understand of the current nommu code, it would just use
>>>>> the normal page allocator for DMA memory allocation, so now requiring
>>>>> everything to fit the "nommu has mpu" case seems like it's going to
>>>>> break older nommu.
>>>>>
>>>>
>>>> Probably, it'd be better if we just fallback to dma-noop operations if there
>>>> is no dma region, i.e. assume that platform is coherent. We still need a way
>>>> to tell user that absence of such region can be reason of broken DMA.
>>>
>>> As I mentioned internally, I think it would be worth trying to use CMA
>>> for this, because dma_map_ops are already wired to try that first, and
>>> from what I can see it seems already set up to do precisely this via a
>>> "shared-dma-pool" reserved memory region (see rmem_cma_setup() in
>>> drivers/base/dma-contiguous.c) - mandating that for cached v7-M systems
>>> whilst letting cache-less/non-MPU systems automatically fall back to the
>>> normal page allocator in its absence would seem to solve all 3 cases.
>>
>> Unfortunately,
>>
>> config DMA_CMA
>>         bool "DMA Contiguous Memory Allocator"
>>         depends on HAVE_DMA_CONTIGUOUS && CMA
>>         help
>> ...
>> config CMA
>>         bool "Contiguous Memory Allocator"
>>         depends on HAVE_MEMBLOCK && MMU
>> 	select MIGRATION
>>
>> and it blows up if I remove dependecy on MMU :(
> 
> Ah yes, fair enough.
> 
>> Another option would be drivers/base/dma-coherent.c, but, IIUC, in this case
>> memory is reserved per device exclusively, so I'm in doubt if tiny M-class can
>> afford that...
> 
> I think as usual I managed to conflate the two - it was actually
> dma_alloc_from_coherent() I had in mind when I mentioned dma_map_ops. It
> does seem from 7bfa5ab6fa1b that dma-coherent *can* handle multiple
> devices per region, so it wouldn't appear to be too hard to implement a
> default coherent region (possibly specific to ARM_MPU) for all devices
> in a similar manner to the default contiguous region. Either way I do
> still think a reserved memory region in the DT is nicer and probably
> more robust than the command line parameter.

Ok, I'll look at this option in detail.

Thanks
Vladimir

> 
> Robin.
> 
>>> Other than the allocator issue, though, the rest of the refactoring does
>>> look nice.
>>
>> Thanks for going through it!
>>
>> Cheers
>> Vladimir
> 
> 

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
                   ` (4 preceding siblings ...)
  2016-12-13 14:33 ` Szemző András
@ 2016-12-16 14:57 ` Alexandre Torgue
  2016-12-16 15:00   ` Vladimir Murzin
  5 siblings, 1 reply; 19+ messages in thread
From: Alexandre Torgue @ 2016-12-16 14:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Vladimir,

On 12/13/2016 02:45 PM, Vladimir Murzin wrote:
> Hi,
>
> It seem that addition of cache support for M-class cpus uncovered
> latent bug in DMA usage. NOMMU memory model has been treated as being
> always consistent; however, for R/M classes of cpu memory can be
> covered by MPU which in turn might configure RAM as Normal
> i.e. bufferable and cacheable. It breaks dma_alloc_coherent() and
> friends, since data can stuck in caches now or be buffered.
>
> This patch set is trying to address the issue by providing region of
> memory suitable for consistent DMA operations. It is supposed that such
> region is marked by MPU as non-cacheable. Since we have MPU support in
> Linux for R-class only and M-class setting MPU in bootloader, proposed
> interface to advertise such memory is via "memdma=size at start" command
> line option, to avoid clashing with normal memory (which usually comes
> from dts) it'd be safer to use it together with "mem=" command line
> option. Meanwhile, I'm open to suggestions for the better way telling
> Linux of such memory.
>
> For configuration without cache support (like Cortex-M3/M4) dma
> operations are forced to be coherent and wired with dma-noop. Such
> decision is made based on cacheid global variable. In case cpu
> supports caches and no coherent memory region is given - dma is
> disallowed. Probably, some other important checks are missing, so I'll
> all my ears :)
>
> To make life easier NOMMU dma operations are kept in separate
> compilation unit.
>
> Thanks!
>
> Changelog:
>
>     RFC v1 -> RFC v2
>            - s/dmac_unmap_area/dmac_map_area in __dma_page_cpu_to_dev()
> 	   - removed unrelated changes in nommu.c
>
> Vladimir Murzin (3):
>   ARM: NOMMU: introduce dma operations for noMMU
>   ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus
>   ARM: dma-mapping: remove traces of NOMMU code
>
>  arch/arm/include/asm/dma-mapping.h |    3 +-
>  arch/arm/mm/Kconfig                |    2 +-
>  arch/arm/mm/Makefile               |    5 +-
>  arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
>  arch/arm/mm/dma-mapping.c          |   26 +---
>  arch/arm/mm/mm.h                   |    3 +
>  arch/arm/mm/nommu.c                |    6 +
>  7 files changed, 278 insertions(+), 29 deletions(-)
>  create mode 100644 arch/arm/mm/dma-mapping-nommu.c
>

First, thanks for this series.

I tested it on the stm32f746 platform. The main issues related to cache and
DMA are fixed, but I still have an issue using the dma_zalloc_coherent()
API: the allocated memory is not set to zero.
Can you have a look at it, please?
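
A minimal sketch of the kind of caller that depends on the zeroing (the
descriptor layout and names are illustrative):

	/* a descriptor ring set up with dma_zalloc_coherent() relies on
	 * every field starting out as zero */
	struct my_desc {
		u32 ctrl;
		u32 addr;
	};

	static struct my_desc *alloc_ring(struct device *dev, unsigned int n,
					  dma_addr_t *dma)
	{
		/* no explicit memset: zeroing is part of the API contract */
		return dma_zalloc_coherent(dev, n * sizeof(struct my_desc),
					   dma, GFP_KERNEL);
	}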

Thanks
Alex

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-16 14:57 ` Alexandre Torgue
@ 2016-12-16 15:00   ` Vladimir Murzin
  2016-12-16 15:33     ` Alexandre Torgue
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Murzin @ 2016-12-16 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Alexandre,

On 16/12/16 14:57, Alexandre Torgue wrote:
> Hi Vladimir,
> 
> On 12/13/2016 02:45 PM, Vladimir Murzin wrote:
>> Hi,
>>
>> It seem that addition of cache support for M-class cpus uncovered
>> latent bug in DMA usage. NOMMU memory model has been treated as being
>> always consistent; however, for R/M classes of cpu memory can be
>> covered by MPU which in turn might configure RAM as Normal
>> i.e. bufferable and cacheable. It breaks dma_alloc_coherent() and
>> friends, since data can stuck in caches now or be buffered.
>>
>> This patch set is trying to address the issue by providing region of
>> memory suitable for consistent DMA operations. It is supposed that such
>> region is marked by MPU as non-cacheable. Since we have MPU support in
>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>> interface to advertise such memory is via "memdma=size at start" command
>> line option, to avoid clashing with normal memory (which usually comes
>> from dts) it'd be safer to use it together with "mem=" command line
>> option. Meanwhile, I'm open to suggestions for the better way telling
>> Linux of such memory.
>>
>> For configuration without cache support (like Cortex-M3/M4) dma
>> operations are forced to be coherent and wired with dma-noop. Such
>> decision is made based on cacheid global variable. In case cpu
>> supports caches and no coherent memory region is given - dma is
>> disallowed. Probably, some other important checks are missing, so I'll
>> all my ears :)
>>
>> To make life easier NOMMU dma operations are kept in separate
>> compilation unit.
>>
>> Thanks!
>>
>> Changelog:
>>
>>     RFC v1 -> RFC v2
>>            - s/dmac_unmap_area/dmac_map_area in __dma_page_cpu_to_dev()
>>        - removed unrelated changes in nommu.c
>>
>> Vladimir Murzin (3):
>>   ARM: NOMMU: introduce dma operations for noMMU
>>   ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus
>>   ARM: dma-mapping: remove traces of NOMMU code
>>
>>  arch/arm/include/asm/dma-mapping.h |    3 +-
>>  arch/arm/mm/Kconfig                |    2 +-
>>  arch/arm/mm/Makefile               |    5 +-
>>  arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
>>  arch/arm/mm/dma-mapping.c          |   26 +---
>>  arch/arm/mm/mm.h                   |    3 +
>>  arch/arm/mm/nommu.c                |    6 +
>>  7 files changed, 278 insertions(+), 29 deletions(-)
>>  create mode 100644 arch/arm/mm/dma-mapping-nommu.c
>>
> 
> First, thanks for this series.
> 
> I tested it on stm32f746 platform. Main issues related to cache and DMA are fixed but I still have an issue using dma_zalloc_alloc API. Allocated memory is not set to zero.
> Can you have a look on it please?

Thanks for testing! I think the following diff should fix dma_zalloc_coherent():

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index f92d98a..1f97bb8 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -38,6 +38,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
 
        ptr = (void *)gen_pool_alloc(dma_pool, size);
        if (ptr) {
+               memset(ptr, 0, size);
                *dma_handle = __pa(ptr);
                dmac_flush_range(ptr, ptr + size);
                outer_flush_range(__pa(ptr), __pa(ptr) + size);


Cheers
Vladimir

> 
> Thanks
> Alex
> 
> 

* [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU
  2016-12-16 15:00   ` Vladimir Murzin
@ 2016-12-16 15:33     ` Alexandre Torgue
  0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Torgue @ 2016-12-16 15:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 12/16/2016 04:00 PM, Vladimir Murzin wrote:
> Hi Alexandre,
>
> On 16/12/16 14:57, Alexandre Torgue wrote:
>> Hi Vladimir,
>>
>> On 12/13/2016 02:45 PM, Vladimir Murzin wrote:
>>> Hi,
>>>
>>> It seem that addition of cache support for M-class cpus uncovered
>>> latent bug in DMA usage. NOMMU memory model has been treated as being
>>> always consistent; however, for R/M classes of cpu memory can be
>>> covered by MPU which in turn might configure RAM as Normal
>>> i.e. bufferable and cacheable. It breaks dma_alloc_coherent() and
>>> friends, since data can stuck in caches now or be buffered.
>>>
>>> This patch set is trying to address the issue by providing region of
>>> memory suitable for consistent DMA operations. It is supposed that such
>>> region is marked by MPU as non-cacheable. Since we have MPU support in
>>> Linux for R-class only and M-class setting MPU in bootloader, proposed
>>> interface to advertise such memory is via "memdma=size at start" command
>>> line option, to avoid clashing with normal memory (which usually comes
>>> from dts) it'd be safer to use it together with "mem=" command line
>>> option. Meanwhile, I'm open to suggestions for the better way telling
>>> Linux of such memory.
>>>
>>> For configuration without cache support (like Cortex-M3/M4) dma
>>> operations are forced to be coherent and wired with dma-noop. Such
>>> decision is made based on cacheid global variable. In case cpu
>>> supports caches and no coherent memory region is given - dma is
>>> disallowed. Probably, some other important checks are missing, so I'll
>>> all my ears :)
>>>
>>> To make life easier NOMMU dma operations are kept in separate
>>> compilation unit.
>>>
>>> Thanks!
>>>
>>> Changelog:
>>>
>>>     RFC v1 -> RFC v2
>>>            - s/dmac_unmap_area/dmac_map_area in __dma_page_cpu_to_dev()
>>>        - removed unrelated changes in nommu.c
>>>
>>> Vladimir Murzin (3):
>>>   ARM: NOMMU: introduce dma operations for noMMU
>>>   ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus
>>>   ARM: dma-mapping: remove traces of NOMMU code
>>>
>>>  arch/arm/include/asm/dma-mapping.h |    3 +-
>>>  arch/arm/mm/Kconfig                |    2 +-
>>>  arch/arm/mm/Makefile               |    5 +-
>>>  arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
>>>  arch/arm/mm/dma-mapping.c          |   26 +---
>>>  arch/arm/mm/mm.h                   |    3 +
>>>  arch/arm/mm/nommu.c                |    6 +
>>>  7 files changed, 278 insertions(+), 29 deletions(-)
>>>  create mode 100644 arch/arm/mm/dma-mapping-nommu.c
>>>
>>
>> First, thanks for this series.
>>
>> I tested it on stm32f746 platform. Main issues related to cache and DMA are fixed but I still have an issue using dma_zalloc_alloc API. Allocated memory is not set to zero.
>> Can you have a look on it please?
>
> Thanks for testing! I think following diff should fix dma_zalloc_alloc():

Yes. With the patch below, it works fine.

Thanks
Alex

>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index f92d98a..1f97bb8 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -38,6 +38,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
>
>         ptr = (void *)gen_pool_alloc(dma_pool, size);
>         if (ptr) {
> +               memset(ptr, 0, size);
>                 *dma_handle = __pa(ptr);
>                 dmac_flush_range(ptr, ptr + size);
>                 outer_flush_range(__pa(ptr), __pa(ptr) + size);
>
>
> Cheers
> Vladimir
>
>>
>> Thanks
>> Alex
>>
>>
>

* [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU
  2016-12-13 13:45 ` [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU Vladimir Murzin
@ 2017-01-02 15:26   ` Benjamin Gaignard
  2017-01-04 10:33     ` Vladimir Murzin
  0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Gaignard @ 2017-01-02 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Vladimir,

I have tested your patch on my setup (stm32f4: no MMU, no MPU), where
I'm writing a display driver. This driver uses dma_alloc_wc() and
dma_mmap_wc() for frame buffer allocation and mmapping.

In dma-mapping-nommu.c you haven't implemented dma_map_ops.mmap, so
obviously my driver doesn't work with your code. The current
implementation is buggy too, but I have submitted a patch to fix that
problem:
http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8633/1

Could it be possible for you to include mmap support in dma-mapping-nommu.c?
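
For reference, this is roughly the pattern such a driver relies on (the
structure and names below are illustrative, not from an existing driver):

	/* needs <linux/fb.h> and <linux/dma-mapping.h> */
	struct my_fb {
		void		*base;
		dma_addr_t	dma_addr;
		size_t		size;
	};

	/* probe: allocate the frame buffer as write-combined coherent memory */
	static int my_fb_alloc(struct device *dev, struct my_fb *fb)
	{
		fb->base = dma_alloc_wc(dev, fb->size, &fb->dma_addr, GFP_KERNEL);
		return fb->base ? 0 : -ENOMEM;
	}

	/* fbdev mmap hook: hand the very same buffer out to userspace */
	static int my_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
	{
		struct my_fb *fb = info->par;

		return dma_mmap_wc(info->device, vma, fb->base,
				   fb->dma_addr, fb->size);
	}

dma_mmap_wc() ends up in dma_map_ops.mmap when the ops provide one, hence
the request above.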

Regards,
Benjamin


2016-12-13 14:45 GMT+01:00 Vladimir Murzin <vladimir.murzin@arm.com>:
> R/M classes of cpus can have momory covered by MPU which in turn might
> configure RAM as Normal i.e. bufferable and cacheable. It breaks
> dma_alloc_coherent() and friends, since data can stuck in caches now
> or be buffered.
>
> This patch introduces the way to specify region of memory (via
> "memdma=size at start" command line option) suitable for consistent DMA
> operations. It is supposed that such region is marked by MPU as
> non-cacheable.
>
> For configuration without cache support (like Cortex-M3/M4) dma
> operations are forced to be coherent and wired with dma-noop. Such
> decision is made based on cacheid global variable. In case cpu
> supports caches and no coherent memory region is given - dma is
> disallowed.
>
> Reported-by: Alexandre Torgue <alexandre.torgue@st.com>
> Reported-by: Andras Szemzo <sza@esh.hu>
> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
> ---
>  arch/arm/include/asm/dma-mapping.h |    3 +-
>  arch/arm/mm/Makefile               |    5 +-
>  arch/arm/mm/dma-mapping-nommu.c    |  262 ++++++++++++++++++++++++++++++++++++
>  arch/arm/mm/mm.h                   |    3 +
>  arch/arm/mm/nommu.c                |    6 +
>  5 files changed, 275 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm/mm/dma-mapping-nommu.c
>
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index bf02dbd..559faad 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -20,7 +20,8 @@ static inline struct dma_map_ops *__generic_dma_ops(struct device *dev)
>  {
>         if (dev && dev->archdata.dma_ops)
>                 return dev->archdata.dma_ops;
> -       return &arm_dma_ops;
> +
> +       return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops;
>  }
>
>  static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index 2ac7988..5796357 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -2,9 +2,8 @@
>  # Makefile for the linux arm-specific parts of the memory manager.
>  #
>
> -obj-y                          := dma-mapping.o extable.o fault.o init.o \
> -                                  iomap.o
> -
> +obj-y                          := extable.o fault.o init.o iomap.o
> +obj-y                          += dma-mapping$(MMUEXT).o
>  obj-$(CONFIG_MMU)              += fault-armv.o flush.o idmap.o ioremap.o \
>                                    mmap.o pgd.o mmu.o pageattr.o
>
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> new file mode 100644
> index 0000000..f92d98a
> --- /dev/null
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -0,0 +1,262 @@
> +/*
> + *  Based on linux/arch/arm/mm/dma-mapping.c
> + *
> + *  Copyright (C) 2000-2004 Russell King
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + *  DMA uncached mapping support.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/mm.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/scatterlist.h>
> +#include <linux/genalloc.h>
> +
> +#include <asm/cachetype.h>
> +#include <asm/cacheflush.h>
> +#include <asm/outercache.h>
> +
> +#include "dma.h"
> +
> +unsigned long dma_start __initdata;
> +unsigned long dma_size __initdata;
> +
> +static struct gen_pool *dma_pool;
> +
> +static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
> +                           dma_addr_t *dma_handle, gfp_t gfp,
> +                           unsigned long attrs)
> +{
> +       void *ptr;
> +
> +       if (!dma_pool)
> +               return NULL;
> +
> +       ptr = (void *)gen_pool_alloc(dma_pool, size);
> +       if (ptr) {
> +               *dma_handle = __pa(ptr);
> +               dmac_flush_range(ptr, ptr + size);
> +               outer_flush_range(__pa(ptr), __pa(ptr) + size);
> +       }
> +
> +       return ptr;
> +}
> +
> +static void arm_nommu_dma_free(struct device *dev, size_t size,
> +                         void *cpu_addr, dma_addr_t dma_addr,
> +                         unsigned long attrs)
> +{
> +       gen_pool_free(dma_pool, (unsigned long)cpu_addr, size);
> +}
> +
> +static void __dma_page_cpu_to_dev(dma_addr_t handle, size_t size,
> +                        enum dma_data_direction dir)
> +{
> +       dmac_map_area(__va(handle), size, dir);
> +
> +       if (dir == DMA_FROM_DEVICE)
> +               outer_inv_range(handle, handle + size);
> +       else
> +               outer_clean_range(handle, handle + size);
> +}
> +
> +static void __dma_page_dev_to_cpu(dma_addr_t handle, size_t size,
> +                        enum dma_data_direction dir)
> +{
> +       if (dir != DMA_TO_DEVICE) {
> +               outer_inv_range(handle, handle + size);
> +               dmac_unmap_area(__va(handle), size, dir);
> +       }
> +}
> +
> +static dma_addr_t arm_nommu_dma_map_page(struct device *dev, struct page *page,
> +                                     unsigned long offset, size_t size,
> +                                     enum dma_data_direction dir,
> +                                     unsigned long attrs)
> +{
> +       dma_addr_t handle = page_to_phys(page) + offset;
> +
> +       __dma_page_cpu_to_dev(handle, size, dir);
> +
> +       return handle;
> +}
> +
> +static void arm_nommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
> +               size_t size, enum dma_data_direction dir, unsigned long attrs)
> +{
> +       __dma_page_dev_to_cpu(handle, size, dir);
> +}
> +
> +
> +static int arm_nommu_dma_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
> +                            enum dma_data_direction dir,
> +                            unsigned long attrs)
> +{
> +       int i;
> +       struct scatterlist *sg;
> +
> +       for_each_sg(sgl, sg, nents, i) {
> +               sg_dma_address(sg) = sg_phys(sg);
> +               sg_dma_len(sg) = sg->length;
> +               __dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
> +       }
> +
> +       return nents;
> +}
> +
> +static void arm_nommu_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents,
> +               enum dma_data_direction dir, unsigned long attrs)
> +{
> +       struct scatterlist *sg;
> +       int i;
> +
> +       for_each_sg(sgl, sg, nents, i)
> +               __dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
> +}
> +
> +static void arm_nommu_dma_sync_single_for_device(struct device *dev,
> +               dma_addr_t handle, size_t size, enum dma_data_direction dir)
> +{
> +       __dma_page_cpu_to_dev(handle, size, dir);
> +}
> +
> +static void arm_nommu_dma_sync_single_for_cpu(struct device *dev,
> +               dma_addr_t handle, size_t size, enum dma_data_direction dir)
> +{
> +       __dma_page_dev_to_cpu(handle, size, dir);
> +}
> +
> +static void arm_nommu_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
> +                                     int nents, enum dma_data_direction dir)
> +{
> +       struct scatterlist *sg;
> +       int i;
> +
> +       for_each_sg(sgl, sg, nents, i)
> +               __dma_page_cpu_to_dev(sg_dma_address(sg), sg_dma_len(sg), dir);
> +}
> +
> +static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
> +                                  int nents, enum dma_data_direction dir)
> +{
> +       struct scatterlist *sg;
> +       int i;
> +
> +       for_each_sg(sgl, sg, nents, i)
> +               __dma_page_dev_to_cpu(sg_dma_address(sg), sg_dma_len(sg), dir);
> +}
> +
> +struct dma_map_ops arm_nommu_dma_ops = {
> +       .alloc                  = arm_nommu_dma_alloc,
> +       .free                   = arm_nommu_dma_free,
> +       .map_page               = arm_nommu_dma_map_page,
> +       .unmap_page             = arm_nommu_dma_unmap_page,
> +       .map_sg                 = arm_nommu_dma_map_sg,
> +       .unmap_sg               = arm_nommu_dma_unmap_sg,
> +       .sync_single_for_device = arm_nommu_dma_sync_single_for_device,
> +       .sync_single_for_cpu    = arm_nommu_dma_sync_single_for_cpu,
> +       .sync_sg_for_device     = arm_nommu_dma_sync_sg_for_device,
> +       .sync_sg_for_cpu        = arm_nommu_dma_sync_sg_for_cpu,
> +};
> +EXPORT_SYMBOL(arm_nommu_dma_ops);
> +
> +static struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
> +{
> +       return coherent ? &dma_noop_ops : &arm_nommu_dma_ops;
> +}
> +
> +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +                       const struct iommu_ops *iommu, bool coherent)
> +{
> +       struct dma_map_ops *dma_ops;
> +
> +       /*
> +        * Cache support for v7m is optional, so it can be treated as
> +        * coherent if no cache has been detected.
> +        */
> +       dev->archdata.dma_coherent = (cacheid) ? coherent : true;
> +
> +       dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
> +
> +       set_dma_ops(dev, dma_ops);
> +}
> +
> +void arch_teardown_dma_ops(struct device *dev)
> +{
> +}
> +
> +int dma_supported(struct device *dev, u64 mask)
> +{
> +       if (cacheid && !dma_pool)
> +               return 0;
> +
> +       return 1;
> +}
> +
> +EXPORT_SYMBOL(dma_supported);
> +
> +#define PREALLOC_DMA_DEBUG_ENTRIES     4096
> +
> +static int __init dma_debug_do_init(void)
> +{
> +       dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
> +       return 0;
> +}
> +core_initcall(dma_debug_do_init);
> +
> +/*
> + * Initialise the coherent pool for DMA allocations.
> + */
> +static int  __init dma_pool_init(void)
> +{
> +       int ret;
> +
> +       if (cacheid && !dma_size) {
> +               pr_warn("DMA: coherent memory region has not been given.\n");
> +               return 0;
> +       }
> +
> +       dma_pool = gen_pool_create(PAGE_SHIFT, -1);
> +
> +       if (!dma_pool)
> +               goto out;
> +
> +       ret = gen_pool_add_virt(dma_pool, (unsigned long)dma_start, (unsigned long)dma_start,
> +                               dma_size, -1);
> +       if (ret)
> +               goto destroy_genpool;
> +
> +       gen_pool_set_algo(dma_pool, gen_pool_first_fit_order_align, NULL);
> +
> +       pr_info("DMA: coherent memory region 0x%lx - 0x%lx (%lu KiB)\n",
> +               dma_start, dma_start + dma_size, dma_size >> 10);
> +
> +       return 0;
> +
> +destroy_genpool:
> +       gen_pool_destroy(dma_pool);
> +       dma_pool = NULL;
> +out:
> +       pr_err("DMA: failed to allocate coherent memory region\n");
> +       return -ENOMEM;
> +}
> +
> +postcore_initcall(dma_pool_init);
> +
> +/* "memdma=<size>@<address>" parsing. */
> +static int __init early_memdma(char *p)
> +{
> +       if (!p)
> +               return -EINVAL;
> +
> +       dma_size = memparse(p, &p);
> +       if (*p == '@')
> +               dma_start = memparse(p + 1, &p);
> +
> +       return 0;
> +}
> +early_param("memdma", early_memdma);
> diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
> index ce727d4..18eb869 100644
> --- a/arch/arm/mm/mm.h
> +++ b/arch/arm/mm/mm.h
> @@ -97,3 +97,6 @@ struct static_vm {
>  void dma_contiguous_remap(void);
>
>  unsigned long __clear_cr(unsigned long mask);
> +
> +extern unsigned long dma_start  __initdata;
> +extern unsigned long dma_size  __initdata;
> diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
> index 681cec8..5827e54 100644
> --- a/arch/arm/mm/nommu.c
> +++ b/arch/arm/mm/nommu.c
> @@ -303,6 +303,12 @@ void __init sanity_check_meminfo(void)
>         end = memblock_end_of_DRAM();
>         high_memory = __va(end - 1) + 1;
>         memblock_set_current_limit(end);
> +
> +       if (dma_size &&
> +           memblock_overlaps_region(&memblock.memory, dma_start, dma_size)) {
> +               pr_crit("DMA: coherent memory region overlaps with main memory.\n");
> +               dma_size = 0;
> +       }
>  }
>
>  /*
> --
> 1.7.9.5
>
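
As a side note for anyone else trying this: with the "memdma=" parsing above,
the coherent region ends up being advertised on the kernel command line along
these lines (the sizes and addresses are made up purely for illustration, and
the region must not overlap the range given via "mem=", otherwise the check
added to sanity_check_meminfo() drops it):

    mem=8M@0x60000000 memdma=1M@0x68800000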



-- 
Benjamin Gaignard

Graphic Study Group

Linaro.org | Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU
  2017-01-02 15:26   ` Benjamin Gaignard
@ 2017-01-04 10:33     ` Vladimir Murzin
  2017-01-06 13:58       ` Benjamin Gaignard
  0 siblings, 1 reply; 19+ messages in thread
From: Vladimir Murzin @ 2017-01-04 10:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Benjamin,

On 02/01/17 15:26, Benjamin Gaignard wrote:
> Hello Vladimir,
> 
> I have tested your patch on my setup (stm32f4: no MMU, no MPU) where
> I'm writing a display driver.
> This driver uses dma_alloc_wc() and dma_mmap_wc() for frame buffer
> allocation and mmapping.
> 
> In dma-mapping-nommu.c you haven't implemented dma_map_ops.mmap, so
> obviously my driver doesn't work with your code.
> In the current implementation it is buggy too, but I submitted a patch to
> fix that problem:
> http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8633/1
> 
> Would it be possible for you to include mmap support in dma-mapping-nommu.c?
> 

IIRC, stm32f4 is Cortex-M4, so there is no cache support, which means it uses
dma_noop_ops. I offloaded mmap to the common implementation, but completely
forgot that it has the same restriction as the arm counterpart. Regardless,
thanks for noticing that!

It seems that I need to check that the mapping is done from the DMA coherent
area (I'm moving to dma-coherent, and there we have dma_mmap_from_coherent for
that) and add something like the below for dma_noop_ops:

diff --git a/lib/dma-noop.c b/lib/dma-noop.c
index 3d766e7..d838b88 100644
--- a/lib/dma-noop.c
+++ b/lib/dma-noop.c
@@ -64,6 +64,25 @@ static int dma_noop_supported(struct device *dev, u64 mask)
 	return 1;
 }
 
+static int dma_noop_mmap(struct device *dev, struct vm_area_struct *vma,
+			 void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+	unsigned long user_count = vma_pages(vma);
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
+	unsigned long off = vma->vm_pgoff;
+	int ret = -ENXIO;
+
+	if (off < count && user_count <= (count - off)) {
+		ret = remap_pfn_range(vma, vma->vm_start,
+				      pfn + off,
+				      user_count << PAGE_SHIFT,
+				      vma->vm_page_prot);
+	}
+
+	return ret;
+}
+
 struct dma_map_ops dma_noop_ops = {
 	.alloc			= dma_noop_alloc,
 	.free			= dma_noop_free,
@@ -71,6 +90,7 @@ struct dma_map_ops dma_noop_ops = {
 	.map_sg			= dma_noop_map_sg,
 	.mapping_error		= dma_noop_mapping_error,
 	.dma_supported		= dma_noop_supported,
+	.mmap			= dma_noop_mmap,
 };
 
 EXPORT_SYMBOL(dma_noop_ops);

I'd be glad to hear if it works for you.
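
As for the dma-coherent direction, the check I have in mind boils down to
something like this (just a sketch of the idea with a made-up helper name,
not part of the patch above; assumes <linux/dma-mapping.h> and <linux/mm.h>):

static int mmap_from_coherent_or_remap(struct device *dev,
				       struct vm_area_struct *vma,
				       void *cpu_addr, size_t size)
{
	int ret;

	/* Buffer came from the per-device coherent area? Let it handle mmap. */
	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
		return ret;

	/* Otherwise fall back to a plain remap, as dma_noop_mmap() above does. */
	return remap_pfn_range(vma, vma->vm_start,
			       page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}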

Cheers
Vladimir

> Regards,
> Benjamin

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU
  2017-01-04 10:33     ` Vladimir Murzin
@ 2017-01-06 13:58       ` Benjamin Gaignard
  2017-01-09 13:54         ` Vladimir Murzin
  0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Gaignard @ 2017-01-06 13:58 UTC (permalink / raw)
  To: linux-arm-kernel

2017-01-04 11:33 GMT+01:00 Vladimir Murzin <vladimir.murzin@arm.com>:
> Hello Benjamin,
>
> On 02/01/17 15:26, Benjamin Gaignard wrote:
>> Hello Vladimir,
>>
>> I have tested your patch on my setup (stm32f4: no MMU, no MPU) where
>> I'm writing a display driver.
>> This driver uses dma_alloc_wc() and dma_mmap_wc() for frame buffer
>> allocation and mmapping.
>>
>> In dma-mapping-nommu.c you haven't implemented dma_map_ops.mmap, so
>> obviously my driver doesn't work with your code.
>> In the current implementation it is buggy too, but I submitted a patch to
>> fix that problem:
>> http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8633/1
>>
>> Would it be possible for you to include mmap support in dma-mapping-nommu.c?
>>
>
> IIRC, stm32f4 is Cortex-M4, so there is no cache support, which means it
> uses dma_noop_ops. I offloaded mmap to the common implementation, but
> completely forgot that it has the same restriction as the arm counterpart.
> Regardless, thanks for noticing that!
>
> It seems that I need to check that the mapping is done from the DMA coherent
> area (I'm moving to dma-coherent, and there we have dma_mmap_from_coherent
> for that) and add something like the below for dma_noop_ops:
>
> diff --git a/lib/dma-noop.c b/lib/dma-noop.c
> index 3d766e7..d838b88 100644
> --- a/lib/dma-noop.c
> +++ b/lib/dma-noop.c
> @@ -64,6 +64,25 @@ static int dma_noop_supported(struct device *dev, u64 mask)
>         return 1;
>  }
>
> +static int dma_noop_mmap(struct device *dev, struct vm_area_struct *vma,
> +                        void *cpu_addr, dma_addr_t dma_addr, size_t size)
> +{
> +       unsigned long user_count = vma_pages(vma);
> +       unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +       unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
> +       unsigned long off = vma->vm_pgoff;
> +       int ret = -ENXIO;
> +
> +       if (off < count && user_count <= (count - off)) {
> +               ret = remap_pfn_range(vma, vma->vm_start,
> +                                     pfn + off,
> +                                     user_count << PAGE_SHIFT,
> +                                     vma->vm_page_prot);
> +       }
> +
> +       return ret;
> +}
> +
>  struct dma_map_ops dma_noop_ops = {
>         .alloc                  = dma_noop_alloc,
>         .free                   = dma_noop_free,
> @@ -71,6 +90,7 @@ struct dma_map_ops dma_noop_ops = {
>         .map_sg                 = dma_noop_map_sg,
>         .mapping_error          = dma_noop_mapping_error,
>         .dma_supported          = dma_noop_supported,
> +       .mmap                   = dma_noop_mmap,
>  };
>
>  EXPORT_SYMBOL(dma_noop_ops);
>
> I'd be glad to hear if it works for you.

With your patch mmap() does return an address, but unfortunately the
framebuffer isn't displayed anymore; I have a black screen instead of the
usual pattern.

Without your patches my dma_alloc_wc() requests go to dma-mapping, so I guess
the problem is coming from the dma-noop implementation.
I have tried to use the dma-mapping-nommu ops but the status is the same.
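
For reference, the driver side is nothing exotic, basically the usual pair of
calls (a simplified sketch with made-up names, not the actual stm32 code;
assumes <linux/dma-mapping.h> and <linux/fb.h>):

static void *fb_virt;
static dma_addr_t fb_dma;

static int my_fb_alloc(struct device *dev, size_t size)
{
	/* Frame buffer allocation: ends up in dma_map_ops.alloc. */
	fb_virt = dma_alloc_wc(dev, size, &fb_dma, GFP_KERNEL);
	return fb_virt ? 0 : -ENOMEM;
}

static int my_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
	/* Hands the buffer to user space: ends up in dma_map_ops.mmap. */
	return dma_mmap_wc(info->device, vma, fb_virt, fb_dma,
			   info->fix.smem_len);
}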

>
> Cheers
> Vladimir
>
>> Regards,
>> Benjamin



-- 
Benjamin Gaignard

Graphic Study Group

Linaro.org | Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU
  2017-01-06 13:58       ` Benjamin Gaignard
@ 2017-01-09 13:54         ` Vladimir Murzin
  0 siblings, 0 replies; 19+ messages in thread
From: Vladimir Murzin @ 2017-01-09 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/01/17 13:58, Benjamin Gaignard wrote:
> 2017-01-04 11:33 GMT+01:00 Vladimir Murzin <vladimir.murzin@arm.com>:
>> Hello Benjamin,
>>
>> On 02/01/17 15:26, Benjamin Gaignard wrote:
>>> Hello Vladimir,
>>>
>>> I have tested your patch on my setup (stm32f4: no MMU, no MPU) where
>>> I'm writing a display driver.
>>> This driver uses dma_alloc_wc() and dma_mmap_wc() for frame buffer
>>> allocation and mmapping.
>>>
>>> In dma-mapping-nommu.c you haven't implemented dma_map_ops.mmap, so
>>> obviously my driver doesn't work with your code.
>>> In the current implementation it is buggy too, but I submitted a patch to
>>> fix that problem:
>>> http://www.armlinux.org.uk/developer/patches/viewpatch.php?id=8633/1
>>>
>>> Would it be possible for you to include mmap support in dma-mapping-nommu.c?
>>>
>>
>> IIRC, stm32f4 is Cortex-M4, so there is no cache support, which means it
>> uses dma_noop_ops. I offloaded mmap to the common implementation, but
>> completely forgot that it has the same restriction as the arm counterpart.
>> Regardless, thanks for noticing that!
>>
>> It seems that I need to check that the mapping is done from the DMA coherent
>> area (I'm moving to dma-coherent, and there we have dma_mmap_from_coherent
>> for that) and add something like the below for dma_noop_ops:
>>
>> diff --git a/lib/dma-noop.c b/lib/dma-noop.c
>> index 3d766e7..d838b88 100644
>> --- a/lib/dma-noop.c
>> +++ b/lib/dma-noop.c
>> @@ -64,6 +64,25 @@ static int dma_noop_supported(struct device *dev, u64 mask)
>>         return 1;
>>  }
>>
>> +static int dma_noop_mmap(struct device *dev, struct vm_area_struct *vma,
>> +                        void *cpu_addr, dma_addr_t dma_addr, size_t size)
>> +{
>> +       unsigned long user_count = vma_pages(vma);
>> +       unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
>> +       unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
>> +       unsigned long off = vma->vm_pgoff;
>> +       int ret = -ENXIO;
>> +
>> +       if (off < count && user_count <= (count - off)) {
>> +               ret = remap_pfn_range(vma, vma->vm_start,
>> +                                     pfn + off,
>> +                                     user_count << PAGE_SHIFT,
>> +                                     vma->vm_page_prot);
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>>  struct dma_map_ops dma_noop_ops = {
>>         .alloc                  = dma_noop_alloc,
>>         .free                   = dma_noop_free,
>> @@ -71,6 +90,7 @@ struct dma_map_ops dma_noop_ops = {
>>         .map_sg                 = dma_noop_map_sg,
>>         .mapping_error          = dma_noop_mapping_error,
>>         .dma_supported          = dma_noop_supported,
>> +       .mmap                   = dma_noop_mmap,
>>  };
>>
>>  EXPORT_SYMBOL(dma_noop_ops);
>>
>> I'd be glad to hear if it works for you.
> 
> With your patch mmap() does return an address, but unfortunately the
> framebuffer isn't displayed anymore; I have a black screen instead of the
> usual pattern.
> 
> Without your patches my dma_alloc_wc() requests go to dma-mapping, so I guess
> the problem is coming from the dma-noop implementation.
> I have tried to use the dma-mapping-nommu ops but the status is the same.

Thanks for giving it a go! I messed up the function prototype above; it should be: 

static int dma_noop_mmap(struct device *dev, struct vm_area_struct *vma,
                        void *cpu_addr, dma_addr_t dma_addr, size_t size,
                        unsigned long attrs)
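
The extra argument is just what the core passes down; if I remember the
4.9-era include/linux/dma-mapping.h correctly, dma_mmap_attrs() does roughly:

static inline int
dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
	       dma_addr_t dma_addr, size_t size, unsigned long attrs)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!ops);
	if (ops->mmap)
		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
}

with dma_mmap_wc() being a thin wrapper that passes DMA_ATTR_WRITE_COMBINE,
which is why the op needs to take attrs.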

I've just sent v3, and I tested mmap() there with the amba-clcd driver plus
some hacks to make the mmap call work with the framebuffer.

Cheers
Vladimir

> 
>>
>> Cheers
>> Vladimir
>>
>>> Regards,
>>> Benjamin
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-01-09 13:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13 13:45 [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Vladimir Murzin
2016-12-13 13:45 ` [RFC v2 PATCH 1/3] ARM: NOMMU: introduce dma operations for noMMU Vladimir Murzin
2017-01-02 15:26   ` Benjamin Gaignard
2017-01-04 10:33     ` Vladimir Murzin
2017-01-06 13:58       ` Benjamin Gaignard
2017-01-09 13:54         ` Vladimir Murzin
2016-12-13 13:45 ` [RFC v2 PATCH 2/3] ARM: NOMMU: set ARM_DMA_MEM_BUFFERABLE for M-class cpus Vladimir Murzin
2016-12-13 13:45 ` [RFC v2 PATCH 3/3] ARM: dma-mapping: remove traces of NOMMU code Vladimir Murzin
2016-12-13 14:07 ` [RFC v2 PATCH 0/3] Fix dma_alloc_coherent() and friends for NOMMU Russell King - ARM Linux
2016-12-13 14:14   ` Vladimir Murzin
2016-12-13 14:25     ` Robin Murphy
2016-12-13 15:02       ` Vladimir Murzin
2016-12-13 18:32         ` Robin Murphy
2016-12-14 10:15           ` Vladimir Murzin
2016-12-13 14:33 ` Szemző András
2016-12-13 15:04   ` Vladimir Murzin
2016-12-16 14:57 ` Alexandre Torgue
2016-12-16 15:00   ` Vladimir Murzin
2016-12-16 15:33     ` Alexandre Torgue
