* [U-Boot] [PATCH] ARM: support for cache coherent allocations
@ 2012-05-30 21:41 Ilya Yanok
  2012-06-14 15:13 ` Ilya Yanok
  2012-06-15 22:29 ` Marek Vasut
  0 siblings, 2 replies; 7+ messages in thread
From: Ilya Yanok @ 2012-05-30 21:41 UTC (permalink / raw)
  To: u-boot

This is a draft implementation of a cache-coherent memory allocator.
This simple implementation just reserves a memory area below the malloc
space and leaves it uncached even if the data cache is enabled.
Allocations are even simpler: the code just verifies that there is
enough space and increments the offset counter. Deallocation is not
supported for now. In future versions we could probably use the
dlmalloc allocator to get space out of the coherent pool.

Signed-off-by: Ilya Yanok <ilya.yanok@cogentembedded.com>
---
 arch/arm/include/asm/dma-mapping.h |    4 ++++
 arch/arm/include/asm/global_data.h |    4 ++++
 arch/arm/lib/Makefile              |    1 +
 arch/arm/lib/board.c               |    8 ++++++++
 arch/arm/lib/cache-cp15.c          |    5 +++++
 arch/arm/lib/dma-coherent.c        |   37 ++++++++++++++++++++++++++++++++++++
 6 files changed, 59 insertions(+)
 create mode 100644 arch/arm/lib/dma-coherent.c

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 5bbb0a0..a2145fc 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -30,11 +30,15 @@ enum dma_data_direction {
 	DMA_FROM_DEVICE		= 2,
 };
 
+#ifndef CONFIG_DMA_COHERENT
 static void *dma_alloc_coherent(size_t len, unsigned long *handle)
 {
 	*handle = (unsigned long)malloc(len);
 	return (void *)*handle;
 }
+#else
+void *dma_alloc_coherent(size_t len, unsigned long *handle);
+#endif
 
 static inline unsigned long dma_map_single(volatile void *vaddr, size_t len,
 					   enum dma_data_direction dir)
diff --git a/arch/arm/include/asm/global_data.h b/arch/arm/include/asm/global_data.h
index c3ff789..4655035 100644
--- a/arch/arm/include/asm/global_data.h
+++ b/arch/arm/include/asm/global_data.h
@@ -76,6 +76,10 @@ typedef	struct	global_data {
 #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF))
 	unsigned long	tlb_addr;
 #endif
+#ifdef CONFIG_DMA_COHERENT
+	unsigned long	coherent_base;	/* Start address of coherent space */
+	unsigned long	coherent_size;	/* Size of coherent space */
+#endif
 	const void	*fdt_blob;	/* Our device tree, NULL if none */
 	void		**jt;		/* jump table */
 	char		env_buf[32];	/* buffer for getenv() before reloc. */
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 39a9550..e91dcd0 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,6 +40,7 @@ GLCOBJS	+= div0.o
 COBJS-y	+= board.o
 COBJS-y	+= bootm.o
 COBJS-$(CONFIG_SYS_L2_PL310) += cache-pl310.o
+COBJS-$(CONFIG_DMA_COHERENT) += dma-coherent.o
 COBJS-y	+= interrupts.o
 COBJS-y	+= reset.o
 SOBJS-$(CONFIG_USE_ARCH_MEMSET) += memset.o
diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
index 5270c11..6541a49 100644
--- a/arch/arm/lib/board.c
+++ b/arch/arm/lib/board.c
@@ -400,6 +400,14 @@ void board_init_f(ulong bootflag)
 	debug("Reserving %zu Bytes for Global Data at: %08lx\n",
 			sizeof (gd_t), addr_sp);
 
+#ifdef CONFIG_DMA_COHERENT
+	/* reserve space for cache coherent allocations */
+	gd->coherent_size = ALIGN(CONFIG_DMA_COHERENT_SIZE, 1 << 20);
+	addr_sp &= ~((1 << 20) - 1);
+	addr_sp -= gd->coherent_size;
+	gd->coherent_base = addr_sp;
+#endif
+
 	/* setup stackpointer for exeptions */
 	gd->irq_sp = addr_sp;
 #ifdef CONFIG_USE_IRQ
diff --git a/arch/arm/lib/cache-cp15.c b/arch/arm/lib/cache-cp15.c
index e6c3eae..c11e871 100644
--- a/arch/arm/lib/cache-cp15.c
+++ b/arch/arm/lib/cache-cp15.c
@@ -60,6 +60,11 @@ static inline void dram_bank_mmu_setup(int bank)
 	for (i = bd->bi_dram[bank].start >> 20;
 	     i < (bd->bi_dram[bank].start + bd->bi_dram[bank].size) >> 20;
 	     i++) {
+#ifdef CONFIG_DMA_COHERENT
+		if ((i >= gd->coherent_base >> 20) &&
+		    (i < (gd->coherent_base + gd->coherent_size) >> 20))
+			continue;
+#endif
 		page_table[i] = i << 20 | (3 << 10) | CACHE_SETUP;
 	}
 }
diff --git a/arch/arm/lib/dma-coherent.c b/arch/arm/lib/dma-coherent.c
new file mode 100644
index 0000000..30fa893
--- /dev/null
+++ b/arch/arm/lib/dma-coherent.c
@@ -0,0 +1,37 @@
+/*
+ * (C) Copyright 2012
+ * Ilya Yanok, ilya.yanok@gmail.com
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ */
+
+#include <common.h>
+
+DECLARE_GLOBAL_DATA_PTR;
+
+size_t offset;
+
+void *dma_alloc_coherent(size_t size, unsigned long *handle)
+{
+	if (size + offset > gd->coherent_size)
+		return NULL;
+
+	*handle = gd->coherent_base + offset;
+	offset += size;
+
+	return (void*)(*handle);
+}
-- 
1.7.9.5
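
The reservation arithmetic in the board_init_f() hunk and the bump allocation in dma-coherent.c can be tried on a host with the sketch below. It is only an illustration, not part of the patch: struct coherent_pool, pool_reserve() and pool_alloc() are hypothetical names standing in for the gd fields and dma_alloc_coherent(), and the alignment argument is an added assumption (the patch itself hands out unaligned offsets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_SIZE (1UL << 20)	/* ARM first-level section: 1 MiB */

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical stand-in for the fields the patch adds to gd_t. */
struct coherent_pool {
	unsigned long base;	/* start address of the pool */
	unsigned long size;	/* pool size in bytes */
	unsigned long offset;	/* bytes handed out so far */
};

/*
 * Same steps as the board_init_f() hunk: round the requested size up to
 * whole sections, round the stack pointer down to a section boundary,
 * then carve the pool out below it. Returns the new stack pointer.
 */
static unsigned long pool_reserve(struct coherent_pool *p,
				  unsigned long addr_sp,
				  unsigned long requested)
{
	p->size = align_up(requested, SECTION_SIZE);
	addr_sp &= ~(SECTION_SIZE - 1);
	addr_sp -= p->size;
	p->base = addr_sp;
	p->offset = 0;
	return addr_sp;
}

/*
 * Bump allocation with an alignment step (the alignment is an assumption,
 * it is not in the patch). Returns 0 when the pool is exhausted.
 */
static unsigned long pool_alloc(struct coherent_pool *p, unsigned long size,
				unsigned long align)
{
	unsigned long start = align_up(p->offset, align);

	if (start > p->size || size > p->size - start)
		return 0;

	p->offset = start + size;
	return p->base + start;
}
```

With a stack pointer of 0x8ff12345 and a requested size of 0x180000, the pool ends up 0x200000 bytes long at 0x8fd00000, and the stack continues below that address.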

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-05-30 21:41 [U-Boot] [PATCH] ARM: support for cache coherent allocations Ilya Yanok
@ 2012-06-14 15:13 ` Ilya Yanok
  2012-06-15 17:34   ` Tom Rini
  2012-06-15 22:29 ` Marek Vasut
  1 sibling, 1 reply; 7+ messages in thread
From: Ilya Yanok @ 2012-06-14 15:13 UTC (permalink / raw)
  To: u-boot

Hi All,

On Thu, May 31, 2012 at 1:41 AM, Ilya Yanok
<ilya.yanok@cogentembedded.com> wrote:

> This is a draft implementation of cache coherent memory allocator.
> This simple implementation just reserves memory area below malloc
> space and leave it uncached even if data cache is enabled.
> Allocations are even simpler: code just verifies that we have
> enough space and increments the offset counter. No deallocations
> supported for now. In future versions we could probably use
> dlmalloc allocator to get space out of coherent pool.
>

Any comments on this?

Regards, Ilya.

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-06-14 15:13 ` Ilya Yanok
@ 2012-06-15 17:34   ` Tom Rini
  0 siblings, 0 replies; 7+ messages in thread
From: Tom Rini @ 2012-06-15 17:34 UTC (permalink / raw)
  To: u-boot

On Thu, Jun 14, 2012 at 8:13 AM, Ilya Yanok
<ilya.yanok@cogentembedded.com> wrote:
> Hi All,
>
> On Thu, May 31, 2012 at 1:41 AM, Ilya Yanok
> <ilya.yanok@cogentembedded.com>wrote:
>
>> This is a draft implementation of cache coherent memory allocator.
>> This simple implementation just reserves memory area below malloc
>> space and leave it uncached even if data cache is enabled.
>> Allocations are even simpler: code just verifies that we have
>> enough space and increments the offset counter. No deallocations
>> supported for now. In future versions we could probably use
>> dlmalloc allocator to get space out of coherent pool.
>>
>
> Any comments on this?

Albert?

-- 
Tom

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-05-30 21:41 [U-Boot] [PATCH] ARM: support for cache coherent allocations Ilya Yanok
  2012-06-14 15:13 ` Ilya Yanok
@ 2012-06-15 22:29 ` Marek Vasut
  2012-06-18 18:15   ` Ilya Yanok
  1 sibling, 1 reply; 7+ messages in thread
From: Marek Vasut @ 2012-06-15 22:29 UTC (permalink / raw)
  To: u-boot

Dear Ilya Yanok,

> This is a draft implementation of cache coherent memory allocator.
> This simple implementation just reserves memory area below malloc
> space and leave it uncached even if data cache is enabled.
> Allocations are even simpler: code just verifies that we have
> enough space and increments the offset counter. No deallocations
> supported for now. In future versions we could probably use
> dlmalloc allocator to get space out of coherent pool.
> 
> Signed-off-by: Ilya Yanok <ilya.yanok@cogentembedded.com>

Hm, can't we just punch a hole in the MMU table at runtime instead of 
preallocating it like this?

Also, what is this for? Can we not simply flush/invalidate the caches?

Best regards,
Marek Vasut

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-06-15 22:29 ` Marek Vasut
@ 2012-06-18 18:15   ` Ilya Yanok
  2012-06-18 23:37     ` Marek Vasut
  0 siblings, 1 reply; 7+ messages in thread
From: Ilya Yanok @ 2012-06-18 18:15 UTC (permalink / raw)
  To: u-boot

Hi Marek,

[Sorry for the duplicate; I forgot to CC the list.]

On Sat, Jun 16, 2012 at 2:29 AM, Marek Vasut <marek.vasut@gmail.com> wrote:

> Hm, can't we just punch a hole in the MMU table at runtime instead of
> preallocating it like this?
>

It is already allocated at runtime. Do you mean allocating it on demand? Good
point. We could probably malloc a big enough block and make it uncached
directly from dma_alloc_coherent(). Is that what you suggest?

> Also, what is this for? Can we not simply flush/invalidate the caches?
>

Flush/invalidate can be racy on some hardware. Sometimes we need to write
one field of a DMA descriptor and then read another. Because cache
maintenance works on whole cache lines, not individual bytes, a write
followed by a flush can overwrite a field that the hardware has updated in
the meantime. (We could invalidate and read before the write and flush, but
that still leaves a race window.)

Regards, Ilya.

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-06-18 18:15   ` Ilya Yanok
@ 2012-06-18 23:37     ` Marek Vasut
  2012-06-19 14:32       ` Ilya Yanok
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Vasut @ 2012-06-18 23:37 UTC (permalink / raw)
  To: u-boot

Dear Ilya Yanok,

> Hi Marek,
> 
> [sorry for copying, forget to CC the list]
> 
> On Sat, Jun 16, 2012 at 2:29 AM, Marek Vasut <marek.vasut@gmail.com> wrote:
> > Hm, can't we just punch a hole in the MMU table at runtime instead of
> > preallocating it like this?
> 
>  It's allocated at runtime now, do you mean allocate it on demand? Good
> point, Probably we can malloc big enough block and make it uncached
> directly from dma_alloc_coherent(). Is it what you suggest?

Kind of ... I mean rather insert an entry into MMU table at runtime that says 
"this region is uncached". But that'd need some hack in the mallocator now that 
I think about it. It might not be as simple as I thought at first.

On the other hand, most MMUs allow you to allocate stuff with 4k density, which 
should be ok.

> Also, what is this for? Can we not simply flush/invalidate the caches?
> 
> 
>  flush/invalidate can be racy for some hardware. Sometimes we need to write
> some field to DMA descriptor and then read another one. And because one
> cannot flush/invalidate individual bytes write/flush can destroy the field
> updated by hardware. (Well, we can invalidate/read before write/flush but
> that introduces a race).

But that's shitty hardware. Do you really need to do it? Where? I fixed similar 
issue in fec_mxc.c recently.

> 
> Regards, Ilya.

Best regards,
Marek Vasut

* [U-Boot] [PATCH] ARM: support for cache coherent allocations
  2012-06-18 23:37     ` Marek Vasut
@ 2012-06-19 14:32       ` Ilya Yanok
  0 siblings, 0 replies; 7+ messages in thread
From: Ilya Yanok @ 2012-06-19 14:32 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On Tue, Jun 19, 2012 at 3:37 AM, Marek Vasut <marek.vasut@gmail.com> wrote:

>
> Kind of ... I mean rather insert an entry into MMU table at runtime that says
> "this region is uncached". But that'd need some hack in the mallocator now that
> I think about it. It might not be as simple as I thought at first.
>

Hm. I don't quite understand. Do you mean patching the malloc code to turn
caching off for the region? I doubt that's a good approach. I would
rather have the code that calls malloc handle caching.
How about something like this (pseudocode):

if (/* enough room left in the previously allocated space */)
     /* update existing allocation data and return */

newp = malloc_align(ALIGN(size, min_size), min_size);
/*
 * Allocate a block that is aligned in both size and offset.
 * min_size is the minimal block size for which the cache can be turned
 * off; on ARM it is currently 1MB.
 * malloc_align returns a block starting at an aligned address. I don't
 * think the current allocator supports this, so effectively we will
 * have to allocate ALIGN(size, min_size) + min_size - 1 bytes.
 */

/* Patch the page table (on ARM) or do whatever is needed to make the
 * allocated block uncached */
/* Save data about unused space in the block (in case we allocated more
 * than was requested) */

return newp;

The existing allocation data could be as simple as a pair of static
variables. This is not optimal in some scenarios, but it is very simple.
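
As a rough host-side illustration of the pseudocode above, the sketch below over-allocates by min_size - 1 bytes to obtain an aligned block and keeps the leftover space in two static variables. The names coherent_alloc(), make_uncached(), pool_next and pool_left are made up for this example, and make_uncached() is an empty placeholder: a real port would have to clean the affected cache lines and patch the page table there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_SIZE (1UL << 20)	/* smallest region whose caching can be changed */

/* Leftover space in the most recently mapped block. */
static uintptr_t pool_next;
static size_t pool_left;

/* Hypothetical hook: page-table patching would go here on real hardware. */
static void make_uncached(uintptr_t start, size_t len)
{
	(void)start;
	(void)len;
}

/* Round x up to a multiple of a; a must be a power of two. */
static size_t round_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

void *coherent_alloc(size_t size)
{
	uintptr_t raw, start;
	size_t block;
	void *ret;

	if (size == 0 || size > SIZE_MAX - 2 * MIN_SIZE)
		return NULL;

	/* Fast path: enough room left in the block mapped earlier. */
	if (size <= pool_left) {
		ret = (void *)pool_next;
		pool_next += size;
		pool_left -= size;
		return ret;
	}

	block = round_up(size, MIN_SIZE);

	/* Over-allocate so an aligned run of 'block' bytes always fits. */
	raw = (uintptr_t)malloc(block + MIN_SIZE - 1);
	if (!raw)
		return NULL;

	start = (raw + MIN_SIZE - 1) & ~(uintptr_t)(MIN_SIZE - 1);
	make_uncached(start, block);

	/* Remember the unused tail; any older leftover is abandoned. */
	pool_next = start + size;
	pool_left = block - size;
	return (void *)start;
}
```

As in the pseudocode, nothing is ever freed, and a request that does not fit the current leftover wastes it. That is the "not optimal in some scenarios" trade-off.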

> On the other hand, most MMUs allow you to allocate stuff with 4k density,
> which should be ok.
>

Hm, I don't think rebuilding page tables on the fly in U-Boot is a good
idea. Currently U-Boot on ARM uses a page table with 1MB granularity.
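
To make the cost of 1MB granularity concrete, here is a small sketch of the section arithmetic. section_is_coherent() repeats the range test from the cache-cp15.c hunk; section_index() and sections_needed() are hypothetical helpers added for illustration. The sketch assumes 32-bit addresses and a pool that does not end exactly at the 4GB boundary, since base + size would wrap in that case.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20	/* 1 MiB sections, as discussed in the thread */

/* Index of the first-level page table entry that maps addr. */
static uint32_t section_index(uint32_t addr)
{
	return addr >> SECTION_SHIFT;
}

/*
 * Same range test as the cache-cp15.c hunk: true when section i lies
 * inside [base, base + size). Both values must be section-aligned.
 */
static int section_is_coherent(uint32_t i, uint32_t base, uint32_t size)
{
	assert((base & ((1u << SECTION_SHIFT) - 1)) == 0);
	assert((size & ((1u << SECTION_SHIFT) - 1)) == 0);
	return i >= (base >> SECTION_SHIFT) &&
	       i < ((base + size) >> SECTION_SHIFT);
}

/* Number of whole sections whose attributes must change to cover a buffer. */
static uint32_t sections_needed(uint32_t addr, uint32_t len)
{
	uint32_t first, last;

	if (len == 0)
		return 0;

	first = section_index(addr);
	last = section_index(addr + len - 1);
	return last - first + 1;
}
```

Even a 64-byte descriptor takes a whole 1MB section out of the cached mapping, or two sections if it straddles a boundary.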


> > Also, what is this for? Can we not simply flush/invalidate the caches?
> >
> >
> >  flush/invalidate can be racy for some hardware. Sometimes we need to write
> > some field to DMA descriptor and then read another one. And because one
> > cannot flush/invalidate individual bytes write/flush can destroy the field
> > updated by hardware. (Well, we can invalidate/read before write/flush but
> > that introduces a race).
>
> But that's shitty hardware. Do you really need to do it? Where? I fixed similar
> issue in fec_mxc.c recently.
>

CPSW switch on TI AM33x. Here is the code:

        /* not the first packet - enqueue at the tail */
        prev = chan->tail;
        desc_write(prev, hw_next, desc);
        chan->tail = desc;

        /* next check if EOQ has been triggered already */
        if (desc_read(prev, hw_mode) & CPDMA_DESC_EOQ)
                chan_write(chan, hdp, desc);


desc_{read/write} are just readl/writel on fields of the desc structure. Note
that one has to flush for desc_write() to reach the hardware, but since the
fields of the DMA descriptor share a cache line, this flush can clobber
other fields that the hardware may have updated. I have to say I have never
seen this problem manifest in the wild, and the hardware probably takes care
of this situation somehow, but we can't be sure.
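
The suspected interleaving can be replayed on a host with a toy model, sketched below. It does not touch real caches or the CPSW. struct model keeps one copy of the descriptor as RAM sees it and one as the CPU's cache line holds it. The two-field struct desc, the DESC_EOQ value and the 0x80001000 address are invented for illustration and are not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_EOQ 0x1000u	/* hypothetical flag value, illustration only */

/* A descriptor small enough that all fields share one cache line. */
struct desc {
	uint32_t hw_next;	/* written by the CPU */
	uint32_t hw_mode;	/* updated by the hardware */
};

/* Toy model: 'ram' is what the DMA engine sees, 'line' is the cached copy. */
struct model {
	struct desc ram;
	struct desc line;
};

/* Invalidate: drop the cached copy and refetch the whole line from RAM. */
static void invalidate(struct model *m)
{
	memcpy(&m->line, &m->ram, sizeof(m->line));
}

/* Flush: write the whole cached line back to RAM, every field included. */
static void flush(struct model *m)
{
	memcpy(&m->ram, &m->line, sizeof(m->ram));
}

/* Cached descriptor: returns hw_mode in RAM after the CPU's flush. */
static uint32_t replay_race(void)
{
	struct model m = { { 0, 0 }, { 0, 0 } };

	invalidate(&m);			/* CPU fetches the line */
	m.line.hw_next = 0x80001000u;	/* CPU links the next descriptor */
	m.ram.hw_mode |= DESC_EOQ;	/* hardware reaches end of queue */
	flush(&m);			/* write-back carries a stale hw_mode */

	return m.ram.hw_mode;
}

/* Uncached descriptor: every access goes straight to RAM. */
static uint32_t replay_uncached(void)
{
	struct desc ram = { 0, 0 };

	ram.hw_next = 0x80001000u;	/* CPU store lands in RAM directly */
	ram.hw_mode |= DESC_EOQ;	/* hardware update is preserved */

	return ram.hw_mode;
}
```

In the cached replay the EOQ bit set by the hardware is overwritten by the flush, so the following desc_read() check would never restart the channel. In the uncached replay the bit survives.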

Regards, Ilya.
