* [U-Boot] ARM - cache and alignment
@ 2017-01-16 13:29 Jean-Jacques Hiblot
  2017-01-16 16:00 ` Marek Vasut
  2017-01-20  3:36 ` Tom Rini
  0 siblings, 2 replies; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-16 13:29 UTC (permalink / raw)
  To: u-boot

Tom, Marek

At the moment, whenever an unaligned address is used in cache operations 
(invalidate_dcache_range or flush_dcache_range), the whole request is 
discarded for arm926ejs; for ARMv7 or ARMv8 only the aligned part is 
maintained. This is probably what caused the bug addressed in 
8133f43d1cd. There are a lot of unaligned buffers used in DMA operations, 
and for all of them we're possibly maintaining the cache only partially 
or not at all. I've seen this when using the environment from a file 
stored in a FAT partition. Commit 8133f43d1cd addresses this by using a 
bounce buffer at the FAT level, but it's only one of many cases.

I think we can do better with unaligned cache operations:

* flush (writeback + invalidate): Suppose we use an unaligned address p; 
flush_dcache_range() can do the writeback+invalidate on the whole range 
[p & ~(line_sz - 1); (p + length) | (line_sz - 1)]. There should be no 
problem with that, since a writeback can happen at any point in time.
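
The rounding involved can be sketched in plain C. The helper names here 
(line_floor, line_ceil) are illustrative, not actual U-Boot functions, and 
line_sz is assumed to be a power of two, as cache-line sizes are:

```c
#include <stdint.h>

/* Round down to the start of the cache line containing p. */
static uintptr_t line_floor(uintptr_t p, uintptr_t line_sz)
{
	return p & ~(line_sz - 1);
}

/* Round up to the next line boundary at or after p. */
static uintptr_t line_ceil(uintptr_t p, uintptr_t line_sz)
{
	return (p + line_sz - 1) & ~(line_sz - 1);
}

/* flush_dcache_range(p, p + length) would then operate on
 * [line_floor(p, sz), line_ceil(p + length, sz)) instead of rejecting
 * or truncating the unaligned request. */
```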

* invalidation

It is a bit trickier. Here is pseudo-code:
invalidate_dcache_range(p,length)
{
          write_back_invalidate(first line)
          write_back_invalidate(last line)
          invalidate(all other lines)
}

Here again this should work fine IF invalidate_dcache_range() is called 
BEFORE the DMA operation (again, the writeback can happen at any time, so 
it's valid to do it here). Calling it only AFTER the operation may corrupt 
the data written by the DMA with old data from the CPU. This is how I used 
to handle unaligned buffers in some other projects.
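
A self-contained C sketch of that pseudo-code, with the per-line 
maintenance primitives stubbed out as counters so the line split can be 
checked on a host. On real hardware these stubs would be the per-line 
cache maintenance operations; all names here are illustrative, not actual 
U-Boot functions:

```c
#include <stdint.h>

#define LINE_SZ 64	/* assumed cache-line size for the sketch */

/* Stubs standing in for per-line cache maintenance; the counters let us
 * verify how a range gets split between the two operations. */
static unsigned int wb_inval_calls, inval_calls;
static void write_back_invalidate_line(uintptr_t addr) { (void)addr; wb_inval_calls++; }
static void invalidate_line(uintptr_t addr) { (void)addr; inval_calls++; }

/* Proposed unaligned-tolerant invalidate: the partial first and last
 * lines are written back before invalidation, so the unrelated bytes
 * they contain are not lost; fully covered lines are just invalidated. */
static void invalidate_dcache_range(uintptr_t p, uintptr_t length)
{
	uintptr_t start = p & ~(uintptr_t)(LINE_SZ - 1);
	uintptr_t end = (p + length + LINE_SZ - 1) & ~(uintptr_t)(LINE_SZ - 1);

	if (p != start) {			/* partial first line */
		write_back_invalidate_line(start);
		start += LINE_SZ;
	}
	if ((p + length) % LINE_SZ && end > start) {	/* partial last line */
		end -= LINE_SZ;
		write_back_invalidate_line(end);
	}
	for (; start < end; start += LINE_SZ)
		invalidate_line(start);
}
```

For example, a request covering 0x1010-0x1110 with 64-byte lines touches 
five lines: the two partial edge lines get the writeback+invalidate, the 
three fully covered middle lines get the plain invalidate.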


There is however one loophole: if data sitting in the first or the last 
line is accessed before the memory is updated by the DMA, then that 
first/last line will be corrupted. But it's not highly probable, as this 
data would have to be used in parallel with the DMA (interrupt handling, 
SMP?, DMA-management-related variables). So it's not perfect, but it would 
still be better than what we have today.

cheers,

Jean-Jacques

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] ARM - cache and alignment
  2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
@ 2017-01-16 16:00 ` Marek Vasut
  2017-01-16 19:16   ` Jean-Jacques Hiblot
  2017-01-20  3:36 ` Tom Rini
  1 sibling, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-16 16:00 UTC (permalink / raw)
  To: u-boot

On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
> Tom, Marek

Hi,

> At the moment, whenever an unaligned address is used in cache operations
> (invalidate_dcache_range, or flush_dcache_range), the whole request is
> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
> maintained. This is probably what is causing the bug addressed in
> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
> and for all of them, we're possibly handling the cached partially or not
> at all. I've seen this when using the environment from a file stored in
> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
> buffer at the FAT level but it's only one of many cases.
> 
> I think we can do better with unaligned cache operations:
> 
> * flush (writeback + invalidate): Suppose we use address p which is
> unaligned, flush_dcache_range() can do the writeback+invalidate on the
> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
> should no problem with that since writeback can happen at any point in
> time.
> 
> * invalidation
> 
> It is a bit trickier. here is a pseudo-code:
> invalidate_dcache_range(p,length)
> {
>          write_back_invalidate(first line)
>          write_back_invalidate(last line)
>          invalidate(all other lines)
> }
> 
> Here again this should work fine IF invalidate_dcache_range() is called
> BEFORE the DMA operation (again the writeback can happen at time so it's
> valid do it here). Calling it only AFTER the operation, may corrupt the
> data written by the DMA with old data from CPU. This how I used to
> handle unaligned buffers in some other projects.
> 
> 
> There is however one loophole: a data sitting in the first or the last
> line is accessed before the memory is updated by the DMA, then the
> first/line will be corrupted. But it's not highly probable as this data
> would have to be used in parallel of the DMA (interrupt handling, SMP?,
> dma mgt related variable). So it's not perfect but it would still be
> better than we have today.

Or just fix all the code which complains about unaligned buffers, done.
That's the way to go without all the complications above.

-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-16 16:00 ` Marek Vasut
@ 2017-01-16 19:16   ` Jean-Jacques Hiblot
  2017-01-16 19:33     ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-16 19:16 UTC (permalink / raw)
  To: u-boot



On 16/01/2017 17:00, Marek Vasut wrote:
> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>> Tom, Marek
> Hi,
>
>> At the moment, whenever an unaligned address is used in cache operations
>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>> maintained. This is probably what is causing the bug addressed in
>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>> and for all of them, we're possibly handling the cached partially or not
>> at all. I've seen this when using the environment from a file stored in
>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>> buffer at the FAT level but it's only one of many cases.
>>
>> I think we can do better with unaligned cache operations:
>>
>> * flush (writeback + invalidate): Suppose we use address p which is
>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>> should no problem with that since writeback can happen at any point in
>> time.
>>
>> * invalidation
>>
>> It is a bit trickier. here is a pseudo-code:
>> invalidate_dcache_range(p,length)
>> {
>>           write_back_invalidate(first line)
>>           write_back_invalidate(last line)
>>           invalidate(all other lines)
>> }
>>
>> Here again this should work fine IF invalidate_dcache_range() is called
>> BEFORE the DMA operation (again the writeback can happen at time so it's
>> valid do it here). Calling it only AFTER the operation, may corrupt the
>> data written by the DMA with old data from CPU. This how I used to
>> handle unaligned buffers in some other projects.
>>
>>
>> There is however one loophole: a data sitting in the first or the last
>> line is accessed before the memory is updated by the DMA, then the
>> first/line will be corrupted. But it's not highly probable as this data
>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>> dma mgt related variable). So it's not perfect but it would still be
>> better than we have today.
> Or just fix all the code which complains about unaligned buffers, done.
> That's the way to go without all the complications above.
It's not that complex, but it's not perfect. We would need to keep the 
same warning as we have now, but it would make things work in more cases.

Tracking every possible unaligned buffer that gets invalidated is not a 
trivial job. Most of the time the buffer is allocated in an upper layer 
and passed down to a driver through layers like the network stack, the 
block layer, etc. And in many cases, the warning will come and go 
depending on how the variable is aligned on the stack or the heap.

>


* [U-Boot] ARM - cache and alignment
  2017-01-16 19:16   ` Jean-Jacques Hiblot
@ 2017-01-16 19:33     ` Marek Vasut
  2017-01-17  9:08       ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-16 19:33 UTC (permalink / raw)
  To: u-boot

On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
> 
> 
> On 16/01/2017 17:00, Marek Vasut wrote:
>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>> Tom, Marek
>> Hi,
>>
>>> At the moment, whenever an unaligned address is used in cache operations
>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>> maintained. This is probably what is causing the bug addressed in
>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>>> and for all of them, we're possibly handling the cached partially or not
>>> at all. I've seen this when using the environment from a file stored in
>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>> buffer at the FAT level but it's only one of many cases.
>>>
>>> I think we can do better with unaligned cache operations:
>>>
>>> * flush (writeback + invalidate): Suppose we use address p which is
>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>> should no problem with that since writeback can happen at any point in
>>> time.
>>>
>>> * invalidation
>>>
>>> It is a bit trickier. here is a pseudo-code:
>>> invalidate_dcache_range(p,length)
>>> {
>>>           write_back_invalidate(first line)
>>>           write_back_invalidate(last line)
>>>           invalidate(all other lines)
>>> }
>>>
>>> Here again this should work fine IF invalidate_dcache_range() is called
>>> BEFORE the DMA operation (again the writeback can happen at time so it's
>>> valid do it here). Calling it only AFTER the operation, may corrupt the
>>> data written by the DMA with old data from CPU. This how I used to
>>> handle unaligned buffers in some other projects.
>>>
>>>
>>> There is however one loophole: a data sitting in the first or the last
>>> line is accessed before the memory is updated by the DMA, then the
>>> first/line will be corrupted. But it's not highly probable as this data
>>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>>> dma mgt related variable). So it's not perfect but it would still be
>>> better than we have today.
>> Or just fix all the code which complains about unaligned buffers, done.
>> That's the way to go without all the complications above.
> It's not that complex, but it's not perfect. We would need to keep the
> same warning as we have now, but it would make it work in more cases.

The warning is there for that exact reason -- to inform you something's
wrong.

> Tracking every possible unaligned buffer that gets invalidated is not a
> trivial job. Most of the time the buffer is allocated in a upper layer
> and passed down to a driver via layers like network stack, block layer
> etc.And in many cases, the warning will come and go depending on how the
> variable aligned on the stack or the heap.

I didn't observe this much, in fact. I usually see the buffers coming in
aligned or being allocated in drivers. Also, I think that's why the RC
cycle is there, so we can test the next release and fix these issues.


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-16 19:33     ` Marek Vasut
@ 2017-01-17  9:08       ` Jean-Jacques Hiblot
  2017-01-17  9:15         ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:08 UTC (permalink / raw)
  To: u-boot



On 16/01/2017 20:33, Marek Vasut wrote:
> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>
>> On 16/01/2017 17:00, Marek Vasut wrote:
>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>> Tom, Marek
>>> Hi,
>>>
>>>> At the moment, whenever an unaligned address is used in cache operations
>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>> maintained. This is probably what is causing the bug addressed in
>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>>>> and for all of them, we're possibly handling the cached partially or not
>>>> at all. I've seen this when using the environment from a file stored in
>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>> buffer at the FAT level but it's only one of many cases.
>>>>
>>>> I think we can do better with unaligned cache operations:
>>>>
>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>> should no problem with that since writeback can happen at any point in
>>>> time.
>>>>
>>>> * invalidation
>>>>
>>>> It is a bit trickier. here is a pseudo-code:
>>>> invalidate_dcache_range(p,length)
>>>> {
>>>>            write_back_invalidate(first line)
>>>>            write_back_invalidate(last line)
>>>>            invalidate(all other lines)
>>>> }
>>>>
>>>> Here again this should work fine IF invalidate_dcache_range() is called
>>>> BEFORE the DMA operation (again the writeback can happen at time so it's
>>>> valid do it here). Calling it only AFTER the operation, may corrupt the
>>>> data written by the DMA with old data from CPU. This how I used to
>>>> handle unaligned buffers in some other projects.
>>>>
>>>>
>>>> There is however one loophole: a data sitting in the first or the last
>>>> line is accessed before the memory is updated by the DMA, then the
>>>> first/line will be corrupted. But it's not highly probable as this data
>>>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>>>> dma mgt related variable). So it's not perfect but it would still be
>>>> better than we have today.
>>> Or just fix all the code which complains about unaligned buffers, done.
>>> That's the way to go without all the complications above.
>> It's not that complex, but it's not perfect. We would need to keep the
>> same warning as we have now, but it would make it work in more cases.
> The warning is there for that exact reason -- to inform you something's
> wrong.
>
>> Tracking every possible unaligned buffer that gets invalidated is not a
>> trivial job. Most of the time the buffer is allocated in a upper layer
>> and passed down to a driver via layers like network stack, block layer
>> etc.And in many cases, the warning will come and go depending on how the
>> variable aligned on the stack or the heap.
> I didn't observe this much in fact. I usually see the buffers coming it
> aligned or being allocated in drivers. Also, I think that's why the RC
> cycle is there, so we can test the next release and fix these issues.
It's not commonly seen, but I have come across it a few times.

Here are two examples:

Network:
U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
CPU  : DRA752-HS ES2.0
Model: TI DRA742
[...]
Booting from network ...
cpsw Waiting for PHY auto negotiation to complete.... done
link up on port 0, speed 1000, full duplex
BOOTP broadcast 1
CACHE: Misaligned operation at range [dffecb40, dffecc96]
CACHE: Misaligned operation at range [dffed140, dffed17e]
BOOTP broadcast 2
[...]
File transfer via NFS from server 10.0.1.26; our IP address is 128.247.83.128; sending through gateway 128.247.82.1
[...]
Load address: 0x82000000
Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]


FAT: it has been fixed recently by using a bounce buffer in the FAT code, in "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
I've also seen it a few years back on PowerPC platforms when accessing USB storage.

I'm sure we could find more examples. I don't mean that they shouldn't be fixed, simply that we could still make things work in most cases at a low cost.
The modifications I proposed do not change the behaviour of the code for aligned buffers; they just make it much more likely to work with unaligned buffers. I fail to see why this wouldn't be a good thing.

BTW the UniPhier L2 cache implements this for both flush and invalidation, and some architectures (pxa, blackfin, openrisc, etc.) do the flush() this way too.

Jean-Jacques


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:08       ` Jean-Jacques Hiblot
@ 2017-01-17  9:15         ` Marek Vasut
  2017-01-17  9:35           ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:15 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 16/01/2017 20:33, Marek Vasut wrote:
>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>
>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>> Tom, Marek
>>>> Hi,
>>>>
>>>>> At the moment, whenever an unaligned address is used in cache
>>>>> operations
>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>> maintained. This is probably what is causing the bug addressed in
>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>> operations
>>>>> and for all of them, we're possibly handling the cached partially
>>>>> or not
>>>>> at all. I've seen this when using the environment from a file
>>>>> stored in
>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>
>>>>> I think we can do better with unaligned cache operations:
>>>>>
>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>> should no problem with that since writeback can happen at any point in
>>>>> time.
>>>>>
>>>>> * invalidation
>>>>>
>>>>> It is a bit trickier. here is a pseudo-code:
>>>>> invalidate_dcache_range(p,length)
>>>>> {
>>>>>            write_back_invalidate(first line)
>>>>>            write_back_invalidate(last line)
>>>>>            invalidate(all other lines)
>>>>> }
>>>>>
>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>> called
>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>> it's
>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>> the
>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>> handle unaligned buffers in some other projects.
>>>>>
>>>>>
>>>>> There is however one loophole: a data sitting in the first or the last
>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>> data
>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>> SMP?,
>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>> better than we have today.
>>>> Or just fix all the code which complains about unaligned buffers, done.
>>>> That's the way to go without all the complications above.
>>> It's not that complex, but it's not perfect. We would need to keep the
>>> same warning as we have now, but it would make it work in more cases.
>> The warning is there for that exact reason -- to inform you something's
>> wrong.
>>
>>> Tracking every possible unaligned buffer that gets invalidated is not a
>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>> and passed down to a driver via layers like network stack, block layer
>>> etc.And in many cases, the warning will come and go depending on how the
>>> variable aligned on the stack or the heap.
>> I didn't observe this much in fact. I usually see the buffers coming it
>> aligned or being allocated in drivers. Also, I think that's why the RC
>> cycle is there, so we can test the next release and fix these issues.
> It's not commonly seen but I came across it some times.
> 
> Here are two examples:
> 
> Network:
> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
> CPU  : DRA752-HS ES2.0
> Model: TI DRA742
> [...]
> Booting from network ...
> cpsw Waiting for PHY auto negotiation to complete.... done
> link up on port 0, speed 1000, full duplex
> BOOTP broadcast 1
> CACHE: Misaligned operation at range [dffecb40, dffecc96]
> CACHE: Misaligned operation at range [dffed140, dffed17e]
> BOOTP broadcast 2
> [...]
> File transfer via NFS from server 10.0.1.26; our IP address is
> 128.247.83.128; sending through gateway 128.247.82.1
> [...]
> Load address: 0x82000000
> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
> 
> 
> FAT: it has been fixed recently by using a bounce buffer in FAT code by
> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
> I've also seen it a few years back on PowerPC platforms when accessing a
> USB storage.
> 
> I'm sure that we could find more examples. I don't mean that they
> shouldn't be fixed, simply that we could still make things work in most
> cases at a low cost.
> The modifications I proposed do not change the behaviour of the code for
> aligned buffers, it just make it much more likely that it would work
> with unaligned buffers. I fail to see the reason why this wouldn't be a
> good thing.

It seems to me like "it kinda works, but sometimes it really doesn't...",
so it's encouraging bad practice: people will get used to ignoring this
warning because things "kinda work, in most cases".

> BTW the L2 uniphier cache implements this for flush and invalidation,
> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
> this way too.
> 
> Jean-Jacques
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:15         ` Marek Vasut
@ 2017-01-17  9:35           ` Jean-Jacques Hiblot
  2017-01-17  9:38             ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:35 UTC (permalink / raw)
  To: u-boot



On 17/01/2017 10:15, Marek Vasut wrote:
> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>
>> On 16/01/2017 20:33, Marek Vasut wrote:
>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>> Tom, Marek
>>>>> Hi,
>>>>>
>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>> operations
>>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>> operations
>>>>>> and for all of them, we're possibly handling the cached partially
>>>>>> or not
>>>>>> at all. I've seen this when using the environment from a file
>>>>>> stored in
>>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>>
>>>>>> I think we can do better with unaligned cache operations:
>>>>>>
>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>> should no problem with that since writeback can happen at any point in
>>>>>> time.
>>>>>>
>>>>>> * invalidation
>>>>>>
>>>>>> It is a bit trickier. here is a pseudo-code:
>>>>>> invalidate_dcache_range(p,length)
>>>>>> {
>>>>>>             write_back_invalidate(first line)
>>>>>>             write_back_invalidate(last line)
>>>>>>             invalidate(all other lines)
>>>>>> }
>>>>>>
>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>> called
>>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>>> it's
>>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>>> the
>>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>>> handle unaligned buffers in some other projects.
>>>>>>
>>>>>>
>>>>>> There is however one loophole: a data sitting in the first or the last
>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>>> data
>>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>>> SMP?,
>>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>>> better than we have today.
>>>>> Or just fix all the code which complains about unaligned buffers, done.
>>>>> That's the way to go without all the complications above.
>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>> same warning as we have now, but it would make it work in more cases.
>>> The warning is there for that exact reason -- to inform you something's
>>> wrong.
>>>
>>>> Tracking every possible unaligned buffer that gets invalidated is not a
>>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>>> and passed down to a driver via layers like network stack, block layer
>>>> etc.And in many cases, the warning will come and go depending on how the
>>>> variable aligned on the stack or the heap.
>>> I didn't observe this much in fact. I usually see the buffers coming it
>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>> cycle is there, so we can test the next release and fix these issues.
>> It's not commonly seen but I came across it some times.
>>
>> Here are two examples:
>>
>> Network:
>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>> CPU  : DRA752-HS ES2.0
>> Model: TI DRA742
>> [...]
>> Booting from network ...
>> cpsw Waiting for PHY auto negotiation to complete.... done
>> link up on port 0, speed 1000, full duplex
>> BOOTP broadcast 1
>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>> BOOTP broadcast 2
>> [...]
>> File transfer via NFS from server 10.0.1.26; our IP address is
>> 128.247.83.128; sending through gateway 128.247.82.1
>> [...]
>> Load address: 0x82000000
>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>
>>
>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>> I've also seen it a few years back on PowerPC platforms when accessing a
>> USB storage.
>>
>> I'm sure that we could find more examples. I don't mean that they
>> shouldn't be fixed, simply that we could still make things work in most
>> cases at a low cost.
>> The modifications I proposed do not change the behaviour of the code for
>> aligned buffers, it just make it much more likely that it would work
>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>> good thing.
> It seems to me like "it kinda works, but it really doesn't sometimes
> ...." , so it's encouraging bad practice as people will get used to
> ignoring this warning because things "kinda work, in most cases".
I see your point, but the current situation is exactly like that: it 
works most of the time.
I'm not pushing for the changes; I just wanted to point out that we could 
do better if we wish.

>> BTW the L2 uniphier cache implements this for flush and invalidation,
>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>> this way too.
>>
>> Jean-Jacques
>>
>


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:35           ` Jean-Jacques Hiblot
@ 2017-01-17  9:38             ` Marek Vasut
  2017-01-17  9:51               ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:38 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 17/01/2017 10:15, Marek Vasut wrote:
>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>
>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>> Tom, Marek
>>>>>> Hi,
>>>>>>
>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>> operations
>>>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole
>>>>>>> request is
>>>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>> operations
>>>>>>> and for all of them, we're possibly handling the cached partially
>>>>>>> or not
>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>> stored in
>>>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>>>
>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>
>>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate
>>>>>>> on the
>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>>> should no problem with that since writeback can happen at any
>>>>>>> point in
>>>>>>> time.
>>>>>>>
>>>>>>> * invalidation
>>>>>>>
>>>>>>> It is a bit trickier. here is a pseudo-code:
>>>>>>> invalidate_dcache_range(p,length)
>>>>>>> {
>>>>>>>             write_back_invalidate(first line)
>>>>>>>             write_back_invalidate(last line)
>>>>>>>             invalidate(all other lines)
>>>>>>> }
>>>>>>>
>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>> called
>>>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>>>> it's
>>>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>>>> the
>>>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>>>> handle unaligned buffers in some other projects.
>>>>>>>
>>>>>>>
>>>>>>> There is however one loophole: a data sitting in the first or the
>>>>>>> last
>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>>>> data
>>>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>>>> SMP?,
>>>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>>>> better than we have today.
>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>> done.
>>>>>> That's the way to go without all the complications above.
>>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>>> same warning as we have now, but it would make it work in more cases.
>>>> The warning is there for that exact reason -- to inform you something's
>>>> wrong.
>>>>
>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>> not a
>>>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>>>> and passed down to a driver via layers like network stack, block layer
>>>>> etc.And in many cases, the warning will come and go depending on
>>>>> how the
>>>>> variable aligned on the stack or the heap.
>>>> I didn't observe this much in fact. I usually see the buffers coming in
>>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>>> cycle is there, so we can test the next release and fix these issues.
>>> It's not commonly seen, but I have come across it a few times.
>>>
>>> Here are two examples:
>>>
>>> Network:
>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>> CPU  : DRA752-HS ES2.0
>>> Model: TI DRA742
>>> [...]
>>> Booting from network ...
>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>> link up on port 0, speed 1000, full duplex
>>> BOOTP broadcast 1
>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>> BOOTP broadcast 2
>>> [...]
>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>> 128.247.83.128; sending through gateway 128.247.82.1
>>> [...]
>>> Load address: 0x82000000
>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>
>>>
>>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>> I've also seen it a few years back on PowerPC platforms when accessing a
>>> USB storage.
>>>
>>> I'm sure that we could find more examples. I don't mean that they
>>> shouldn't be fixed, simply that we could still make things work in most
>>> cases at a low cost.
>>> The modifications I proposed do not change the behaviour of the code for
>>> aligned buffers; they just make it much more likely that it would work
>>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>>> good thing.
>> It seems to me like "it kinda works, but it really doesn't sometimes
>> ...." , so it's encouraging bad practice as people will get used to
>> ignoring this warning because things "kinda work, in most cases".
> I see your point but the current situation is exactly like that: it
> works most of the time.

But in this case, it's a property of the hardware. With your patch, it's
also a property of the code which works "most of the time". You
even state that in the patch description.

> I'm not pushing for the changes, I just wanted to let you know that we could
> do better if we wish.

Replacing one problem with another which looks like it isn't a problem
anymore (but still is) is not better IMO.

>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>> this way too.
>>>
>>> Jean-Jacques
>>>
>>
> 


-- 
Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] ARM - cache and alignment
  2017-01-17  9:38             ` Marek Vasut
@ 2017-01-17  9:51               ` Jean-Jacques Hiblot
  2017-01-17  9:54                 ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:51 UTC (permalink / raw)
  To: u-boot



On 17/01/2017 10:38, Marek Vasut wrote:
> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>
>> On 17/01/2017 10:15, Marek Vasut wrote:
>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> Tom, Marek
>>>>>>> Hi,
>>>>>>>
>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>> operations
>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>> request is
>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned part is
>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>> operations
>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>> or not
>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>> stored in
>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a bounce
>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>
>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>
>>>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>> on the
>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>> point in
>>>>>>>> time.
>>>>>>>>
>>>>>>>> * invalidation
>>>>>>>>
>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>> {
>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>              invalidate(all other lines)
>>>>>>>> }
>>>>>>>>
>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>> called
>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>> time, so it's
>>>>>>>> valid to do it here). Calling it only AFTER the operation may corrupt
>>>>>>>> the
>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>
>>>>>>>>
>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>> last
>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>> as that data
>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>> SMP?,
>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>> still be better than what we have today.
>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>> done.
>>>>>>> That's the way to go without all the complications above.
>>>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>>>> same warning as we have now, but it would make it work in more cases.
>>>>> The warning is there for that exact reason -- to inform you something's
>>>>> wrong.
>>>>>
>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>> not a
>>>>>> trivial job. Most of the time the buffer is allocated in an upper layer
>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>> how the
>>>>>> variable is aligned on the stack or the heap.
>>>>> I didn't observe this much in fact. I usually see the buffers coming in
>>>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>>>> cycle is there, so we can test the next release and fix these issues.
>>>> It's not commonly seen, but I have come across it a few times.
>>>>
>>>> Here are two examples:
>>>>
>>>> Network:
>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>> CPU  : DRA752-HS ES2.0
>>>> Model: TI DRA742
>>>> [...]
>>>> Booting from network ...
>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>> link up on port 0, speed 1000, full duplex
>>>> BOOTP broadcast 1
>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>> BOOTP broadcast 2
>>>> [...]
>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>> [...]
>>>> Load address: 0x82000000
>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>
>>>>
>>>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>> I've also seen it a few years back on PowerPC platforms when accessing a
>>>> USB storage.
>>>>
>>>> I'm sure that we could find more examples. I don't mean that they
>>>> shouldn't be fixed, simply that we could still make things work in most
>>>> cases at a low cost.
>>>> The modifications I proposed do not change the behaviour of the code for
>>>> aligned buffers; they just make it much more likely that it would work
>>>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>>>> good thing.
>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>> ...." , so it's encouraging bad practice as people will get used to
>>> ignoring this warning because things "kinda work, in most cases".
>> I see your point but the current situation is exactly like that: it
>> works most of the time.
> But in this case, it's a property of the hardware. With your patch, it's
> also a property of the code which works "most of the time". You
> even state that in the patch description.
I'm not sure I understand your point. But as I said, I'm not pushing.
You're not keen on the idea and you're more knowledgeable than I am in
U-Boot, so be it. I'll drop the subject.
Thanks for taking the time to discuss it.
>> I'm not pushing for the changes, I just wanted to let you know that we could
>> do better if we wish.
> Replacing one problem with another which looks like it isn't a problem
> anymore (but still is) is not better IMO.
>
>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>> this way too.
>>>>
>>>> Jean-Jacques
>>>>
>


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:51               ` Jean-Jacques Hiblot
@ 2017-01-17  9:54                 ` Marek Vasut
  2017-01-17 20:54                   ` Benoît Thébaudeau
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:54 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 17/01/2017 10:38, Marek Vasut wrote:
>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>
>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>> Tom, Marek
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>> operations
>>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>>> request is
>>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned
>>>>>>>>> part is
>>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>>> operations
>>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>>> or not
>>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>>> stored in
>>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a
>>>>>>>>> bounce
>>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>>
>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>
>>>>>>>>> * flush (writeback + invalidate): Suppose we use address p
>>>>>>>>> which is
>>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>>> on the
>>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)].
>>>>>>>>> There
>>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>>> point in
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> * invalidation
>>>>>>>>>
>>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>>> {
>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>              invalidate(all other lines)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>>> called
>>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>>> time, so it's
>>>>>>>>> valid to do it here). Calling it only AFTER the operation may
>>>>>>>>> corrupt
>>>>>>>>> the
>>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>>> last
>>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>>> as that data
>>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>>> SMP?,
>>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>>> still be
>>>>>>>>> better than what we have today.
>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>> done.
>>>>>>>> That's the way to go without all the complications above.
>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>> keep the
>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>> cases.
>>>>>> The warning is there for that exact reason -- to inform you
>>>>>> something's
>>>>>> wrong.
>>>>>>
>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>> not a
>>>>>>> trivial job. Most of the time the buffer is allocated in an upper
>>>>>>> layer
>>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>>> how the
>>>>>>> variable is aligned on the stack or the heap.
>>>>>> I didn't observe this much in fact. I usually see the buffers
>>>>>> coming in
>>>>>> aligned or being allocated in drivers. Also, I think that's why
>>>>>> the RC
>>>>>> cycle is there, so we can test the next release and fix these issues.
>>>>> It's not commonly seen, but I have come across it a few times.
>>>>>
>>>>> Here are two examples:
>>>>>
>>>>> Network:
>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>> CPU  : DRA752-HS ES2.0
>>>>> Model: TI DRA742
>>>>> [...]
>>>>> Booting from network ...
>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>> link up on port 0, speed 1000, full duplex
>>>>> BOOTP broadcast 1
>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>> BOOTP broadcast 2
>>>>> [...]
>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>> [...]
>>>>> Load address: 0x82000000
>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>>
>>>>>
>>>>> FAT: it has been fixed recently by using a bounce buffer in FAT
>>>>> code by
>>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>> accessing a
>>>>> USB storage.
>>>>>
>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>> most
>>>>> cases at a low cost.
>>>>> The modifications I proposed do not change the behaviour of the
>>>>> code for
>>>>> aligned buffers; they just make it much more likely that it would work
>>>>> with unaligned buffers. I fail to see the reason why this wouldn't
>>>>> be a
>>>>> good thing.
>>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>>> ...." , so it's encouraging bad practice as people will get used to
>>>> ignoring this warning because things "kinda work, in most cases".
>>> I see your point but the current situation is exactly like that: it
>>> works most of the time.
>> But in this case, it's a property of the hardware. With your patch, it's
>> also a property of the code which works "most of the time". You
>> even state that in the patch description.
> I'm not sure I understand your point.

I'm referring to the last paragraph of the commit message, "There is
however one loophole..."

> But as I said, I'm not pushing.
> You're not keen on the idea and you're more knowledgeable than I am in
> U-Boot, so be it. I'll drop the subject.
> Thanks for taking the time to discuss it.

Let's wait for what the others have to say, maybe?

>>> I'm not pushing for the changes, I just wanted to let you know that we could
>>> do better if we wish.
>> Replacing one problem with another which looks like it isn't a problem
>> anymore (but still is) is not better IMO.
>>
>>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>> this way too.
>>>>>
>>>>> Jean-Jacques
>>>>>
>>
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:54                 ` Marek Vasut
@ 2017-01-17 20:54                   ` Benoît Thébaudeau
  0 siblings, 0 replies; 12+ messages in thread
From: Benoît Thébaudeau @ 2017-01-17 20:54 UTC (permalink / raw)
  To: u-boot

Hi,

On Tue, Jan 17, 2017 at 10:54 AM, Marek Vasut <marex@denx.de> wrote:
> On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
>>
>>
>> On 17/01/2017 10:38, Marek Vasut wrote:
>>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>>
>>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>>> Tom, Marek
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>>> operations
>>>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>>>> request is
>>>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned
>>>>>>>>>> part is
>>>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>>>> operations
>>>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>>>> or not
>>>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>>>> stored in
>>>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a
>>>>>>>>>> bounce
>>>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>>>
>>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>>
>>>>>>>>>> * flush (writeback + invalidate): Suppose we use address p
>>>>>>>>>> which is
>>>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>>>> on the
>>>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)].
>>>>>>>>>> There
>>>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>>>> point in
>>>>>>>>>> time.
>>>>>>>>>>
>>>>>>>>>> * invalidation
>>>>>>>>>>
>>>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>>>> {
>>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>>              invalidate(all other lines)
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>>>> called
>>>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>>>> time, so it's
>>>>>>>>>> valid to do it here). Calling it only AFTER the operation may
>>>>>>>>>> corrupt
>>>>>>>>>> the
>>>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>>>> last
>>>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>>>> as that data
>>>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>>>> SMP?,
>>>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>>>> still be
>>>>>>>>>> better than what we have today.
>>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>>> done.
>>>>>>>>> That's the way to go without all the complications above.
>>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>>> keep the
>>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>>> cases.
>>>>>>> The warning is there for that exact reason -- to inform you
>>>>>>> something's
>>>>>>> wrong.
>>>>>>>
>>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>>> not a
>>>>>>>> trivial job. Most of the time the buffer is allocated in an upper
>>>>>>>> layer
>>>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>>>> how the
>>>>>>>> variable is aligned on the stack or the heap.
>>>>>>> I didn't observe this much in fact. I usually see the buffers
>>>>>>> coming in
>>>>>>> aligned or being allocated in drivers. Also, I think that's why
>>>>>>> the RC
>>>>>>> cycle is there, so we can test the next release and fix these issues.
>>>>>> It's not commonly seen, but I have come across it a few times.
>>>>>>
>>>>>> Here are two examples:
>>>>>>
>>>>>> Network:
>>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>>> CPU  : DRA752-HS ES2.0
>>>>>> Model: TI DRA742
>>>>>> [...]
>>>>>> Booting from network ...
>>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>>> link up on port 0, speed 1000, full duplex
>>>>>> BOOTP broadcast 1
>>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>>> BOOTP broadcast 2
>>>>>> [...]
>>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>>> [...]
>>>>>> Load address: 0x82000000
>>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>>>
>>>>>>
>>>>>> FAT: it has been fixed recently by using a bounce buffer in FAT
>>>>>> code by
>>>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>>> accessing a
>>>>>> USB storage.
>>>>>>
>>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>>> most
>>>>>> cases at a low cost.
>>>>>> The modifications I proposed do not change the behaviour of the
>>>>>> code for
>>>>>> aligned buffers; they just make it much more likely that it would work
>>>>>> with unaligned buffers. I fail to see the reason why this wouldn't
>>>>>> be a
>>>>>> good thing.
>>>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>>>> ...." , so it's encouraging bad practice as people will get used to
>>>>> ignoring this warning because things "kinda work, in most cases".
>>>> I see your point but the current situation is exactly like that: it
>>>> works most of the time.
>>> But in this case, it's a property of the hardware. With your patch, it's
>>> also a property of the code which works "most of the time". You
>>> even state that in the patch description.
>> I'm not sure I understand your point.
>
> I'm referring to the last paragraph of the commit message, "There is
> however one loophole..."
>
>> But as I said, I'm not pushing.
>> You're not keen on the idea and you're more knowledgeable than I am in
>> U-Boot, so be it. I'll drop the subject.
>> Thanks for taking the time to discuss it.
>
> Let's wait for what the others have to say, maybe?

IMHO, a 100% reliable U-Boot is preferable, even if this means less
performance. It does not seem like there would be a way of working
around this loophole, so I would avoid this cache hack, even if the
probability of getting an issue would be small.

Maintaining only the aligned part of DMA buffers in cache operations,
as Jean-Jacques says is done for ARMv7/v8, is also not perfect. This
means that probably at least all the storage-related device drivers on
ARM can be affected in some way by unaligned buffers. That would mean
a lot of changes to move the buffer bounce mechanism from the FAT
layer to the device driver layer. That's why I'd rather move it to the
block layer. The architecture/driver combinations requiring it could
add a flag or a configuration setting somewhere, so that the
performance is not lowered when not needed. It would also only be
applied if a misalignment is detected, in which case a debug() message
could also be printed.

>>>> I'm not pushing for the changes, I just wanted to let you know that we could
>>>> do better if we wish.
>>> Replacing one problem with another which looks like it isn't a problem
>>> anymore (but still is) is not better IMO.
>>>
>>>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>>> this way too.

Best regards,
Benoît


* [U-Boot] ARM - cache and alignment
  2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
  2017-01-16 16:00 ` Marek Vasut
@ 2017-01-20  3:36 ` Tom Rini
  1 sibling, 0 replies; 12+ messages in thread
From: Tom Rini @ 2017-01-20  3:36 UTC (permalink / raw)
  To: u-boot

On Mon, Jan 16, 2017 at 02:29:21PM +0100, Jean-Jacques Hiblot wrote:

> Tom, Marek
> 
> At the moment, whenever an unaligned address is used in cache
> operations (invalidate_dcache_range or flush_dcache_range), the
> whole request is discarded for arm926ejs. For ARMv7 or ARMv8 only
> the aligned part is maintained. This is probably what is causing the
> bug addressed in 8133f43d1cd. There are a lot of unaligned buffers
> used in DMA operations and for all of them, we're possibly handling
> the cache partially or not at all. I've seen this when using the
> environment from a file stored in a FAT partition. Commit
> 8133f43d1cd addresses this by using a bounce buffer at the FAT level,
> but it's only one of many cases.
> 
> I think we can do better with unaligned cache operations:

My current feeling is that, with respect to buffers not being aligned,
as we move forward with the DM changes we can push this up a level and
it will be harder to get this wrong.
Patches to fix problems found in the meantime are welcome.

-- 
Tom


end of thread, other threads:[~2017-01-20  3:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
2017-01-16 16:00 ` Marek Vasut
2017-01-16 19:16   ` Jean-Jacques Hiblot
2017-01-16 19:33     ` Marek Vasut
2017-01-17  9:08       ` Jean-Jacques Hiblot
2017-01-17  9:15         ` Marek Vasut
2017-01-17  9:35           ` Jean-Jacques Hiblot
2017-01-17  9:38             ` Marek Vasut
2017-01-17  9:51               ` Jean-Jacques Hiblot
2017-01-17  9:54                 ` Marek Vasut
2017-01-17 20:54                   ` Benoît Thébaudeau
2017-01-20  3:36 ` Tom Rini
