* [U-Boot] ARM - cache and alignment
@ 2017-01-16 13:29 Jean-Jacques Hiblot
  2017-01-16 16:00 ` Marek Vasut
  2017-01-20  3:36 ` Tom Rini
  0 siblings, 2 replies; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-16 13:29 UTC (permalink / raw)
  To: u-boot

Tom, Marek

At the moment, whenever an unaligned address is used in cache operations 
(invalidate_dcache_range or flush_dcache_range), the whole request is 
discarded for arm926ejs; for ARMv7 or ARMv8 only the aligned part is 
maintained. This is probably what caused the bug addressed in 
8133f43d1cd. There are a lot of unaligned buffers used in DMA operations, 
and for all of them we're possibly maintaining the cache only partially 
or not at all. I've seen this when using the environment from a file 
stored in a FAT partition. Commit 8133f43d1cd addresses this by using a 
bounce buffer at the FAT level, but it's only one of many cases.

I think we can do better with unaligned cache operations:

* flush (writeback + invalidate): Suppose we use an unaligned address p; 
flush_dcache_range() can do the writeback+invalidate on the whole range 
[p & ~(line_sz - 1); (p + length) | (line_sz - 1)]. There should be no 
problem with that, since a writeback can happen at any point in time.
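
The rounding involved can be sketched in plain C. The helper names here 
(line_floor, line_ceil) are illustrative, not actual U-Boot functions, and 
line_sz is assumed to be a power of two, as cache-line sizes are:

```c
#include <stdint.h>

/* Round down to the start of the cache line containing p. */
static uintptr_t line_floor(uintptr_t p, uintptr_t line_sz)
{
	return p & ~(line_sz - 1);
}

/* Round up to the next line boundary at or after p. */
static uintptr_t line_ceil(uintptr_t p, uintptr_t line_sz)
{
	return (p + line_sz - 1) & ~(line_sz - 1);
}

/* flush_dcache_range(p, p + length) would then operate on
 * [line_floor(p, sz), line_ceil(p + length, sz)) instead of rejecting
 * or truncating the unaligned request. */
```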

* invalidation

It is a bit trickier. Here is pseudo-code:
invalidate_dcache_range(p,length)
{
          write_back_invalidate(first line)
          write_back_invalidate(last line)
          invalidate(all other lines)
}

Here again this should work fine IF invalidate_dcache_range() is called 
BEFORE the DMA operation (again, the writeback can happen at any time, so 
it's valid to do it here). Calling it only AFTER the operation may corrupt 
the data written by the DMA with old data from the CPU. This is how I used 
to handle unaligned buffers in some other projects.
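
A self-contained C sketch of that pseudo-code, with the per-line 
maintenance primitives stubbed out as counters so the line split can be 
checked on a host. On real hardware these stubs would be the per-line 
cache maintenance operations; all names here are illustrative, not actual 
U-Boot functions:

```c
#include <stdint.h>

#define LINE_SZ 64	/* assumed cache-line size for the sketch */

/* Stubs standing in for per-line cache maintenance; the counters let us
 * verify how a range gets split between the two operations. */
static unsigned int wb_inval_calls, inval_calls;
static void write_back_invalidate_line(uintptr_t addr) { (void)addr; wb_inval_calls++; }
static void invalidate_line(uintptr_t addr) { (void)addr; inval_calls++; }

/* Proposed unaligned-tolerant invalidate: the partial first and last
 * lines are written back before invalidation, so the unrelated bytes
 * they contain are not lost; fully covered lines are just invalidated. */
static void invalidate_dcache_range(uintptr_t p, uintptr_t length)
{
	uintptr_t start = p & ~(uintptr_t)(LINE_SZ - 1);
	uintptr_t end = (p + length + LINE_SZ - 1) & ~(uintptr_t)(LINE_SZ - 1);

	if (p != start) {			/* partial first line */
		write_back_invalidate_line(start);
		start += LINE_SZ;
	}
	if ((p + length) % LINE_SZ && end > start) {	/* partial last line */
		end -= LINE_SZ;
		write_back_invalidate_line(end);
	}
	for (; start < end; start += LINE_SZ)
		invalidate_line(start);
}
```

For example, a request covering 0x1010-0x1110 with 64-byte lines touches 
five lines: the two partial edge lines get the writeback+invalidate, the 
three fully covered middle lines get the plain invalidate.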


There is however one loophole: if data sitting in the first or the last 
line is accessed before the memory is updated by the DMA, then that 
first/last line will be corrupted. But it's not highly probable, as this 
data would have to be used in parallel with the DMA (interrupt handling, 
SMP?, DMA-management-related variables). So it's not perfect, but it would 
still be better than what we have today.

cheers,

Jean-Jacques

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] ARM - cache and alignment
  2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
@ 2017-01-16 16:00 ` Marek Vasut
  2017-01-16 19:16   ` Jean-Jacques Hiblot
  2017-01-20  3:36 ` Tom Rini
  1 sibling, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-16 16:00 UTC (permalink / raw)
  To: u-boot

On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
> Tom, Marek

Hi,

> At the moment, whenever an unaligned address is used in cache operations
> (invalidate_dcache_range, or flush_dcache_range), the whole request is
> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
> maintained. This is probably what is causing the bug addressed in
> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
> and for all of them, we're possibly handling the cached partially or not
> at all. I've seen this when using the environment from a file stored in
> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
> buffer at the FAT level but it's only one of many cases.
> 
> I think we can do better with unaligned cache operations:
> 
> * flush (writeback + invalidate): Suppose we use address p which is
> unaligned, flush_dcache_range() can do the writeback+invalidate on the
> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
> should no problem with that since writeback can happen at any point in
> time.
> 
> * invalidation
> 
> It is a bit trickier. here is a pseudo-code:
> invalidate_dcache_range(p,length)
> {
>          write_back_invalidate(first line)
>          write_back_invalidate(last line)
>          invalidate(all other lines)
> }
> 
> Here again this should work fine IF invalidate_dcache_range() is called
> BEFORE the DMA operation (again the writeback can happen at time so it's
> valid do it here). Calling it only AFTER the operation, may corrupt the
> data written by the DMA with old data from CPU. This how I used to
> handle unaligned buffers in some other projects.
> 
> 
> There is however one loophole: a data sitting in the first or the last
> line is accessed before the memory is updated by the DMA, then the
> first/line will be corrupted. But it's not highly probable as this data
> would have to be used in parallel of the DMA (interrupt handling, SMP?,
> dma mgt related variable). So it's not perfect but it would still be
> better than we have today.

Or just fix all the code which complains about unaligned buffers, done.
That's the way to go without all the complications above.

-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-16 16:00 ` Marek Vasut
@ 2017-01-16 19:16   ` Jean-Jacques Hiblot
  2017-01-16 19:33     ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-16 19:16 UTC (permalink / raw)
  To: u-boot



On 16/01/2017 17:00, Marek Vasut wrote:
> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>> Tom, Marek
> Hi,
>
>> At the moment, whenever an unaligned address is used in cache operations
>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>> maintained. This is probably what is causing the bug addressed in
>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>> and for all of them, we're possibly handling the cached partially or not
>> at all. I've seen this when using the environment from a file stored in
>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>> buffer at the FAT level but it's only one of many cases.
>>
>> I think we can do better with unaligned cache operations:
>>
>> * flush (writeback + invalidate): Suppose we use address p which is
>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>> should no problem with that since writeback can happen at any point in
>> time.
>>
>> * invalidation
>>
>> It is a bit trickier. here is a pseudo-code:
>> invalidate_dcache_range(p,length)
>> {
>>           write_back_invalidate(first line)
>>           write_back_invalidate(last line)
>>           invalidate(all other lines)
>> }
>>
>> Here again this should work fine IF invalidate_dcache_range() is called
>> BEFORE the DMA operation (again the writeback can happen at time so it's
>> valid do it here). Calling it only AFTER the operation, may corrupt the
>> data written by the DMA with old data from CPU. This how I used to
>> handle unaligned buffers in some other projects.
>>
>>
>> There is however one loophole: a data sitting in the first or the last
>> line is accessed before the memory is updated by the DMA, then the
>> first/line will be corrupted. But it's not highly probable as this data
>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>> dma mgt related variable). So it's not perfect but it would still be
>> better than we have today.
> Or just fix all the code which complains about unaligned buffers, done.
> That's the way to go without all the complications above.
It's not that complex, but it's not perfect. We would need to keep the 
same warning as we have now, but it would make things work in more cases.

Tracking every possible unaligned buffer that gets invalidated is not a 
trivial job. Most of the time the buffer is allocated in an upper layer 
and passed down to a driver through layers like the network stack, the 
block layer, etc. And in many cases, the warning will come and go 
depending on how the variable is aligned on the stack or the heap.

>


* [U-Boot] ARM - cache and alignment
  2017-01-16 19:16   ` Jean-Jacques Hiblot
@ 2017-01-16 19:33     ` Marek Vasut
  2017-01-17  9:08       ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-16 19:33 UTC (permalink / raw)
  To: u-boot

On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
> 
> 
> On 16/01/2017 17:00, Marek Vasut wrote:
>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>> Tom, Marek
>> Hi,
>>
>>> At the moment, whenever an unaligned address is used in cache operations
>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>> maintained. This is probably what is causing the bug addressed in
>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>>> and for all of them, we're possibly handling the cached partially or not
>>> at all. I've seen this when using the environment from a file stored in
>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>> buffer at the FAT level but it's only one of many cases.
>>>
>>> I think we can do better with unaligned cache operations:
>>>
>>> * flush (writeback + invalidate): Suppose we use address p which is
>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>> should no problem with that since writeback can happen at any point in
>>> time.
>>>
>>> * invalidation
>>>
>>> It is a bit trickier. here is a pseudo-code:
>>> invalidate_dcache_range(p,length)
>>> {
>>>           write_back_invalidate(first line)
>>>           write_back_invalidate(last line)
>>>           invalidate(all other lines)
>>> }
>>>
>>> Here again this should work fine IF invalidate_dcache_range() is called
>>> BEFORE the DMA operation (again the writeback can happen at time so it's
>>> valid do it here). Calling it only AFTER the operation, may corrupt the
>>> data written by the DMA with old data from CPU. This how I used to
>>> handle unaligned buffers in some other projects.
>>>
>>>
>>> There is however one loophole: a data sitting in the first or the last
>>> line is accessed before the memory is updated by the DMA, then the
>>> first/line will be corrupted. But it's not highly probable as this data
>>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>>> dma mgt related variable). So it's not perfect but it would still be
>>> better than we have today.
>> Or just fix all the code which complains about unaligned buffers, done.
>> That's the way to go without all the complications above.
> It's not that complex, but it's not perfect. We would need to keep the
> same warning as we have now, but it would make it work in more cases.

The warning is there for that exact reason -- to inform you something's
wrong.

> Tracking every possible unaligned buffer that gets invalidated is not a
> trivial job. Most of the time the buffer is allocated in a upper layer
> and passed down to a driver via layers like network stack, block layer
> etc.And in many cases, the warning will come and go depending on how the
> variable aligned on the stack or the heap.

I didn't observe this much, in fact. I usually see the buffers coming in
aligned or being allocated in drivers. Also, I think that's why the RC
cycle is there, so we can test the next release and fix these issues.


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-16 19:33     ` Marek Vasut
@ 2017-01-17  9:08       ` Jean-Jacques Hiblot
  2017-01-17  9:15         ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:08 UTC (permalink / raw)
  To: u-boot



On 16/01/2017 20:33, Marek Vasut wrote:
> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>
>> On 16/01/2017 17:00, Marek Vasut wrote:
>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>> Tom, Marek
>>> Hi,
>>>
>>>> At the moment, whenever an unaligned address is used in cache operations
>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>> maintained. This is probably what is causing the bug addressed in
>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA operations
>>>> and for all of them, we're possibly handling the cached partially or not
>>>> at all. I've seen this when using the environment from a file stored in
>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>> buffer at the FAT level but it's only one of many cases.
>>>>
>>>> I think we can do better with unaligned cache operations:
>>>>
>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>> should no problem with that since writeback can happen at any point in
>>>> time.
>>>>
>>>> * invalidation
>>>>
>>>> It is a bit trickier. here is a pseudo-code:
>>>> invalidate_dcache_range(p,length)
>>>> {
>>>>            write_back_invalidate(first line)
>>>>            write_back_invalidate(last line)
>>>>            invalidate(all other lines)
>>>> }
>>>>
>>>> Here again this should work fine IF invalidate_dcache_range() is called
>>>> BEFORE the DMA operation (again the writeback can happen at time so it's
>>>> valid do it here). Calling it only AFTER the operation, may corrupt the
>>>> data written by the DMA with old data from CPU. This how I used to
>>>> handle unaligned buffers in some other projects.
>>>>
>>>>
>>>> There is however one loophole: a data sitting in the first or the last
>>>> line is accessed before the memory is updated by the DMA, then the
>>>> first/line will be corrupted. But it's not highly probable as this data
>>>> would have to be used in parallel of the DMA (interrupt handling, SMP?,
>>>> dma mgt related variable). So it's not perfect but it would still be
>>>> better than we have today.
>>> Or just fix all the code which complains about unaligned buffers, done.
>>> That's the way to go without all the complications above.
>> It's not that complex, but it's not perfect. We would need to keep the
>> same warning as we have now, but it would make it work in more cases.
> The warning is there for that exact reason -- to inform you something's
> wrong.
>
>> Tracking every possible unaligned buffer that gets invalidated is not a
>> trivial job. Most of the time the buffer is allocated in a upper layer
>> and passed down to a driver via layers like network stack, block layer
>> etc.And in many cases, the warning will come and go depending on how the
>> variable aligned on the stack or the heap.
> I didn't observe this much in fact. I usually see the buffers coming it
> aligned or being allocated in drivers. Also, I think that's why the RC
> cycle is there, so we can test the next release and fix these issues.
It's not commonly seen, but I have come across it a few times.

Here are two examples:

Network:
U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
CPU  : DRA752-HS ES2.0
Model: TI DRA742
[...]
Booting from network ...
cpsw Waiting for PHY auto negotiation to complete.... done
link up on port 0, speed 1000, full duplex
BOOTP broadcast 1
CACHE: Misaligned operation at range [dffecb40, dffecc96]
CACHE: Misaligned operation at range [dffed140, dffed17e]
BOOTP broadcast 2
[...]
File transfer via NFS from server 10.0.1.26; our IP address is 128.247.83.128; sending through gateway 128.247.82.1
[...]
Load address: 0x82000000
Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]


FAT: it has been fixed recently by using a bounce buffer in the FAT code, in "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
I've also seen it a few years back on PowerPC platforms when accessing USB storage.

I'm sure we could find more examples. I don't mean that they shouldn't be fixed, simply that we could still make things work in most cases at a low cost.
The modifications I proposed do not change the behaviour of the code for aligned buffers; they just make it much more likely to work with unaligned buffers. I fail to see why this wouldn't be a good thing.

BTW the UniPhier L2 cache implements this for both flush and invalidation, and some architectures (pxa, blackfin, openrisc, etc.) do the flush() this way too.

Jean-Jacques


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:08       ` Jean-Jacques Hiblot
@ 2017-01-17  9:15         ` Marek Vasut
  2017-01-17  9:35           ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:15 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 16/01/2017 20:33, Marek Vasut wrote:
>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>
>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>> Tom, Marek
>>>> Hi,
>>>>
>>>>> At the moment, whenever an unaligned address is used in cache
>>>>> operations
>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>> maintained. This is probably what is causing the bug addressed in
>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>> operations
>>>>> and for all of them, we're possibly handling the cached partially
>>>>> or not
>>>>> at all. I've seen this when using the environment from a file
>>>>> stored in
>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>
>>>>> I think we can do better with unaligned cache operations:
>>>>>
>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>> should no problem with that since writeback can happen at any point in
>>>>> time.
>>>>>
>>>>> * invalidation
>>>>>
>>>>> It is a bit trickier. here is a pseudo-code:
>>>>> invalidate_dcache_range(p,length)
>>>>> {
>>>>>            write_back_invalidate(first line)
>>>>>            write_back_invalidate(last line)
>>>>>            invalidate(all other lines)
>>>>> }
>>>>>
>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>> called
>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>> it's
>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>> the
>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>> handle unaligned buffers in some other projects.
>>>>>
>>>>>
>>>>> There is however one loophole: a data sitting in the first or the last
>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>> data
>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>> SMP?,
>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>> better than we have today.
>>>> Or just fix all the code which complains about unaligned buffers, done.
>>>> That's the way to go without all the complications above.
>>> It's not that complex, but it's not perfect. We would need to keep the
>>> same warning as we have now, but it would make it work in more cases.
>> The warning is there for that exact reason -- to inform you something's
>> wrong.
>>
>>> Tracking every possible unaligned buffer that gets invalidated is not a
>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>> and passed down to a driver via layers like network stack, block layer
>>> etc.And in many cases, the warning will come and go depending on how the
>>> variable aligned on the stack or the heap.
>> I didn't observe this much in fact. I usually see the buffers coming it
>> aligned or being allocated in drivers. Also, I think that's why the RC
>> cycle is there, so we can test the next release and fix these issues.
> It's not commonly seen but I came across it some times.
> 
> Here are two examples:
> 
> Network:
> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
> CPU  : DRA752-HS ES2.0
> Model: TI DRA742
> [...]
> Booting from network ...
> cpsw Waiting for PHY auto negotiation to complete.... done
> link up on port 0, speed 1000, full duplex
> BOOTP broadcast 1
> CACHE: Misaligned operation at range [dffecb40, dffecc96]
> CACHE: Misaligned operation at range [dffed140, dffed17e]
> BOOTP broadcast 2
> [...]
> File transfer via NFS from server 10.0.1.26; our IP address is
> 128.247.83.128; sending through gateway 128.247.82.1
> [...]
> Load address: 0x82000000
> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
> 
> 
> FAT: it has been fixed recently by using a bounce buffer in FAT code by
> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
> I've also seen it a few years back on PowerPC platforms when accessing a
> USB storage.
> 
> I'm sure that we could find more examples. I don't mean that they
> shouldn't be fixed, simply that we could still make things work in most
> cases at a low cost.
> The modifications I proposed do not change the behaviour of the code for
> aligned buffers, it just make it much more likely that it would work
> with unaligned buffers. I fail to see the reason why this wouldn't be a
> good thing.

It seems to me like "it kinda works, but sometimes it really doesn't...",
so it's encouraging bad practice: people will get used to ignoring this
warning because things "kinda work, in most cases".

> BTW the L2 uniphier cache implements this for flush and invalidation,
> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
> this way too.
> 
> Jean-Jacques
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:15         ` Marek Vasut
@ 2017-01-17  9:35           ` Jean-Jacques Hiblot
  2017-01-17  9:38             ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:35 UTC (permalink / raw)
  To: u-boot



On 17/01/2017 10:15, Marek Vasut wrote:
> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>
>> On 16/01/2017 20:33, Marek Vasut wrote:
>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>> Tom, Marek
>>>>> Hi,
>>>>>
>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>> operations
>>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole request is
>>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>> operations
>>>>>> and for all of them, we're possibly handling the cached partially
>>>>>> or not
>>>>>> at all. I've seen this when using the environment from a file
>>>>>> stored in
>>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>>
>>>>>> I think we can do better with unaligned cache operations:
>>>>>>
>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate on the
>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>> should no problem with that since writeback can happen at any point in
>>>>>> time.
>>>>>>
>>>>>> * invalidation
>>>>>>
>>>>>> It is a bit trickier. here is a pseudo-code:
>>>>>> invalidate_dcache_range(p,length)
>>>>>> {
>>>>>>             write_back_invalidate(first line)
>>>>>>             write_back_invalidate(last line)
>>>>>>             invalidate(all other lines)
>>>>>> }
>>>>>>
>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>> called
>>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>>> it's
>>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>>> the
>>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>>> handle unaligned buffers in some other projects.
>>>>>>
>>>>>>
>>>>>> There is however one loophole: a data sitting in the first or the last
>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>>> data
>>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>>> SMP?,
>>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>>> better than we have today.
>>>>> Or just fix all the code which complains about unaligned buffers, done.
>>>>> That's the way to go without all the complications above.
>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>> same warning as we have now, but it would make it work in more cases.
>>> The warning is there for that exact reason -- to inform you something's
>>> wrong.
>>>
>>>> Tracking every possible unaligned buffer that gets invalidated is not a
>>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>>> and passed down to a driver via layers like network stack, block layer
>>>> etc.And in many cases, the warning will come and go depending on how the
>>>> variable aligned on the stack or the heap.
>>> I didn't observe this much in fact. I usually see the buffers coming it
>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>> cycle is there, so we can test the next release and fix these issues.
>> It's not commonly seen but I came across it some times.
>>
>> Here are two examples:
>>
>> Network:
>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>> CPU  : DRA752-HS ES2.0
>> Model: TI DRA742
>> [...]
>> Booting from network ...
>> cpsw Waiting for PHY auto negotiation to complete.... done
>> link up on port 0, speed 1000, full duplex
>> BOOTP broadcast 1
>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>> BOOTP broadcast 2
>> [...]
>> File transfer via NFS from server 10.0.1.26; our IP address is
>> 128.247.83.128; sending through gateway 128.247.82.1
>> [...]
>> Load address: 0x82000000
>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>
>>
>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>> I've also seen it a few years back on PowerPC platforms when accessing a
>> USB storage.
>>
>> I'm sure that we could find more examples. I don't mean that they
>> shouldn't be fixed, simply that we could still make things work in most
>> cases at a low cost.
>> The modifications I proposed do not change the behaviour of the code for
>> aligned buffers, it just make it much more likely that it would work
>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>> good thing.
> It seems to me like "it kinda works, but it really doesn't sometimes
> ...." , so it's encouraging bad practice as people will get used to
> ignoring this warning because things "kinda work, in most cases".
I see your point, but the current situation is exactly like that: it 
works most of the time.
I'm not pushing for the changes; I just wanted to point out that we could 
do better if we wish.

>> BTW the L2 uniphier cache implements this for flush and invalidation,
>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>> this way too.
>>
>> Jean-Jacques
>>
>


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:35           ` Jean-Jacques Hiblot
@ 2017-01-17  9:38             ` Marek Vasut
  2017-01-17  9:51               ` Jean-Jacques Hiblot
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:38 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 17/01/2017 10:15, Marek Vasut wrote:
>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>
>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>> Tom, Marek
>>>>>> Hi,
>>>>>>
>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>> operations
>>>>>>> (invalidate_dcache_range, or flush_dcache_range), the whole
>>>>>>> request is
>>>>>>> discarded  for am926ejs. for armV7 or armV8 only the aligned part is
>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>> operations
>>>>>>> and for all of them, we're possibly handling the cached partially
>>>>>>> or not
>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>> stored in
>>>>>>> a FAT partition. commit 8133f43d1cd addresses this by using a bounce
>>>>>>> buffer at the FAT level but it's only one of many cases.
>>>>>>>
>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>
>>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>>> unaligned, flush_dcache_range() can do the writeback+invalidate
>>>>>>> on the
>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>>> should no problem with that since writeback can happen at any
>>>>>>> point in
>>>>>>> time.
>>>>>>>
>>>>>>> * invalidation
>>>>>>>
>>>>>>> It is a bit trickier. here is a pseudo-code:
>>>>>>> invalidate_dcache_range(p,length)
>>>>>>> {
>>>>>>>             write_back_invalidate(first line)
>>>>>>>             write_back_invalidate(last line)
>>>>>>>             invalidate(all other lines)
>>>>>>> }
>>>>>>>
>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>> called
>>>>>>> BEFORE the DMA operation (again the writeback can happen at time so
>>>>>>> it's
>>>>>>> valid do it here). Calling it only AFTER the operation, may corrupt
>>>>>>> the
>>>>>>> data written by the DMA with old data from CPU. This how I used to
>>>>>>> handle unaligned buffers in some other projects.
>>>>>>>
>>>>>>>
>>>>>>> There is however one loophole: a data sitting in the first or the
>>>>>>> last
>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>> first/line will be corrupted. But it's not highly probable as this
>>>>>>> data
>>>>>>> would have to be used in parallel of the DMA (interrupt handling,
>>>>>>> SMP?,
>>>>>>> dma mgt related variable). So it's not perfect but it would still be
>>>>>>> better than we have today.
>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>> done.
>>>>>> That's the way to go without all the complications above.
>>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>>> same warning as we have now, but it would make it work in more cases.
>>>> The warning is there for that exact reason -- to inform you something's
>>>> wrong.
>>>>
>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>> not a
>>>>> trivial job. Most of the time the buffer is allocated in a upper layer
>>>>> and passed down to a driver via layers like network stack, block layer
>>>>> etc.And in many cases, the warning will come and go depending on
>>>>> how the
>>>>> variable aligned on the stack or the heap.
>>>> I didn't observe this much in fact. I usually see the buffers coming in
>>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>>> cycle is there, so we can test the next release and fix these issues.
>>> It's not commonly seen, but I have come across it a few times.
>>>
>>> Here are two examples:
>>>
>>> Network:
>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>> CPU  : DRA752-HS ES2.0
>>> Model: TI DRA742
>>> [...]
>>> Booting from network ...
>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>> link up on port 0, speed 1000, full duplex
>>> BOOTP broadcast 1
>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>> BOOTP broadcast 2
>>> [...]
>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>> 128.247.83.128; sending through gateway 128.247.82.1
>>> [...]
>>> Load address: 0x82000000
>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>
>>>
>>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>> I've also seen it a few years back on PowerPC platforms when accessing a
>>> USB storage.
>>>
>>> I'm sure that we could find more examples. I don't mean that they
>>> shouldn't be fixed, simply that we could still make things work in most
>>> cases at a low cost.
>>> The modifications I proposed do not change the behaviour of the code for
>>> aligned buffers; they just make it much more likely that it would work
>>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>>> good thing.
>> It seems to me like "it kinda works, but it really doesn't sometimes
>> ...." , so it's encouraging bad practice as people will get used to
>> ignoring this warning because things "kinda work, in most cases".
> I see your point but the current situation is exactly like that: it
> works most of the time.

But in this case, it's a property of the hardware. With your patch, it's
also a property of the code which works "most of the time". You
even state that in the patch description.

> I'm not pushing for the changes, I just wanted to let you know that we could
> do better if we wish.

Replacing one problem with another which looks like it isn't a problem
anymore (but still is) is not better IMO.

>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>> this way too.
>>>
>>> Jean-Jacques
>>>
>>
> 


-- 
Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [U-Boot] ARM - cache and alignment
  2017-01-17  9:38             ` Marek Vasut
@ 2017-01-17  9:51               ` Jean-Jacques Hiblot
  2017-01-17  9:54                 ` Marek Vasut
  0 siblings, 1 reply; 12+ messages in thread
From: Jean-Jacques Hiblot @ 2017-01-17  9:51 UTC (permalink / raw)
  To: u-boot



On 17/01/2017 10:38, Marek Vasut wrote:
> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>
>> On 17/01/2017 10:15, Marek Vasut wrote:
>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> Tom, Marek
>>>>>>> Hi,
>>>>>>>
>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>> operations
>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>> request is
>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned part is
>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>> operations
>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>> or not
>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>> stored in
>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a bounce
>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>
>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>
>>>>>>>> * flush (writeback + invalidate): Suppose we use address p which is
>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>> on the
>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)]. There
>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>> point in
>>>>>>>> time.
>>>>>>>>
>>>>>>>> * invalidation
>>>>>>>>
>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>> {
>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>              invalidate(all other lines)
>>>>>>>> }
>>>>>>>>
>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>> called
>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>> time, so it's
>>>>>>>> valid to do it here). Calling it only AFTER the operation may corrupt
>>>>>>>> the
>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>
>>>>>>>>
>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>> last
>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>> as that data
>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>> SMP?,
>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>> still be better than what we have today.
>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>> done.
>>>>>>> That's the way to go without all the complications above.
>>>>>> It's not that complex, but it's not perfect. We would need to keep the
>>>>>> same warning as we have now, but it would make it work in more cases.
>>>>> The warning is there for that exact reason -- to inform you something's
>>>>> wrong.
>>>>>
>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>> not a
>>>>>> trivial job. Most of the time the buffer is allocated in an upper layer
>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>> how the
>>>>>> variable is aligned on the stack or the heap.
>>>>> I didn't observe this much in fact. I usually see the buffers coming in
>>>>> aligned or being allocated in drivers. Also, I think that's why the RC
>>>>> cycle is there, so we can test the next release and fix these issues.
>>>> It's not commonly seen, but I have come across it a few times.
>>>>
>>>> Here are two examples:
>>>>
>>>> Network:
>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>> CPU  : DRA752-HS ES2.0
>>>> Model: TI DRA742
>>>> [...]
>>>> Booting from network ...
>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>> link up on port 0, speed 1000, full duplex
>>>> BOOTP broadcast 1
>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>> BOOTP broadcast 2
>>>> [...]
>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>> [...]
>>>> Load address: 0x82000000
>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>
>>>>
>>>> FAT: it has been fixed recently by using a bounce buffer in FAT code by
>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>> I've also seen it a few years back on PowerPC platforms when accessing a
>>>> USB storage.
>>>>
>>>> I'm sure that we could find more examples. I don't mean that they
>>>> shouldn't be fixed, simply that we could still make things work in most
>>>> cases at a low cost.
>>>> The modifications I proposed do not change the behaviour of the code for
>>>> aligned buffers; they just make it much more likely that it would work
>>>> with unaligned buffers. I fail to see the reason why this wouldn't be a
>>>> good thing.
>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>> ...." , so it's encouraging bad practice as people will get used to
>>> ignoring this warning because things "kinda work, in most cases".
>> I see your point but the current situation is exactly like that: it
>> works most of the time.
> But in this case, it's a property of the hardware. With your patch, it's
> also a property of the code which works "most of the time". You
> even state that in the patch description.
I'm not sure I understand your point. But as I said, I'm not pushing.
You're not keen on the idea and you're more knowledgeable than I am in
U-Boot, so be it. I'll drop the subject.
Thanks for taking the time to discuss it.
>> I'm not pushing for the changes, I just wanted to let you know that we could
>> do better if we wish.
> Replacing one problem with another which looks like it isn't a problem
> anymore (but still is) is not better IMO.
>
>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>> this way too.
>>>>
>>>> Jean-Jacques
>>>>
>


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:51               ` Jean-Jacques Hiblot
@ 2017-01-17  9:54                 ` Marek Vasut
  2017-01-17 20:54                   ` Benoît Thébaudeau
  0 siblings, 1 reply; 12+ messages in thread
From: Marek Vasut @ 2017-01-17  9:54 UTC (permalink / raw)
  To: u-boot

On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
> 
> 
> On 17/01/2017 10:38, Marek Vasut wrote:
>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>
>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>> Tom, Marek
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>> operations
>>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>>> request is
>>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned
>>>>>>>>> part is
>>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>>> operations
>>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>>> or not
>>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>>> stored in
>>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a
>>>>>>>>> bounce
>>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>>
>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>
>>>>>>>>> * flush (writeback + invalidate): Suppose we use address p
>>>>>>>>> which is
>>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>>> on the
>>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)].
>>>>>>>>> There
>>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>>> point in
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> * invalidation
>>>>>>>>>
>>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>>> {
>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>              invalidate(all other lines)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>>> called
>>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>>> time, so it's
>>>>>>>>> valid to do it here). Calling it only AFTER the operation may
>>>>>>>>> corrupt
>>>>>>>>> the
>>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>>> last
>>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>>> as that data
>>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>>> SMP?,
>>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>>> still be
>>>>>>>>> better than what we have today.
>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>> done.
>>>>>>>> That's the way to go without all the complications above.
>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>> keep the
>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>> cases.
>>>>>> The warning is there for that exact reason -- to inform you
>>>>>> something's
>>>>>> wrong.
>>>>>>
>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>> not a
>>>>>>> trivial job. Most of the time the buffer is allocated in an upper
>>>>>>> layer
>>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>>> how the
>>>>>>> variable is aligned on the stack or the heap.
>>>>>> I didn't observe this much in fact. I usually see the buffers
>>>>>> coming in
>>>>>> aligned or being allocated in drivers. Also, I think that's why
>>>>>> the RC
>>>>>> cycle is there, so we can test the next release and fix these issues.
>>>>> It's not commonly seen, but I have come across it a few times.
>>>>>
>>>>> Here are two examples:
>>>>>
>>>>> Network:
>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>> CPU  : DRA752-HS ES2.0
>>>>> Model: TI DRA742
>>>>> [...]
>>>>> Booting from network ...
>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>> link up on port 0, speed 1000, full duplex
>>>>> BOOTP broadcast 1
>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>> BOOTP broadcast 2
>>>>> [...]
>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>> [...]
>>>>> Load address: 0x82000000
>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>>
>>>>>
>>>>> FAT: it has been fixed recently by using a bounce buffer in FAT
>>>>> code by
>>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>> accessing a
>>>>> USB storage.
>>>>>
>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>> most
>>>>> cases at a low cost.
>>>>> The modifications I proposed do not change the behaviour of the
>>>>> code for
>>>>> aligned buffers; they just make it much more likely that it would work
>>>>> with unaligned buffers. I fail to see the reason why this wouldn't
>>>>> be a
>>>>> good thing.
>>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>>> ...." , so it's encouraging bad practice as people will get used to
>>>> ignoring this warning because things "kinda work, in most cases".
>>> I see your point but the current situation is exactly like that: it
>>> works most of the time.
>> But in this case, it's a property of the hardware. With your patch, it's
>> also a property of the code which works "most of the time". You
>> even state that in the patch description.
> I'm not sure I understand your point.

I'm referring to the last paragraph of the commit message, "There is
however one loophole..."

> But as I said, I'm not pushing.
> You're not keen on the idea and you're more knowledgeable than I am in
> U-Boot, so be it. I'll drop the subject.
> Thanks for taking the time to discuss it.

Let's wait for what the others have to say, maybe?

>>> I'm not pushing for the changes, I just wanted to let you know that we could
>>> do better if we wish.
>> Replacing one problem with another which looks like it isn't a problem
>> anymore (but still is) is not better IMO.
>>
>>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>> this way too.
>>>>>
>>>>> Jean-Jacques
>>>>>
>>
> 


-- 
Best regards,
Marek Vasut


* [U-Boot] ARM - cache and alignment
  2017-01-17  9:54                 ` Marek Vasut
@ 2017-01-17 20:54                   ` Benoît Thébaudeau
  0 siblings, 0 replies; 12+ messages in thread
From: Benoît Thébaudeau @ 2017-01-17 20:54 UTC (permalink / raw)
  To: u-boot

Hi,

On Tue, Jan 17, 2017 at 10:54 AM, Marek Vasut <marex@denx.de> wrote:
> On 01/17/2017 10:51 AM, Jean-Jacques Hiblot wrote:
>>
>>
>> On 17/01/2017 10:38, Marek Vasut wrote:
>>> On 01/17/2017 10:35 AM, Jean-Jacques Hiblot wrote:
>>>>
>>>> On 17/01/2017 10:15, Marek Vasut wrote:
>>>>> On 01/17/2017 10:08 AM, Jean-Jacques Hiblot wrote:
>>>>>> On 16/01/2017 20:33, Marek Vasut wrote:
>>>>>>> On 01/16/2017 08:16 PM, Jean-Jacques Hiblot wrote:
>>>>>>>> On 16/01/2017 17:00, Marek Vasut wrote:
>>>>>>>>> On 01/16/2017 02:29 PM, Jean-Jacques Hiblot wrote:
>>>>>>>>>> Tom, Marek
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> At the moment, whenever an unaligned address is used in cache
>>>>>>>>>> operations
>>>>>>>>>> (invalidate_dcache_range or flush_dcache_range), the whole
>>>>>>>>>> request is
>>>>>>>>>> discarded for arm926ejs. For ARMv7 or ARMv8 only the aligned
>>>>>>>>>> part is
>>>>>>>>>> maintained. This is probably what is causing the bug addressed in
>>>>>>>>>> 8133f43d1cd. There are a lot of unaligned buffers used in DMA
>>>>>>>>>> operations
>>>>>>>>>> and for all of them, we're possibly handling the cache partially
>>>>>>>>>> or not
>>>>>>>>>> at all. I've seen this when using the environment from a file
>>>>>>>>>> stored in
>>>>>>>>>> a FAT partition. Commit 8133f43d1cd addresses this by using a
>>>>>>>>>> bounce
>>>>>>>>>> buffer at the FAT level, but it's only one of many cases.
>>>>>>>>>>
>>>>>>>>>> I think we can do better with unaligned cache operations:
>>>>>>>>>>
>>>>>>>>>> * flush (writeback + invalidate): Suppose we use address p
>>>>>>>>>> which is
>>>>>>>>>> unaligned; flush_dcache_range() can do the writeback+invalidate
>>>>>>>>>> on the
>>>>>>>>>> whole range [p & ~(line_sz - 1); p + length | (line_sz - 1)].
>>>>>>>>>> There
>>>>>>>>>> should be no problem with that, since writeback can happen at any
>>>>>>>>>> point in
>>>>>>>>>> time.
>>>>>>>>>>
>>>>>>>>>> * invalidation
>>>>>>>>>>
>>>>>>>>>> It is a bit trickier. Here is some pseudo-code:
>>>>>>>>>> invalidate_dcache_range(p,length)
>>>>>>>>>> {
>>>>>>>>>>              write_back_invalidate(first line)
>>>>>>>>>>              write_back_invalidate(last line)
>>>>>>>>>>              invalidate(all other lines)
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Here again this should work fine IF invalidate_dcache_range() is
>>>>>>>>>> called
>>>>>>>>>> BEFORE the DMA operation (again, the writeback can happen at any
>>>>>>>>>> time, so it's
>>>>>>>>>> valid to do it here). Calling it only AFTER the operation may
>>>>>>>>>> corrupt
>>>>>>>>>> the
>>>>>>>>>> data written by the DMA with old data from the CPU. This is how I
>>>>>>>>>> used to handle unaligned buffers in some other projects.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is however one loophole: if data sitting in the first or the
>>>>>>>>>> last
>>>>>>>>>> line is accessed before the memory is updated by the DMA, then the
>>>>>>>>>> first/last line will be corrupted. But this is not highly probable,
>>>>>>>>>> as that data
>>>>>>>>>> would have to be used in parallel with the DMA (interrupt handling,
>>>>>>>>>> SMP?,
>>>>>>>>>> DMA-management-related variables). So it's not perfect, but it would
>>>>>>>>>> still be
>>>>>>>>>> better than what we have today.
>>>>>>>>> Or just fix all the code which complains about unaligned buffers,
>>>>>>>>> done.
>>>>>>>>> That's the way to go without all the complications above.
>>>>>>>> It's not that complex, but it's not perfect. We would need to
>>>>>>>> keep the
>>>>>>>> same warning as we have now, but it would make it work in more
>>>>>>>> cases.
>>>>>>> The warning is there for that exact reason -- to inform you
>>>>>>> something's
>>>>>>> wrong.
>>>>>>>
>>>>>>>> Tracking every possible unaligned buffer that gets invalidated is
>>>>>>>> not a
>>>>>>>> trivial job. Most of the time the buffer is allocated in an upper
>>>>>>>> layer
>>>>>>>> and passed down to a driver via layers like the network stack, block
>>>>>>>> layer, etc. And in many cases, the warning will come and go depending on
>>>>>>>> how the
>>>>>>>> variable is aligned on the stack or the heap.
>>>>>>> I didn't observe this much in fact. I usually see the buffers
>>>>>>> coming in
>>>>>>> aligned or being allocated in drivers. Also, I think that's why
>>>>>>> the RC
>>>>>>> cycle is there, so we can test the next release and fix these issues.
>>>>>> It's not commonly seen, but I have come across it a few times.
>>>>>>
>>>>>> Here are two examples:
>>>>>>
>>>>>> Network:
>>>>>> U-Boot 2016.09-rc1-00087-gd40ff0a (Jul 27 2016 - 10:04:33 -0500)
>>>>>> CPU  : DRA752-HS ES2.0
>>>>>> Model: TI DRA742
>>>>>> [...]
>>>>>> Booting from network ...
>>>>>> cpsw Waiting for PHY auto negotiation to complete.... done
>>>>>> link up on port 0, speed 1000, full duplex
>>>>>> BOOTP broadcast 1
>>>>>> CACHE: Misaligned operation at range [dffecb40, dffecc96]
>>>>>> CACHE: Misaligned operation at range [dffed140, dffed17e]
>>>>>> BOOTP broadcast 2
>>>>>> [...]
>>>>>> File transfer via NFS from server 10.0.1.26; our IP address is
>>>>>> 128.247.83.128; sending through gateway 128.247.82.1
>>>>>> [...]
>>>>>> Load address: 0x82000000
>>>>>> Loading: CACHE: Misaligned operation at range [dffebfc0, dffebfea]
>>>>>>
>>>>>>
>>>>>> FAT: it has been fixed recently by using a bounce buffer in FAT
>>>>>> code by
>>>>>> "8133f43d1cd fs/fat/fat_write: Fix buffer alignments".
>>>>>> I've also seen it a few years back on PowerPC platforms when
>>>>>> accessing a
>>>>>> USB storage.
>>>>>>
>>>>>> I'm sure that we could find more examples. I don't mean that they
>>>>>> shouldn't be fixed, simply that we could still make things work in
>>>>>> most
>>>>>> cases at a low cost.
>>>>>> The modifications I proposed do not change the behaviour of the
>>>>>> code for
>>>>>> aligned buffers; they just make it much more likely that it would work
>>>>>> with unaligned buffers. I fail to see the reason why this wouldn't
>>>>>> be a
>>>>>> good thing.
>>>>> It seems to me like "it kinda works, but it really doesn't sometimes
>>>>> ...." , so it's encouraging bad practice as people will get used to
>>>>> ignoring this warning because things "kinda work, in most cases".
>>>> I see your point but the current situation is exactly like that: it
>>>> works most of the time.
>>> But in this case, it's a property of the hardware. With your patch, it's
>>> also a property of the code which works "most of the time". You
>>> even state that in the patch description.
>> I'm not sure I understand your point.
>
> I'm referring to the last paragraph of the commit message, "There is
> however one loophole..."
>
>> But as I said, I'm not pushing.
>> You're not keen on the idea and you're more knowledgeable than I am in
>> U-Boot, so be it. I'll drop the subject.
>> Thanks for taking the time to discuss it.
>
> Let's wait for what the others have to say, maybe?

IMHO, a 100% reliable U-Boot is preferable, even if this means less
performance. It does not seem like there would be a way of working
around this loophole, so I would avoid this cache hack, even if the
probability of getting an issue would be small.

Maintaining only the aligned part of DMA buffers in cache operations,
as Jean-Jacques says is done for ARMv7/v8, is also not perfect. This
means that probably at least all the storage-related device drivers on
ARM can be affected in some way by unaligned buffers. That would mean
a lot of changes to move the buffer bounce mechanism from the FAT
layer to the device driver layer. That's why I'd rather move it to the
block layer. The architecture/driver combinations requiring it could
add a flag or a configuration setting somewhere, so that the
performance is not lowered when not needed. It would also only be
applied if a misalignment is detected, in which case a debug() message
could also be printed.

>>>> I'm not pushing for the changes, I just wanted to let you know that we could
>>>> do better if we wish.
>>> Replacing one problem with another which looks like it isn't a problem
>>> anymore (but still is) is not better IMO.
>>>
>>>>>> BTW the L2 uniphier cache implements this for flush and invalidation,
>>>>>> and some architectures (pxa, blackfin, openrisc, etc.) do the flush()
>>>>>> this way too.

Best regards,
Benoît


* [U-Boot] ARM - cache and alignment
  2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
  2017-01-16 16:00 ` Marek Vasut
@ 2017-01-20  3:36 ` Tom Rini
  1 sibling, 0 replies; 12+ messages in thread
From: Tom Rini @ 2017-01-20  3:36 UTC (permalink / raw)
  To: u-boot

On Mon, Jan 16, 2017 at 02:29:21PM +0100, Jean-Jacques Hiblot wrote:

> Tom, Marek
> 
> At the moment, whenever an unaligned address is used in cache
> operations (invalidate_dcache_range or flush_dcache_range), the
> whole request is discarded for arm926ejs. For ARMv7 or ARMv8 only
> the aligned part is maintained. This is probably what is causing the
> bug addressed in 8133f43d1cd. There are a lot of unaligned buffers
> used in DMA operations and for all of them, we're possibly handling
> the cache partially or not at all. I've seen this when using the
> environment from a file stored in a FAT partition. Commit
> 8133f43d1cd addresses this by using a bounce buffer at the FAT level,
> but it's only one of many cases.
> 
> I think we can do better with unaligned cache operations:

My current feeling is that, with respect to buffers not being aligned,
as we move forward with the DM changes we can push this up a level and
it will be harder to get this wrong.
Patches to fix problems found in the meantime are welcome.

-- 
Tom


end of thread, other threads:[~2017-01-20  3:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 13:29 [U-Boot] ARM - cache and alignment Jean-Jacques Hiblot
2017-01-16 16:00 ` Marek Vasut
2017-01-16 19:16   ` Jean-Jacques Hiblot
2017-01-16 19:33     ` Marek Vasut
2017-01-17  9:08       ` Jean-Jacques Hiblot
2017-01-17  9:15         ` Marek Vasut
2017-01-17  9:35           ` Jean-Jacques Hiblot
2017-01-17  9:38             ` Marek Vasut
2017-01-17  9:51               ` Jean-Jacques Hiblot
2017-01-17  9:54                 ` Marek Vasut
2017-01-17 20:54                   ` Benoît Thébaudeau
2017-01-20  3:36 ` Tom Rini
