linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
@ 2015-03-08 12:31 vichy
  2015-03-10 16:31 ` Catalin Marinas
  0 siblings, 1 reply; 10+ messages in thread
From: vichy @ 2015-03-08 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

hi all:
Recently we bumped into the same issue like below path:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html

We have some question about this patch:
a. Under what circumstances, there will be memory returned by
dma_alloc_coherent and friends mapped as normal, cacheable mappings?

b. why "with CMA enabled, it should be safe not to set this bit."

Sincerely appreciate your kind help,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-08 12:31 some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register vichy
@ 2015-03-10 16:31 ` Catalin Marinas
  2015-03-10 16:34   ` Russell King - ARM Linux
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Catalin Marinas @ 2015-03-10 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote:
> Recently we bumped into the same issue like below path:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html
> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
> 
> We have some question about this patch:
> a. Under what circumstances, there will be memory returned by
> dma_alloc_coherent and friends mapped as normal, cacheable mappings?

dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is
already mapped in the kernel linear mapping as Normal Cacheable.

> b. why "with CMA enabled, it should be safe not to set this bit."

It's not entirely safe either. I guess the assumption is that CMA
allocates from highmem which is not mapped in the kernel linear mapping.
However, to be able to flush the caches for such highmem pages, they
need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
small window between dmac_flush_range() and kunmap_atomic() where
speculative cache line fills can still happen.

Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent
DMA API in Linux.

-- 
Catalin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 16:31 ` Catalin Marinas
@ 2015-03-10 16:34   ` Russell King - ARM Linux
  2015-03-10 17:40     ` Catalin Marinas
  2015-03-10 17:11   ` Krishna Reddy
  2015-03-11 14:35   ` vichy
  2 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2015-03-10 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote:
> It's not entirely safe either. I guess the assumption is that CMA
> allocates from highmem which is not mapped in the kernel linear mapping.
> However, to be able to flush the caches for such highmem pages, they
> need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
> small window between dmac_flush_range() and kunmap_atomic() where
> speculative cache line fills can still happen.

That really ought to be fixed.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 16:31 ` Catalin Marinas
  2015-03-10 16:34   ` Russell King - ARM Linux
@ 2015-03-10 17:11   ` Krishna Reddy
  2015-03-10 17:27     ` Russell King - ARM Linux
  2015-03-11 14:35   ` vichy
  2 siblings, 1 reply; 10+ messages in thread
From: Krishna Reddy @ 2015-03-10 17:11 UTC (permalink / raw)
  To: linux-arm-kernel

> > b. why "with CMA enabled, it should be safe not to set this bit."
> 
> It's not entirely safe either. I guess the assumption is that CMA allocates from
> highmem which is not mapped in the kernel linear mapping.
> However, to be able to flush the caches for such highmem pages, they need
> to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a small
> window between dmac_flush_range() and kunmap_atomic() where
> speculative cache line fills can still happen.

Even Low mem CMA pages has similar race window as highmem CMA pages. 
__free_from_contiguous() in dma-mapping.c maps the CMA Low Mem pages as
 Cached using __dma_remap().
During the subsequent allocation of same CMA low mem pages, the window exist between
__dma_clear_buffer() and __dma_remap() in __alloc_from_contiguous().

--Krishna Reddy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 17:11   ` Krishna Reddy
@ 2015-03-10 17:27     ` Russell King - ARM Linux
  0 siblings, 0 replies; 10+ messages in thread
From: Russell King - ARM Linux @ 2015-03-10 17:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 10, 2015 at 05:11:51PM +0000, Krishna Reddy wrote:
> > > b. why "with CMA enabled, it should be safe not to set this bit."
> > 
> > It's not entirely safe either. I guess the assumption is that CMA allocates from
> > highmem which is not mapped in the kernel linear mapping.
> > However, to be able to flush the caches for such highmem pages, they need
> > to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a small
> > window between dmac_flush_range() and kunmap_atomic() where
> > speculative cache line fills can still happen.
> 
> Even Low mem CMA pages has similar race window as highmem CMA pages. 
> __free_from_contiguous() in dma-mapping.c maps the CMA Low Mem pages as
>  Cached using __dma_remap().
> During the subsequent allocation of same CMA low mem pages, the window exist between
> __dma_clear_buffer() and __dma_remap() in __alloc_from_contiguous().

Again, something else which needs to be fixed.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 16:34   ` Russell King - ARM Linux
@ 2015-03-10 17:40     ` Catalin Marinas
  2015-03-10 17:48       ` Russell King - ARM Linux
  0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2015-03-10 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 10, 2015 at 04:34:12PM +0000, Russell King - ARM Linux wrote:
> On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote:
> > It's not entirely safe either. I guess the assumption is that CMA
> > allocates from highmem which is not mapped in the kernel linear mapping.
> > However, to be able to flush the caches for such highmem pages, they
> > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
> > small window between dmac_flush_range() and kunmap_atomic() where
> > speculative cache line fills can still happen.
> 
> That really ought to be fixed.

For non-CMA DMA allocations, the solution is to set some memory aside
which is not mapped (IIRC you tried this long time ago).

As for CMA, do we have a guarantee that memory only comes from highmem?
If yes (or we can enforce this somehow), we need something like
kmap_atomic_prot() implemented for flushing such pages.

But I still think it's easier if we just set bit 22 for PL310 ;). When
this bit is cleared, the system is no longer compliant with the
statements on mismatched memory attributes.

-- 
Catalin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 17:40     ` Catalin Marinas
@ 2015-03-10 17:48       ` Russell King - ARM Linux
  0 siblings, 0 replies; 10+ messages in thread
From: Russell King - ARM Linux @ 2015-03-10 17:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 10, 2015 at 05:40:19PM +0000, Catalin Marinas wrote:
> On Tue, Mar 10, 2015 at 04:34:12PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote:
> > > It's not entirely safe either. I guess the assumption is that CMA
> > > allocates from highmem which is not mapped in the kernel linear mapping.
> > > However, to be able to flush the caches for such highmem pages, they
> > > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
> > > small window between dmac_flush_range() and kunmap_atomic() where
> > > speculative cache line fills can still happen.
> > 
> > That really ought to be fixed.
> 
> For non-CMA DMA allocations, the solution is to set some memory aside
> which is not mapped (IIRC you tried this long time ago).
> 
> As for CMA, do we have a guarantee that memory only comes from highmem?
> If yes (or we can enforce this somehow), we need something like
> kmap_atomic_prot() implemented for flushing such pages.

For CMA, if it comes from lowmem, CMA changes the memory attributes to
make it non-cacheable, rather than using a separate mapping of the same
page.  That eliminates the aliasing mapping problem, and was done to
explicitly avoid this scenario.

However, the CMA changes on ARM were never properly reviewed before they
went in.  Like a lot of things today, they bypassed my tree for no good
reason (and a lot of DMA API stuff continues to do so.)  So it's not
surprising that the quality of core ARM code has been getting half-
thoughtout fixes like this...

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-10 16:31 ` Catalin Marinas
  2015-03-10 16:34   ` Russell King - ARM Linux
  2015-03-10 17:11   ` Krishna Reddy
@ 2015-03-11 14:35   ` vichy
  2015-03-11 21:54     ` vichy
  2 siblings, 1 reply; 10+ messages in thread
From: vichy @ 2015-03-11 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

hi Catalin:

2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>:
> On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote:
>> Recently we bumped into the same issue like below path:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
>>
>> We have some question about this patch:
>> a. Under what circumstances, there will be memory returned by
>> dma_alloc_coherent and friends mapped as normal, cacheable mappings?
>
> dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is
> already mapped in the kernel linear mapping as Normal Cacheable.
>
>> b. why "with CMA enabled, it should be safe not to set this bit."
>
> It's not entirely safe either. I guess the assumption is that CMA
> allocates from highmem which is not mapped in the kernel linear mapping.
> However, to be able to flush the caches for such highmem pages, they
> need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
> small window between dmac_flush_range() and kunmap_atomic() where
> speculative cache line fills can still happen.
>
> Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent
> DMA API in Linux.
if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared?
thanks your kind help,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-11 14:35   ` vichy
@ 2015-03-11 21:54     ` vichy
  2015-03-13 14:11       ` Catalin Marinas
  0 siblings, 1 reply; 10+ messages in thread
From: vichy @ 2015-03-11 21:54 UTC (permalink / raw)
  To: linux-arm-kernel

hi Catalin:

2015-03-11 22:35 GMT+08:00 vichy <vichy.kuo@gmail.com>:
> hi Catalin:
>
> 2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>:
>> On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote:
>>> Recently we bumped into the same issue like below path:
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
>>>
>>> We have some question about this patch:
>>> a. Under what circumstances, there will be memory returned by
>>> dma_alloc_coherent and friends mapped as normal, cacheable mappings?
>>
>> dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is
>> already mapped in the kernel linear mapping as Normal Cacheable.
>>
>>> b. why "with CMA enabled, it should be safe not to set this bit."
>>
>> It's not entirely safe either. I guess the assumption is that CMA
>> allocates from highmem which is not mapped in the kernel linear mapping.
>> However, to be able to flush the caches for such highmem pages, they
>> need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a
>> small window between dmac_flush_range() and kunmap_atomic() where
>> speculative cache line fills can still happen.
>>
>> Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent
>> DMA API in Linux.
Sorry for not describing my question more clear.
Not only the above 2 links I pasted in the mail, I also found other
threads has the issue as mine.
(about L2C_AUX_CTRL_SHARED_OVERRIDE)
And all of them(so far I see) suggest to set this bit on.
if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared?
thanks your kind help,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register
  2015-03-11 21:54     ` vichy
@ 2015-03-13 14:11       ` Catalin Marinas
  0 siblings, 0 replies; 10+ messages in thread
From: Catalin Marinas @ 2015-03-13 14:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 12, 2015 at 05:54:59AM +0800, vichy wrote:
> 2015-03-11 22:35 GMT+08:00 vichy <vichy.kuo@gmail.com>:
> > 2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>:
> >> Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent
> >> DMA API in Linux.
> 
> Not only the above 2 links I pasted in the mail, I also found other
> threads has the issue as mine.
> (about L2C_AUX_CTRL_SHARED_OVERRIDE)
> And all of them(so far I see) suggest to set this bit on.
> if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared?

IIRC, the PL310 designers thought of this as a harmless optimisation.
Basically on a system where a device (graphics usually) access could go
through the PL310 (but not able to snoop the CPU caches), they thought
that there wouldn't be any problem if the device can see the content of
PL310 but not allocate into the cache. They actually wanted to avoid
read-allocation that you get with a Normal Cacheable access from the
device since such graphics device would easily fill the PL310. That's
basically an emulation of a Cacheable no-read no-write allocate access.

However, this broke many assumptions that Linux was making (mismatched
aliases, later clarified in the ARM ARM). The feature wasn't easy to use
either since to be beneficial, the buffer writer (CPU in this case)
would have to allocate cache lines in PL310. For this, the device driver
allocating the framebuffer would need to either create an inner
non-cacheable, outer cacheable mapping (not supported by the DMA API) or
flush the L1 cache only (not part of the standard DMA API either).

My position over the past 4 years has clearly been that all firmware or
SoC code must set this bit before PL310 is enabled. There is no
performance drop nor other side effects caused by setting it.
Unfortunately, Russell never accepted the patch to handle this at the
cache-l2x0.c level (or print a warning if the firmware hasn't set it).
Four years later, Linux still creates mismatched memory attributes
aliases (which are not a bad thing, the ARM ARM clarifies the behaviour
for ARMv7 and we know they've been fine on prior implementations; the
arm64 port is not going to do anything special to avoid such aliases).

-- 
Catalin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-03-13 14:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-08 12:31 some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register vichy
2015-03-10 16:31 ` Catalin Marinas
2015-03-10 16:34   ` Russell King - ARM Linux
2015-03-10 17:40     ` Catalin Marinas
2015-03-10 17:48       ` Russell King - ARM Linux
2015-03-10 17:11   ` Krishna Reddy
2015-03-10 17:27     ` Russell King - ARM Linux
2015-03-11 14:35   ` vichy
2015-03-11 21:54     ` vichy
2015-03-13 14:11       ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).