* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register @ 2015-03-08 12:31 vichy 2015-03-10 16:31 ` Catalin Marinas 0 siblings, 1 reply; 10+ messages in thread From: vichy @ 2015-03-08 12:31 UTC (permalink / raw) To: linux-arm-kernel hi all: Recently we bumped into the same issue like below path: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html We have some question about this patch: a. Under what circumstances, there will be memory returned by dma_alloc_coherent and friends mapped as normal, cacheable mappings? b. why "with CMA enabled, it should be safe not to set this bit." Sincerely appreciate your kind help, ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-08 12:31 some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register vichy @ 2015-03-10 16:31 ` Catalin Marinas 2015-03-10 16:34 ` Russell King - ARM Linux ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Catalin Marinas @ 2015-03-10 16:31 UTC (permalink / raw) To: linux-arm-kernel On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote: > Recently we bumped into the same issue like below path: > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html > http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html > > We have some question about this patch: > a. Under what circumstances, there will be memory returned by > dma_alloc_coherent and friends mapped as normal, cacheable mappings? dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is already mapped in the kernel linear mapping as Normal Cacheable. > b. why "with CMA enabled, it should be safe not to set this bit." It's not entirely safe either. I guess the assumption is that CMA allocates from highmem which is not mapped in the kernel linear mapping. However, to be able to flush the caches for such highmem pages, they need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a small window between dmac_flush_range() and kunmap_atomic() where speculative cache line fills can still happen. Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent DMA API in Linux. -- Catalin ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 16:31 ` Catalin Marinas @ 2015-03-10 16:34 ` Russell King - ARM Linux 2015-03-10 17:40 ` Catalin Marinas 2015-03-10 17:11 ` Krishna Reddy 2015-03-11 14:35 ` vichy 2 siblings, 1 reply; 10+ messages in thread From: Russell King - ARM Linux @ 2015-03-10 16:34 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote: > It's not entirely safe either. I guess the assumption is that CMA > allocates from highmem which is not mapped in the kernel linear mapping. > However, to be able to flush the caches for such highmem pages, they > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a > small window between dmac_flush_range() and kunmap_atomic() where > speculative cache line fills can still happen. That really ought to be fixed. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 16:34 ` Russell King - ARM Linux @ 2015-03-10 17:40 ` Catalin Marinas 2015-03-10 17:48 ` Russell King - ARM Linux 0 siblings, 1 reply; 10+ messages in thread From: Catalin Marinas @ 2015-03-10 17:40 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 10, 2015 at 04:34:12PM +0000, Russell King - ARM Linux wrote: > On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote: > > It's not entirely safe either. I guess the assumption is that CMA > > allocates from highmem which is not mapped in the kernel linear mapping. > > However, to be able to flush the caches for such highmem pages, they > > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a > > small window between dmac_flush_range() and kunmap_atomic() where > > speculative cache line fills can still happen. > > That really ought to be fixed. For non-CMA DMA allocations, the solution is to set some memory aside which is not mapped (IIRC you tried this long time ago). As for CMA, do we have a guarantee that memory only comes from highmem? If yes (or we can enforce this somehow), we need something like kmap_atomic_prot() implemented for flushing such pages. But I still think it's easier if we just set bit 22 for PL310 ;). When this bit is cleared, the system is no longer compliant with the statements on mismatched memory attributes. -- Catalin ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 17:40 ` Catalin Marinas @ 2015-03-10 17:48 ` Russell King - ARM Linux 0 siblings, 0 replies; 10+ messages in thread From: Russell King - ARM Linux @ 2015-03-10 17:48 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 10, 2015 at 05:40:19PM +0000, Catalin Marinas wrote: > On Tue, Mar 10, 2015 at 04:34:12PM +0000, Russell King - ARM Linux wrote: > > On Tue, Mar 10, 2015 at 04:31:34PM +0000, Catalin Marinas wrote: > > > It's not entirely safe either. I guess the assumption is that CMA > > > allocates from highmem which is not mapped in the kernel linear mapping. > > > However, to be able to flush the caches for such highmem pages, they > > > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a > > > small window between dmac_flush_range() and kunmap_atomic() where > > > speculative cache line fills can still happen. > > > > That really ought to be fixed. > > For non-CMA DMA allocations, the solution is to set some memory aside > which is not mapped (IIRC you tried this long time ago). > > As for CMA, do we have a guarantee that memory only comes from highmem? > If yes (or we can enforce this somehow), we need something like > kmap_atomic_prot() implemented for flushing such pages. For CMA, if it comes from lowmem, CMA changes the memory attributes to make it non-cacheable, rather than using a separate mapping of the same page. That eliminates the aliasing mapping problem, and was done to explicitly avoid this scenario. However, the CMA changes on ARM were never properly reviewed before they went in. Like a lot of things today, they bypassed my tree for no good reason (and a lot of DMA API stuff continues to do so.) So it's not surprising that the quality of core ARM code has been getting half- thoughtout fixes like this... -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 16:31 ` Catalin Marinas 2015-03-10 16:34 ` Russell King - ARM Linux @ 2015-03-10 17:11 ` Krishna Reddy 2015-03-10 17:27 ` Russell King - ARM Linux 2015-03-11 14:35 ` vichy 2 siblings, 1 reply; 10+ messages in thread From: Krishna Reddy @ 2015-03-10 17:11 UTC (permalink / raw) To: linux-arm-kernel > > b. why "with CMA enabled, it should be safe not to set this bit." > > It's not entirely safe either. I guess the assumption is that CMA allocates from > highmem which is not mapped in the kernel linear mapping. > However, to be able to flush the caches for such highmem pages, they need > to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a small > window between dmac_flush_range() and kunmap_atomic() where > speculative cache line fills can still happen. Even Low mem CMA pages has similar race window as highmem CMA pages. __free_from_contiguous() in dma-mapping.c maps the CMA Low Mem pages as Cached using __dma_remap(). During the subsequent allocation of same CMA low mem pages, the window exist between __dma_clear_buffer() and __dma_remap() in __alloc_from_contiguous(). --Krishna Reddy ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 17:11 ` Krishna Reddy @ 2015-03-10 17:27 ` Russell King - ARM Linux 0 siblings, 0 replies; 10+ messages in thread From: Russell King - ARM Linux @ 2015-03-10 17:27 UTC (permalink / raw) To: linux-arm-kernel On Tue, Mar 10, 2015 at 05:11:51PM +0000, Krishna Reddy wrote: > > > b. why "with CMA enabled, it should be safe not to set this bit." > > > > It's not entirely safe either. I guess the assumption is that CMA allocates from > > highmem which is not mapped in the kernel linear mapping. > > However, to be able to flush the caches for such highmem pages, they need > > to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a small > > window between dmac_flush_range() and kunmap_atomic() where > > speculative cache line fills can still happen. > > Even Low mem CMA pages has similar race window as highmem CMA pages. > __free_from_contiguous() in dma-mapping.c maps the CMA Low Mem pages as > Cached using __dma_remap(). > During the subsequent allocation of same CMA low mem pages, the window exist between > __dma_clear_buffer() and __dma_remap() in __alloc_from_contiguous(). Again, something else which needs to be fixed. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-10 16:31 ` Catalin Marinas 2015-03-10 16:34 ` Russell King - ARM Linux 2015-03-10 17:11 ` Krishna Reddy @ 2015-03-11 14:35 ` vichy 2015-03-11 21:54 ` vichy 2 siblings, 1 reply; 10+ messages in thread From: vichy @ 2015-03-11 14:35 UTC (permalink / raw) To: linux-arm-kernel hi Catalin: 2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>: > On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote: >> Recently we bumped into the same issue like below path: >> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html >> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html >> >> We have some question about this patch: >> a. Under what circumstances, there will be memory returned by >> dma_alloc_coherent and friends mapped as normal, cacheable mappings? > > dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is > already mapped in the kernel linear mapping as Normal Cacheable. > >> b. why "with CMA enabled, it should be safe not to set this bit." > > It's not entirely safe either. I guess the assumption is that CMA > allocates from highmem which is not mapped in the kernel linear mapping. > However, to be able to flush the caches for such highmem pages, they > need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a > small window between dmac_flush_range() and kunmap_atomic() where > speculative cache line fills can still happen. > > Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent > DMA API in Linux. if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared? thanks your kind help, ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-11 14:35 ` vichy @ 2015-03-11 21:54 ` vichy 2015-03-13 14:11 ` Catalin Marinas 0 siblings, 1 reply; 10+ messages in thread From: vichy @ 2015-03-11 21:54 UTC (permalink / raw) To: linux-arm-kernel hi Catalin: 2015-03-11 22:35 GMT+08:00 vichy <vichy.kuo@gmail.com>: > hi Catalin: > > 2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>: >> On Sun, Mar 08, 2015 at 08:31:45PM +0800, vichy wrote: >>> Recently we bumped into the same issue like below path: >>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/245908.html >>> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html >>> >>> We have some question about this patch: >>> a. Under what circumstances, there will be memory returned by >>> dma_alloc_coherent and friends mapped as normal, cacheable mappings? >> >> dma_alloc_coherent() allocating from ZONE_DMA (or ZONE_NORMAL) which is >> already mapped in the kernel linear mapping as Normal Cacheable. >> >>> b. why "with CMA enabled, it should be safe not to set this bit." >> >> It's not entirely safe either. I guess the assumption is that CMA >> allocates from highmem which is not mapped in the kernel linear mapping. >> However, to be able to flush the caches for such highmem pages, they >> need to be mapped (kmap_atomic() in __dma_clear_buffer()) but there is a >> small window between dmac_flush_range() and kunmap_atomic() where >> speculative cache line fills can still happen. >> >> Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent >> DMA API in Linux. Sorry for not describing my question more clear. Not only the above 2 links I pasted in the mail, I also found other threads has the issue as mine. (about L2C_AUX_CTRL_SHARED_OVERRIDE) And all of them(so far I see) suggest to set this bit on. if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared? thanks your kind help, ^ permalink raw reply [flat|nested] 10+ messages in thread
* some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register 2015-03-11 21:54 ` vichy @ 2015-03-13 14:11 ` Catalin Marinas 0 siblings, 0 replies; 10+ messages in thread From: Catalin Marinas @ 2015-03-13 14:11 UTC (permalink / raw) To: linux-arm-kernel On Thu, Mar 12, 2015 at 05:54:59AM +0800, vichy wrote: > 2015-03-11 22:35 GMT+08:00 vichy <vichy.kuo@gmail.com>: > > 2015-03-11 0:31 GMT+08:00 Catalin Marinas <catalin.marinas@arm.com>: > >> Bit 22 in PL310 AuxCtlr must be set for most (all) uses of the coherent > >> DMA API in Linux. > > Not only the above 2 links I pasted in the mail, I also found other > threads has the issue as mine. > (about L2C_AUX_CTRL_SHARED_OVERRIDE) > And all of them(so far I see) suggest to set this bit on. > if so, under what circumstance, the Bit22 in PL310 AuxCtlr will be cleared? IIRC, the PL310 designers thought of this as a harmless optimisation. Basically on a system where a device (graphics usually) access could go through the PL310 (but not able to snoop the CPU caches), they thought that there wouldn't be any problem if the device can see the content of PL310 but not allocate into the cache. They actually wanted to avoid read-allocation that you get with a Normal Cacheable access from the device since such graphics device would easily fill the PL310. That's basically an emulation of a Cacheable no-read no-write allocate access. However, this broke many assumptions that Linux was making (mismatched aliases, later clarified in the ARM ARM). The feature wasn't easy to use either since to be beneficial, the buffer writer (CPU in this case) would have to allocate cache lines in PL310. For this, the device driver allocating the framebuffer would need to either create an inner non-cacheable, outer cacheable mapping (not supported by the DMA API) or flush the L1 cache only (not part of the standard DMA API either). My position over the past 4 years has clearly been that all firmware or SoC code must set this bit before PL310 is enabled. There is no performance drop nor other side effects caused by setting it. Unfortunately, Russell never accepted the patch to handle this at the cache-l2x0.c level (or print a warning if the firmware hasn't set it). Four years later, Linux still creates mismatched memory attributes aliases (which are not a bad thing, the ARM ARM clarifies the behaviour for ARMv7 and we know they've been fine on prior implementations; the arm64 port is not going to do anything special to avoid such aliases). -- Catalin ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-03-13 14:11 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-03-08 12:31 some question about Set bit 22 in the PL310 (cache controller) AuxCtlr register vichy 2015-03-10 16:31 ` Catalin Marinas 2015-03-10 16:34 ` Russell King - ARM Linux 2015-03-10 17:40 ` Catalin Marinas 2015-03-10 17:48 ` Russell King - ARM Linux 2015-03-10 17:11 ` Krishna Reddy 2015-03-10 17:27 ` Russell King - ARM Linux 2015-03-11 14:35 ` vichy 2015-03-11 21:54 ` vichy 2015-03-13 14:11 ` Catalin Marinas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).