* Overlapping ioremap() calls, set_memory_*() semantics @ 2016-03-03 21:28 Luis R. Rodriguez 2016-03-03 21:28 ` Luis R. Rodriguez 2016-03-04 9:44 ` Ingo Molnar 0 siblings, 2 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-03-03 21:28 UTC (permalink / raw) To: Toshi Kani, Paul McKenney Cc: Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter At kernel summit, during the semantics of ioremap() session, Paul mentioned we'd write something up to help get some notes out on what we need to do and help clarify things. I've run into an issue (just a warning) with a user on some odd system that I suspect may be the result of a driver using overlapping ioremap() calls on conflicting memory regions, so I'm a bit interested to see a resolution to some of these lingering discussions now. Although we spoke of quite a bit of things, I raised in particular the 'semantics of overlapping ioremap()' calls as one item of interest we should address across architectures. At least on x86 it seems we would not get an error if this is done and in fact its expected behavior; Toshi had determined we could not enable error'ing out on overlapping ioremap() calls given we have a few users that use it intentionally, for instance the /dev/mem setup code. I had suggested long ago then that one possible resolution was for us to add an API that *enables* overlapping ioremap() calls, and only use it on select locations in the kernel. This means we only have to convert a few users to that call to white list such semantics, and by default we'd disable overlapping calls. To kick things off -- is this strategy agreeable for all other architectures? The problem is that without this it remains up to the developer of the driver to avoid overlapping calls, and a user may just get sporadic errors if this happens. As another problem case, set_memor_*() will not fail on MMIO even though set_memor_*() is designed only for RAM. If the above strategy on avoiding overlapping is agreeable, could the next step, or an orthogonal step be to error out on set_memory_*() on IO memory? Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Overlapping ioremap() calls, set_memory_*() semantics 2016-03-03 21:28 Overlapping ioremap() calls, set_memory_*() semantics Luis R. Rodriguez @ 2016-03-03 21:28 ` Luis R. Rodriguez 2016-03-04 9:44 ` Ingo Molnar 1 sibling, 0 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-03-03 21:28 UTC (permalink / raw) To: Toshi Kani, Paul McKenney Cc: Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter At kernel summit, during the semantics of ioremap() session, Paul mentioned we'd write something up to help get some notes out on what we need to do and help clarify things. I've run into an issue (just a warning) with a user on some odd system that I suspect may be the result of a driver using overlapping ioremap() calls on conflicting memory regions, so I'm a bit interested to see a resolution to some of these lingering discussions now. Although we spoke of quite a bit of things, I raised in particular the 'semantics of overlapping ioremap()' calls as one item of interest we should address across architectures. At least on x86 it seems we would not get an error if this is done and in fact its expected behavior; Toshi had determined we could not enable error'ing out on overlapping ioremap() calls given we have a few users that use it intentionally, for instance the /dev/mem setup code. I had suggested long ago then that one possible resolution was for us to add an API that *enables* overlapping ioremap() calls, and only use it on select locations in the kernel. This means we only have to convert a few users to that call to white list such semantics, and by default we'd disable overlapping calls. To kick things off -- is this strategy agreeable for all other architectures? The problem is that without this it remains up to the developer of the driver to avoid overlapping calls, and a user may just get sporadic errors if this happens. As another problem case, set_memor_*() will not fail on MMIO even though set_memor_*() is designed only for RAM. If the above strategy on avoiding overlapping is agreeable, could the next step, or an orthogonal step be to error out on set_memory_*() on IO memory? Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-03 21:28 Overlapping ioremap() calls, set_memory_*() semantics Luis R. Rodriguez 2016-03-03 21:28 ` Luis R. Rodriguez @ 2016-03-04 9:44 ` Ingo Molnar 2016-03-04 18:18 ` Toshi Kani 1 sibling, 1 reply; 28+ messages in thread From: Ingo Molnar @ 2016-03-04 9:44 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > At kernel summit, during the semantics of ioremap() session, Paul > mentioned we'd write something up to help get some notes out on what > we need to do and help clarify things. I've run into an issue (just a > warning) with a user on some odd system that I suspect may be the > result of a driver using overlapping ioremap() calls on conflicting > memory regions, so I'm a bit interested to see a resolution to some of > these lingering discussions now. > > Although we spoke of quite a bit of things, I raised in particular the > 'semantics of overlapping ioremap()' calls as one item of interest we > should address across architectures. At least on x86 it seems we would > not get an error if this is done and in fact its expected behavior; > Toshi had determined we could not enable error'ing out on overlapping > ioremap() calls given we have a few users that use it intentionally, > for instance the /dev/mem setup code. I had suggested long ago then > that one possible resolution was for us to add an API that *enables* > overlapping ioremap() calls, and only use it on select locations in > the kernel. This means we only have to convert a few users to that > call to white list such semantics, and by default we'd disable > overlapping calls. To kick things off -- is this strategy agreeable > for all other architectures? So I'd say that since ioremap() in itself is fragile enough, we should work towards eliminating overlapping ranges. The thing is, the whole vmap_area logic is based around non-overlapping ranges, sorted into the vmap_area_root rbtree. Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based on finding holes in the kernel-virtual allocations. 'Overlapping ranges' is very much not part of that logic, at least to my understanding. How are overlapping ioremap()s even possible with that logic? The allocator searches for holes, not allowing for overlaps. What am I missing? Could you outline a specific case where it's done intentionally - and the purpose behind that intention? > The problem is that without this it remains up to the developer of the driver to > avoid overlapping calls, and a user may just get sporadic errors if this > happens. As another problem case, set_memor_*() will not fail on MMIO even > though set_memor_*() is designed only for RAM. If the above strategy on avoiding > overlapping is agreeable, could the next step, or an orthogonal step be to error > out on set_memory_*() on IO memory? So how do drivers set up WC or WB MMIO areas? Does none of our upstream video drivers do that? Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 9:44 ` Ingo Molnar @ 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:18 ` Toshi Kani ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Toshi Kani @ 2016-03-04 18:18 UTC (permalink / raw) To: Ingo Molnar, Luis R. Rodriguez Cc: Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Fri, 2016-03-04 at 10:44 +0100, Ingo Molnar wrote: > * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > At kernel summit, during the semantics of ioremap() session, Paul > > mentioned we'd write something up to help get some notes out on what > > we need to do and help clarify things. I've run into an issue (just a > > warning) with a user on some odd system that I suspect may be the > > result of a driver using overlapping ioremap() calls on conflicting > > memory regions, so I'm a bit interested to see a resolution to some of > > these lingering discussions now. > > > > Although we spoke of quite a bit of things, I raised in particular the > > 'semantics of overlapping ioremap()' calls as one item of interest we > > should address across architectures. At least on x86 it seems we would > > not get an error if this is done and in fact its expected behavior; > > Toshi had determined we could not enable error'ing out on overlapping > > ioremap() calls given we have a few users that use it intentionally, > > for instance the /dev/mem setup code. I had suggested long ago then > > that one possible resolution was for us to add an API that *enables* > > overlapping ioremap() calls, and only use it on select locations in > > the kernel. This means we only have to convert a few users to that > > call to white list such semantics, and by default we'd disable > > overlapping calls. To kick things off -- is this strategy agreeable > > for all other architectures? > > So I'd say that since ioremap() in itself is fragile enough, we should > work towards eliminating overlapping ranges. > > The thing is, the whole vmap_area logic is based around non-overlapping > ranges, sorted into the vmap_area_root rbtree. > > Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based on > finding holes in the kernel-virtual allocations. 'Overlapping ranges' is > very much not part of that logic, at least to my understanding. > > How are overlapping ioremap()s even possible with that logic? The > allocator searches for holes, not allowing for overlaps. What am I > missing? > > Could you outline a specific case where it's done intentionally - and the > purpose behind that intention? The term "overlapping" is a bit misleading. This is "alias" mapping -- a physical address range is mapped to multiple virtual address ranges. There is no overlapping in VMA. Such alias mappings are used by multiple modules. For instance, a PMEM range is mapped to the kernel and user spaces. /dev/mem is another example that creates a user space mapping to a physical address where other mappings may already exist. Hence, alias mapping itself is a supported use-case. However, alias mapping with different cache types is not as it causes undefined behavior. Therefore, PAT module protects from this case by tracking cache types used for mapping physical ranges. When a different cache type is requested, is_new_memtype_allowed() checks if the request needs to be failed or can be changed to the existing type. I agree that the current implementation is fragile, and some interfaces skip such check at all, ex. vm_insert_pfn(). > > The problem is that without this it remains up to the developer of the > > driver to avoid overlapping calls, and a user may just get sporadic > > errors if this happens. As another problem case, set_memor_*() will > > not fail on MMIO even though set_memor_*() is designed only for RAM. If > > the above strategy on avoiding overlapping is agreeable, could the next > > step, or an orthogonal step be to error out on set_memory_*() on IO > > memory? > > So how do drivers set up WC or WB MMIO areas? Does none of our upstream > video drivers do that? Drivers use ioremap family with a right cache type when mapping MMIO ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. RAM is different since it's already mapped with WB at boot-time. set_memory_*() allows us to change the type from WB, and put it back to WB. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 18:18 ` Toshi Kani @ 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:51 ` Luis R. Rodriguez 2016-03-05 11:40 ` Ingo Molnar 2 siblings, 0 replies; 28+ messages in thread From: Toshi Kani @ 2016-03-04 18:18 UTC (permalink / raw) To: Ingo Molnar, Luis R. Rodriguez Cc: Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Fri, 2016-03-04 at 10:44 +0100, Ingo Molnar wrote: > * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > At kernel summit, during the semantics of ioremap() session, Paul > > mentioned we'd write something up to help get some notes out on what > > we need to do and help clarify things. I've run into an issue (just a > > warning) with a user on some odd system that I suspect may be the > > result of a driver using overlapping ioremap() calls on conflicting > > memory regions, so I'm a bit interested to see a resolution to some of > > these lingering discussions now. > > > > Although we spoke of quite a bit of things, I raised in particular the > > 'semantics of overlapping ioremap()' calls as one item of interest we > > should address across architectures. At least on x86 it seems we would > > not get an error if this is done and in fact its expected behavior; > > Toshi had determined we could not enable error'ing out on overlapping > > ioremap() calls given we have a few users that use it intentionally, > > for instance the /dev/mem setup code. I had suggested long ago then > > that one possible resolution was for us to add an API that *enables* > > overlapping ioremap() calls, and only use it on select locations in > > the kernel. This means we only have to convert a few users to that > > call to white list such semantics, and by default we'd disable > > overlapping calls. To kick things off -- is this strategy agreeable > > for all other architectures? > > So I'd say that since ioremap() in itself is fragile enough, we should > work towards eliminating overlapping ranges. > > The thing is, the whole vmap_area logic is based around non-overlapping > ranges, sorted into the vmap_area_root rbtree. > > Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based on > finding holes in the kernel-virtual allocations. 'Overlapping ranges' is > very much not part of that logic, at least to my understanding. > > How are overlapping ioremap()s even possible with that logic? The > allocator searches for holes, not allowing for overlaps. What am I > missing? > > Could you outline a specific case where it's done intentionally - and the > purpose behind that intention? The term "overlapping" is a bit misleading. This is "alias" mapping -- a physical address range is mapped to multiple virtual address ranges. There is no overlapping in VMA. Such alias mappings are used by multiple modules. For instance, a PMEM range is mapped to the kernel and user spaces. /dev/mem is another example that creates a user space mapping to a physical address where other mappings may already exist. Hence, alias mapping itself is a supported use-case. However, alias mapping with different cache types is not as it causes undefined behavior. Therefore, PAT module protects from this case by tracking cache types used for mapping physical ranges. When a different cache type is requested, is_new_memtype_allowed() checks if the request needs to be failed or can be changed to the existing type. I agree that the current implementation is fragile, and some interfaces skip such check at all, ex. vm_insert_pfn(). > > The problem is that without this it remains up to the developer of the > > driver to avoid overlapping calls, and a user may just get sporadic > > errors if this happens. As another problem case, set_memor_*() will > > not fail on MMIO even though set_memor_*() is designed only for RAM. If > > the above strategy on avoiding overlapping is agreeable, could the next > > step, or an orthogonal step be to error out on set_memory_*() on IO > > memory? > > So how do drivers set up WC or WB MMIO areas? Does none of our upstream > video drivers do that? Drivers use ioremap family with a right cache type when mapping MMIO ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. RAM is different since it's already mapped with WB at boot-time. set_memory_*() allows us to change the type from WB, and put it back to WB. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:18 ` Toshi Kani @ 2016-03-04 18:51 ` Luis R. Rodriguez 2016-03-04 21:39 ` Toshi Kani 2016-03-05 11:42 ` Ingo Molnar 2016-03-05 11:40 ` Ingo Molnar 2 siblings, 2 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-03-04 18:51 UTC (permalink / raw) To: Toshi Kani Cc: Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst, Dennis Dalessandro, Andy Walls, Mauro Carvalho Chehab, Julia Lawall On Fri, Mar 4, 2016 at 10:18 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > On Fri, 2016-03-04 at 10:44 +0100, Ingo Molnar wrote: >> * Luis R. Rodriguez <mcgrof@kernel.org> wrote: >> >> Could you outline a specific case where it's done intentionally - and the >> purpose behind that intention? > > The term "overlapping" is a bit misleading. To the driver author its overlapping, its overlapping on the physical address. > This is "alias" mapping -- a > physical address range is mapped to multiple virtual address ranges. There > is no overlapping in VMA. Meanwhile this is what happens behind the scenes, and because of this a driver developer may end up with two different offsets for overlapping physical addresses ranges, so long as they use the right offset they are *supposed* to get then the intended cache type effects. The resulting cache type effect will depend on architecture and intended cache type on the ioremap call. Furthermore there are modifiers possible, such as was MTRR (and fortunately now deprecated on kernel driver use), and those effects are now documented in Documentation/x86/pat.txt on a table for PAT/not-PAT systems, which I'll past here for convenience: ---------------------------------------------------------------------- MTRR Non-PAT PAT Linux ioremap value Effective memory type ---------------------------------------------------------------------- Non-PAT | PAT PAT |PCD ||PWT ||| WC 000 WB _PAGE_CACHE_MODE_WB WC | WC WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC WC 011 UC _PAGE_CACHE_MODE_UC UC | UC ---------------------------------------------------------------------- > Such alias mappings are used by multiple modules. For instance, a PMEM > range is mapped to the kernel and user spaces. /dev/mem is another example > that creates a user space mapping to a physical address where other > mappings may already exist. The above examples are IMHO perhaps valid uses, and for these perhaps we should add an API that enables overlapping on physical ranges. I believe this given that for regular device drivers, at least in review of write-combining, there were only exceptions I had found that were using overlapping ranges. The above table was used to devise a strategy to help phase MTRR in consideration for the overlapping uses on drivers. Below is a list of what was found: * atyfb: I ended up splitting the overlapping ranges, it was quite a bit of work, but I did this mostly to demo the effort needed on a device driver after a device driver already had used an overlapping range. To maintain the effective write-combining effects for non-PAT systems we added ioremap_uc() which can be used on the MMIO range, this would only be used on the MMIO area, meanwhile the framebuffer area used ioremap_wc(), the arch_phys_wc_add() call was then used to trigger write-combining on non-PAT systems due to the above tables. * ipath: while a split on ipath was possible the changes are quite significant, way more significant than what was on atyfb. Fortunately work on this driver for this triggered a discussion of eventually removing this device driver, so on v4.3-rc2 it was move to staging to phase it out. For completeness though I'll explain why the addressing the overlap was hard: apart from changing the driver to use different offset bases in different regions the driver also has a debug userspace filemap for the entire register map, the code there would need to be modified to use the right virtual address base depending on the virtual address accessed. The qib driver already has logic which could be mimic'd for this fortunatley, but still - this is considerable work. One side hack I had hoped for was that overlapping ioremap*() calls with different page attribute types would work even if the 2nd returned __iomem address was not used, fortunately this does not work though, only using the right virtual address would ensure the right caching technique is used. Since this driver was being phased out, we decided to annotate it required pat to be disabled and warn the user if they booted with PAT enabled. * ivtv: the driver does not have the PCI space mapped out separately, and in fact it actually does not do the math for the framebuffer, instead it lets the device's own CPU do that and assume where its at, see ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get but not a setter. Its not clear if the firmware would make a split easy. As with ipath, we decided to just require PAT disabled. This was a reasonable compromise as the hardware was really old and we estimated perhaps only a few lost souls were using this device driver, and for them it should be reasonable to disable PAT. Fortunately in retrospect it was only old hardware devices that used overlapping ioremap() calls, but we can't be sure things will remain that way. It used to be that only framebuffer devices used write-combining, these days we are seeing infiniband (ipath was one, qib another) use write-combining for PIO TX for some sort of blast feature, I would not be surprised if in the future other device drivers found a quirky use for this. The ivtv case is the *worst* example we can expect where the firmware hides from us the exact ranges for write-combining, that we should somehow just hope no one will ever do again. > Hence, alias mapping itself is a supported use-case. However, alias > mapping with different cache types is not as it causes undefined behavior. Just as-is at times some types of modifiers, such as was with MTRR. In retrospect, based on a cursory review through the phasing of MTRR, we only have ipath left (and that will be removed soon) and we just don't know what ivtv does. If we wanted a more in-depth analysis we might be able to use something like coccinelle to inspect for overlapping calls, but that may be a bit of a challenge in and of itself. It may be easier to phase that behavior out by first warning against it for a few cycles, and then just laying a hammer down and saying 'though shalt not do this', and only enable exceptions with a special API. > Therefore, PAT module protects from this case by tracking cache types used > for mapping physical ranges. When a different cache type is requested, > is_new_memtype_allowed() checks if the request needs to be failed or can be > changed to the existing type. Do we have a map of what is allowed for PAT and non-PAT, similar to the one for MTRR effects above? It would seem we should want something similar for other architectures ? But if there isn't much valid use for it, why not just rule this sort of stuff out and only make it an exception, by only having a few uses being allowed? > I agree that the current implementation is fragile, and some interfaces > skip such check at all, ex. vm_insert_pfn(). Ouch. >> > The problem is that without this it remains up to the developer of the >> > driver to avoid overlapping calls, and a user may just get sporadic >> > errors if this happens. As another problem case, set_memor_*() will >> > not fail on MMIO even though set_memor_*() is designed only for RAM. If >> > the above strategy on avoiding overlapping is agreeable, could the next >> > step, or an orthogonal step be to error out on set_memory_*() on IO >> > memory? >> >> So how do drivers set up WC or WB MMIO areas? Does none of our upstream >> video drivers do that? > > Drivers use ioremap family with a right cache type when mapping MMIO > ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. > RAM is different since it's already mapped with WB at boot-time. > set_memory_*() allows us to change the type from WB, and put it back to > WB. Shouldn't set_memory_*() fail on IO memory? It does not today. Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 18:51 ` Luis R. Rodriguez @ 2016-03-04 21:39 ` Toshi Kani 2016-03-05 11:42 ` Ingo Molnar 1 sibling, 0 replies; 28+ messages in thread From: Toshi Kani @ 2016-03-04 21:39 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst, Dennis Dalessandro, Andy Walls, Mauro Carvalho Chehab, Julia Lawall On Fri, 2016-03-04 at 10:51 -0800, Luis R. Rodriguez wrote: > On Fri, Mar 4, 2016 at 10:18 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > > On Fri, 2016-03-04 at 10:44 +0100, Ingo Molnar wrote: > > > * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > > > Could you outline a specific case where it's done intentionally - and > > > the purpose behind that intention? > > > > The term "overlapping" is a bit misleading. > > To the driver author its overlapping, its overlapping on the physical > address. Right, and I just wanted to state that there is no overlapping in VMA. > > This is "alias" mapping -- a physical address range is mapped to > > multiple virtual address ranges. There is no overlapping in VMA. > > Meanwhile this is what happens behind the scenes, and because of this > a driver developer may end up with two different offsets for > overlapping physical addresses ranges, so long as they use the right > offset they are *supposed* to get then the intended cache type > effects. The resulting cache type effect will depend on architecture > and intended cache type on the ioremap call. Furthermore there are > modifiers possible, such as was MTRR (and fortunately now deprecated > on kernel driver use), and those effects are now documented in > Documentation/x86/pat.txt on a table for PAT/not-PAT systems, which > I'll past here for convenience: Thanks for updating the documentation! > ---------------------------------------------------------------------- > MTRR Non-PAT PAT Linux ioremap value Effective memory type > ---------------------------------------------------------------------- > Non-PAT | PAT > PAT > |PCD > ||PWT > ||| > WC 000 WB _PAGE_CACHE_MODE_WB WC | WC > WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC > WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC In the above case, effective PAT is also WC. > WC 011 UC _PAGE_CACHE_MODE_UC UC | UC > ---------------------------------------------------------------------- > > > Such alias mappings are used by multiple modules. For instance, a PMEM > > range is mapped to the kernel and user spaces. /dev/mem is another > > example that creates a user space mapping to a physical address where > > other mappings may already exist. > > The above examples are IMHO perhaps valid uses, and for these perhaps > we should add an API that enables overlapping on physical ranges. I > believe this given that for regular device drivers, at least in review > of write-combining, there were only exceptions I had found that were > using overlapping ranges. The above table was used to devise a > strategy to help phase MTRR in consideration for the overlapping uses > on drivers. Below is a list of what was found: > > * atyfb: I ended up splitting the overlapping ranges, it was quite a > bit of work, but I did this mostly to demo the effort needed on a > device driver after a device driver already had used an overlapping > range. To maintain the effective write-combining effects for non-PAT > systems we added ioremap_uc() which can be used on the MMIO range, > this would only be used on the MMIO area, meanwhile the framebuffer > area used ioremap_wc(), the arch_phys_wc_add() call was then used to > trigger write-combining on non-PAT systems due to the above tables. > > * ipath: while a split on ipath was possible the changes are quite > significant, way more significant than what was on atyfb. Fortunately > work on this driver for this triggered a discussion of eventually > removing this device driver, so on v4.3-rc2 it was move to staging to > phase it out. For completeness though I'll explain why the addressing > the overlap was hard: apart from changing the driver to use different > offset bases in different regions the driver also has a debug > userspace filemap for the entire register map, the code there would > need to be modified to use the right virtual address base depending on > the virtual address accessed. The qib driver already has logic which > could be mimic'd for this fortunatley, but still - this is > considerable work. One side hack I had hoped for was that overlapping > ioremap*() calls with different page attribute types would work even > if the 2nd returned __iomem address was not used, fortunately this > does not work though, only using the right virtual address would > ensure the right caching technique is used. Since this driver was > being phased out, we decided to annotate it required pat to be > disabled and warn the user if they booted with PAT enabled. > > * ivtv: the driver does not have the PCI space mapped out > separately, and in fact it actually does not do the math for the > framebuffer, instead it lets > the device's own CPU do that and assume where its at, see > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get > but not a setter. Its not clear if the firmware would make a split > easy. As with ipath, we decided to just require PAT disabled. This was > a reasonable compromise as the hardware was really old and we > estimated perhaps only a few lost souls were using this device driver, > and for them it should be reasonable to disable PAT. > > Fortunately in retrospect it was only old hardware devices that used > overlapping ioremap() calls, but we can't be sure things will remain > that way. It used to be that only framebuffer devices used > write-combining, these days we are seeing infiniband (ipath was one, > qib another) use write-combining for PIO TX for some sort of blast > feature, I would not be surprised if in the future other device > drivers found a quirky use for this. The ivtv case is the *worst* > example we can expect where the firmware hides from us the exact > ranges for write-combining, that we should somehow just hope no one > will ever do again. > > > Hence, alias mapping itself is a supported use-case. However, alias > > mapping with different cache types is not as it causes undefined > > behavior. > > Just as-is at times some types of modifiers, such as was with MTRR. > > In retrospect, based on a cursory review through the phasing of MTRR, > we only have ipath left (and that will be removed soon) and we just > don't know what ivtv does. If we wanted a more in-depth analysis we > might be able to use something like coccinelle to inspect for > overlapping calls, but that may be a bit of a challenge in and of > itself. It may be easier to phase that behavior out by first warning > against it for a few cycles, and then just laying a hammer down and > saying 'though shalt not do this', and only enable exceptions with a > special API. These are great work! I agree that we should phase out those old drivers written to work with MTRR. > > Therefore, PAT module protects from this case by tracking cache types > > used for mapping physical ranges. When a different cache type is > > requested, is_new_memtype_allowed() checks if the request needs to be > > failed or can be changed to the existing type. > > Do we have a map of what is allowed for PAT and non-PAT, similar to > the one for MTRR effects above? This type check and tracking code are part of PAT code, so there is no check for the non-PAT case. > It would seem we should want something similar for other architectures ? As far as I know, other architectures does not support alias maps with different cache types, either. So, I agree that this is a common issue. > But if there isn't much valid use for it, why not just rule this sort of > stuff out and only make it an exception, by only having a few uses being > allowed? I agree that it should fail in general. One exception is WB request, which is allowed to be modified to any other type. I think this is needed for /dev/mem, which maps with WB without any knowledge of a target range. > > I agree that the current implementation is fragile, and some interfaces > > skip such check at all, ex. vm_insert_pfn(). > > Ouch. > > > > > The problem is that without this it remains up to the developer of > > > > the driver to avoid overlapping calls, and a user may just get > > > > sporadic errors if this happens. As another problem case, > > > > set_memor_*() will not fail on MMIO even though set_memor_*() is > > > > designed only for RAM. If the above strategy on avoiding > > > > overlapping is agreeable, could the next step, or an orthogonal > > > > step be to error out on set_memory_*() on IO memory? > > > > > > So how do drivers set up WC or WB MMIO areas? Does none of our > > > upstream video drivers do that? > > > > Drivers use ioremap family with a right cache type when mapping MMIO > > ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. > > RAM is different since it's already mapped with WB at boot-time. > > set_memory_*() allows us to change the type from WB, and put it back > > to WB. > > Shouldn't set_memory_*() fail on IO memory? It does not today. Agreed. It should have had such check to begin with. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 18:51 ` Luis R. Rodriguez 2016-03-04 21:39 ` Toshi Kani @ 2016-03-05 11:42 ` Ingo Molnar 1 sibling, 0 replies; 28+ messages in thread From: Ingo Molnar @ 2016-03-05 11:42 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Toshi Kani, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst, Dennis Dalessandro, Andy Walls, Mauro Carvalho Chehab, Julia Lawall * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > On Fri, Mar 4, 2016 at 10:18 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > > On Fri, 2016-03-04 at 10:44 +0100, Ingo Molnar wrote: > >> * Luis R. Rodriguez <mcgrof@kernel.org> wrote: > >> > >> Could you outline a specific case where it's done intentionally - and the > >> purpose behind that intention? > > > > The term "overlapping" is a bit misleading. > > To the driver author its overlapping, its overlapping on the physical address. Guys please stop this crazy talk - when the same physical address is mapped to multiple virtual addresses is called 'aliasing' - the term 'overlapping' is used when memory ranges overlap ... Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:51 ` Luis R. Rodriguez @ 2016-03-05 11:40 ` Ingo Molnar 2016-03-07 17:03 ` Toshi Kani 2 siblings, 1 reply; 28+ messages in thread From: Ingo Molnar @ 2016-03-05 11:40 UTC (permalink / raw) To: Toshi Kani Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Toshi Kani <toshi.kani@hpe.com> wrote: > > So I'd say that since ioremap() in itself is fragile enough, we should work > > towards eliminating overlapping ranges. > > > > The thing is, the whole vmap_area logic is based around non-overlapping > > ranges, sorted into the vmap_area_root rbtree. > > > > Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based on finding > > holes in the kernel-virtual allocations. 'Overlapping ranges' is very much not > > part of that logic, at least to my understanding. > > > > How are overlapping ioremap()s even possible with that logic? The allocator > > searches for holes, not allowing for overlaps. What am I missing? > > > > Could you outline a specific case where it's done intentionally - and the > > purpose behind that intention? > > The term "overlapping" is a bit misleading. [...] A bit? It was totally misleading ... You meant virtual aliases for the same physical address, and those of course are allowed, as long the cache attributes are compatible, that is what the whole memtype infrastructure is about, as you yourself note: > [...] This is "alias" mapping -- a physical address range is mapped to multiple > virtual address ranges. There is no overlapping in VMA. > > Such alias mappings are used by multiple modules. For instance, a PMEM range is > mapped to the kernel and user spaces. /dev/mem is another example that creates > a user space mapping to a physical address where other mappings may already > exist. > > Hence, alias mapping itself is a supported use-case. However, alias mapping > with different cache types is not as it causes undefined behavior. Therefore, > PAT module protects from this case by tracking cache types used for mapping > physical ranges. When a different cache type is requested, > is_new_memtype_allowed() checks if the request needs to be failed or can be > changed to the existing type. So where is the problem? The memtype implementation and hence most ioremap() users are supposed to be safe. set_memory_*() APIs are supposed to be safe as well, as they too go via the memtype API. > I agree that the current implementation is fragile, and some interfaces skip > such check at all, ex. vm_insert_pfn(). Most of those are really just low level interfaces for special cases that skip the memtype infrastructure. Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-05 11:40 ` Ingo Molnar @ 2016-03-07 17:03 ` Toshi Kani 2016-03-07 17:03 ` Toshi Kani ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Toshi Kani @ 2016-03-07 17:03 UTC (permalink / raw) To: Ingo Molnar Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Sat, 2016-03-05 at 12:40 +0100, Ingo Molnar wrote: > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > So I'd say that since ioremap() in itself is fragile enough, we > > > should work towards eliminating overlapping ranges. > > > > > > The thing is, the whole vmap_area logic is based around non- > > > overlapping ranges, sorted into the vmap_area_root rbtree. > > > > > > Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based > > > on finding holes in the kernel-virtual allocations. 'Overlapping > > > ranges' is very much not part of that logic, at least to my > > > understanding. > > > > > > How are overlapping ioremap()s even possible with that logic? The > > > allocator searches for holes, not allowing for overlaps. What am I > > > missing? > > > > > > Could you outline a specific case where it's done intentionally - and > > > the purpose behind that intention? > > > > The term "overlapping" is a bit misleading. [...] > > A bit? It was totally misleading ... > > You meant virtual aliases for the same physical address, and those of > course are allowed, as long the cache attributes are compatible, that is > what the whole memtype infrastructure is about, as you yourself note: > > > [...] This is "alias" mapping -- a physical address range is mapped to > > multiple virtual address ranges. There is no overlapping in VMA. > > > > Such alias mappings are used by multiple modules. For instance, a PMEM > > range is mapped to the kernel and user spaces. /dev/mem is another > > example that creates a user space mapping to a physical address where > > other mappings may already exist. > > > > Hence, alias mapping itself is a supported use-case. However, alias > > mapping with different cache types is not as it causes undefined > > behavior. Therefore, PAT module protects from this case by tracking > > cache types used for mapping physical ranges. When a different cache > > type is requested, is_new_memtype_allowed() checks if the request needs > > to be failed or can be changed to the existing type. > > So where is the problem? The memtype implementation and hence most > ioremap() users are supposed to be safe. set_memory_*() APIs are supposed > to be safe as well, as they too go via the memtype API. Let me try to summarize... The original issue Luis brought up was that drivers written to work with MTRR may create a single ioremap range covering multiple cache attributes since MTRR can overwrite cache attribute of a certain range. Converting such drivers with PAT-based ioremap interfaces, i.e. ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for each cache attribute, which can be challenging as it may result in overlapping ioremap ranges (in his term) with different cache attributes. So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, I responded that aliasing mapping itself is supported, but alias with different cache attribute is not. We have checks in place to detect such condition. Overlapping ioremap calls with a different cache attribute either fails or gets redirected to the existing cache attribute on x86. > > I agree that the current implementation is fragile, and some interfaces > > skip such check at all, ex. vm_insert_pfn(). > > Most of those are really just low level interfaces forl cases that skip the memtype infrastructure. Yes, and I'm just stating the fact that some pfn map use-cases, such as mmap, are not tracked in memtype. For example, drm_gem_mmap() -> drm_gem_mmap_obj() sets its VMA as WC attribute. i915_gem_fault() then creates a WC map with vm_insert_pfn() at fault. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-07 17:03 ` Toshi Kani @ 2016-03-07 17:03 ` Toshi Kani 2016-03-08 12:16 ` Ingo Molnar 2016-03-11 6:47 ` Andy Lutomirski 2 siblings, 0 replies; 28+ messages in thread From: Toshi Kani @ 2016-03-07 17:03 UTC (permalink / raw) To: Ingo Molnar Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Sat, 2016-03-05 at 12:40 +0100, Ingo Molnar wrote: > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > So I'd say that since ioremap() in itself is fragile enough, we > > > should work towards eliminating overlapping ranges. > > > > > > The thing is, the whole vmap_area logic is based around non- > > > overlapping ranges, sorted into the vmap_area_root rbtree. > > > > > > Just check the logic in mm/vmalloc.c::alloc_vmap_area(): it's based > > > on finding holes in the kernel-virtual allocations. 'Overlapping > > > ranges' is very much not part of that logic, at least to my > > > understanding. > > > > > > How are overlapping ioremap()s even possible with that logic? The > > > allocator searches for holes, not allowing for overlaps. What am I > > > missing? > > > > > > Could you outline a specific case where it's done intentionally - and > > > the purpose behind that intention? > > > > The term "overlapping" is a bit misleading. [...] > > A bit? It was totally misleading ... > > You meant virtual aliases for the same physical address, and those of > course are allowed, as long the cache attributes are compatible, that is > what the whole memtype infrastructure is about, as you yourself note: > > > [...] This is "alias" mapping -- a physical address range is mapped to > > multiple virtual address ranges. There is no overlapping in VMA. > > > > Such alias mappings are used by multiple modules. For instance, a PMEM > > range is mapped to the kernel and user spaces. /dev/mem is another > > example that creates a user space mapping to a physical address where > > other mappings may already exist. > > > > Hence, alias mapping itself is a supported use-case. However, alias > > mapping with different cache types is not as it causes undefined > > behavior. Therefore, PAT module protects from this case by tracking > > cache types used for mapping physical ranges. When a different cache > > type is requested, is_new_memtype_allowed() checks if the request needs > > to be failed or can be changed to the existing type. > > So where is the problem? The memtype implementation and hence most > ioremap() users are supposed to be safe. set_memory_*() APIs are supposed > to be safe as well, as they too go via the memtype API. Let me try to summarize... The original issue Luis brought up was that drivers written to work with MTRR may create a single ioremap range covering multiple cache attributes since MTRR can overwrite cache attribute of a certain range. Converting such drivers with PAT-based ioremap interfaces, i.e. ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for each cache attribute, which can be challenging as it may result in overlapping ioremap ranges (in his term) with different cache attributes. So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, I responded that aliasing mapping itself is supported, but alias with different cache attribute is not. We have checks in place to detect such condition. Overlapping ioremap calls with a different cache attribute either fails or gets redirected to the existing cache attribute on x86. > > I agree that the current implementation is fragile, and some interfaces > > skip such check at all, ex. vm_insert_pfn(). > > Most of those are really just low level interfaces forl cases that skip the memtype infrastructure. Yes, and I'm just stating the fact that some pfn map use-cases, such as mmap, are not tracked in memtype. For example, drm_gem_mmap() -> drm_gem_mmap_obj() sets its VMA as WC attribute. i915_gem_fault() then creates a WC map with vm_insert_pfn() at fault. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-07 17:03 ` Toshi Kani 2016-03-07 17:03 ` Toshi Kani @ 2016-03-08 12:16 ` Ingo Molnar 2016-03-09 0:29 ` Toshi Kani 2016-03-11 6:47 ` Andy Lutomirski 2 siblings, 1 reply; 28+ messages in thread From: Ingo Molnar @ 2016-03-08 12:16 UTC (permalink / raw) To: Toshi Kani Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Toshi Kani <toshi.kani@hpe.com> wrote: > > So where is the problem? The memtype implementation and hence most > > ioremap() users are supposed to be safe. set_memory_*() APIs are supposed > > to be safe as well, as they too go via the memtype API. > > Let me try to summarize... > > The original issue Luis brought up was that drivers written to work with > MTRR may create a single ioremap range covering multiple cache attributes > since MTRR can overwrite cache attribute of a certain range. Converting > such drivers with PAT-based ioremap interfaces, i.e. ioremap_wc() and > ioremap_nocache(), requires a separate ioremap map for each cache > attribute, which can be challenging as it may result in overlapping ioremap > ranges (in his term) with different cache attributes. > > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, I > responded that aliasing mapping itself is supported, but alias with > different cache attribute is not. We have checks in place to detect such > condition. Overlapping ioremap calls with a different cache attribute > either fails or gets redirected to the existing cache attribute on x86. Ok, fair enough! So to go back to the original suggestion from Luis, I've quoted it, but with a s/overlapping/aliased substitution: > I had suggested long ago then that one possible resolution was for us to add an > API that *enables* aliased ioremap() calls, and only use it on select locations > in the kernel. This means we only have to convert a few users to that call to > white list such semantics, and by default we'd disable aliased calls. To kick > things off -- is this strategy agreeable for all other architectures? I'd say that since the overwhelming majority of ioremap() calls are not aliased, ever, thus making it 'harder' to accidentally alias is probably a good idea. The memtype infrastructure of phyisical memory ranges in that case acts as a security measure, to avoid unintended (not just physically incompatible) aliasing. I suspect it would be helpful during driver development as well. What extra API are you thinking about to enable aliasing in the few cases where it's justified? the other problem listed: > As another problem case, set_memor_*() will not fail on MMIO even though > set_memor_*() is designed only for RAM. So what does this mean exactly? Having WB caching on MMIO area is not good, but UC, WC and WB sure is still sensible in some cases, right? > [...] If the above strategy on avoiding aliasing is agreeable, could the next > step, or an orthogonal step be to error out on set_memory_*() on IO memory? Well, do we have drivers that nevertheless change caching attributes on MMIO areas? Basically if ioremap_uc() and ioremap_wc() is allowed on MMIO areas, then I see no reason in principle why it should be invalid to change the area from UC to WC after it has been ioremap()ed. Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-08 12:16 ` Ingo Molnar @ 2016-03-09 0:29 ` Toshi Kani 2016-03-09 9:15 ` Ingo Molnar 0 siblings, 1 reply; 28+ messages in thread From: Toshi Kani @ 2016-03-09 0:29 UTC (permalink / raw) To: Ingo Molnar Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > So where is the problem? The memtype implementation and hence most > > > ioremap() users are supposed to be safe. set_memory_*() APIs are > > > supposed > > > to be safe as well, as they too go via the memtype API. > > > > Let me try to summarize... > > > > The original issue Luis brought up was that drivers written to work > > with MTRR may create a single ioremap range covering multiple cache > > attributes since MTRR can overwrite cache attribute of a certain range. > > Converting such drivers with PAT-based ioremap interfaces, i.e. > > ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for > > each cache attribute, which can be challenging as it may result in > > overlapping ioremap ranges (in his term) with different cache > > attributes. > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, > > I responded that aliasing mapping itself is supported, but alias with > > different cache attribute is not. We have checks in place to detect > > such condition. Overlapping ioremap calls with a different cache > > attribute either fails or gets redirected to the existing cache > > attribute on x86. > > Ok, fair enough! > > So to go back to the original suggestion from Luis, I've quoted it, but > with a s/overlapping/aliased substitution: > > > I had suggested long ago then that one possible resolution was for us > > to add an API that *enables* aliased ioremap() calls, and only use it > > on select locations in the kernel. This means we only have to convert a > > few users to that call to white list such semantics, and by default > > we'd disable aliased calls. To kick things off -- is this strategy > > agreeable for all other architectures? > > I'd say that since the overwhelming majority of ioremap() calls are not > aliased, ever, thus making it 'harder' to accidentally alias is probably > a good idea. Did you mean 'aliased' or 'aliased with different cache attribute'? The former check might be too strict. > The memtype infrastructure of phyisical memory ranges in that case acts > as a security measure, to avoid unintended (not just physically > incompatible) aliasing. I suspect it would be helpful during driver > development as well. The memtype infrastructure does not track caller interfaces. So, it will check against to any map, i.e. kernel & user map. I assume a kernel map is created before user map, though. > What extra API are you thinking about to enable aliasing in the few cases > where it's justified? I'll defer this for Luis... > the other problem listed: > > > As another problem case, set_memor_*() will not fail on MMIO even > > though set_memor_*() is designed only for RAM. > > So what does this mean exactly? Having WB caching on MMIO area is not > good, but UC, WC and WB sure is still sensible in some cases, right? I responded to Luis in other email that: | Drivers use ioremap family with a right cache type when mapping MMIO | ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. | RAM is different since it's already mapped with WB at boot-time. | set_memory_*() allows us to change the type from WB, and put it back to | WB. > > [...] If the above strategy on avoiding aliasing is agreeable, could > > the next step, or an orthogonal step be to error out on set_memory_*() > > on IO memory? > > Well, do we have drivers that nevertheless change caching attributes on > MMIO areas? Not sure. We will need to check all callers of set_memory_xx() if we change it to fail on MMIO. > Basically if ioremap_uc() and ioremap_wc() is allowed on MMIO areas, then > I see no reason in principle why it should be invalid to change the area > from UC to WC after it has been ioremap()ed. The current implementation does not support MMIO. - It does not track cache attribute correctly for MMIO since it uses __pa(). - It only supports attribute transition of {WB -> NewType -> WB} for RAM. RAM is tracked differently that WB is treated as "no map". So, this transition does not cause a conflict on RAM. This will causes a conflict on MMIO when it is tracked correctly. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-09 0:29 ` Toshi Kani @ 2016-03-09 9:15 ` Ingo Molnar 2016-03-11 22:13 ` Toshi Kani 2016-03-21 17:38 ` Maciej W. Rozycki 0 siblings, 2 replies; 28+ messages in thread From: Ingo Molnar @ 2016-03-09 9:15 UTC (permalink / raw) To: Toshi Kani Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Toshi Kani <toshi.kani@hpe.com> wrote: > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > So where is the problem? The memtype implementation and hence most > > > > ioremap() users are supposed to be safe. set_memory_*() APIs are > > > > supposed > > > > to be safe as well, as they too go via the memtype API. > > > > > > Let me try to summarize... > > > > > > The original issue Luis brought up was that drivers written to work > > > with MTRR may create a single ioremap range covering multiple cache > > > attributes since MTRR can overwrite cache attribute of a certain range. > > > Converting such drivers with PAT-based ioremap interfaces, i.e. > > > ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for > > > each cache attribute, which can be challenging as it may result in > > > overlapping ioremap ranges (in his term) with different cache > > > attributes. > > > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, > > > I responded that aliasing mapping itself is supported, but alias with > > > different cache attribute is not. We have checks in place to detect > > > such condition. Overlapping ioremap calls with a different cache > > > attribute either fails or gets redirected to the existing cache > > > attribute on x86. > > > > Ok, fair enough! > > > > So to go back to the original suggestion from Luis, I've quoted it, but > > with a s/overlapping/aliased substitution: > > > > > I had suggested long ago then that one possible resolution was for us > > > to add an API that *enables* aliased ioremap() calls, and only use it > > > on select locations in the kernel. This means we only have to convert a > > > few users to that call to white list such semantics, and by default > > > we'd disable aliased calls. To kick things off -- is this strategy > > > agreeable for all other architectures? > > > > I'd say that since the overwhelming majority of ioremap() calls are not > > aliased, ever, thus making it 'harder' to accidentally alias is probably > > a good idea. > > Did you mean 'aliased' or 'aliased with different cache attribute'? The former > check might be too strict. I'd say even 'same attribute' aliasing is probably relatively rare. And 'different but compatible cache attribute' is in fact more of a sign that the driver author does the aliasing for a valid _reason_: to have two different types of access methods to the same piece of physical address space... > > The memtype infrastructure of phyisical memory ranges in that case acts as a > > security measure, to avoid unintended (not just physically incompatible) > > aliasing. I suspect it would be helpful during driver development as well. > > The memtype infrastructure does not track caller interfaces. So, it will check > against to any map, i.e. kernel & user map. I assume a kernel map is created > before user map, though. I don't understand this, could you give me a specific example? > > the other problem listed: > > > > > As another problem case, set_memor_*() will not fail on MMIO even > > > though set_memor_*() is designed only for RAM. > > > > So what does this mean exactly? Having WB caching on MMIO area is not > > good, but UC, WC and WB sure is still sensible in some cases, right? > > I responded to Luis in other email that: > | Drivers use ioremap family with a right cache type when mapping MMIO > | ranges, ex. ioremap_wc(). They do not need to change the type to MMIO. > | RAM is different since it's already mapped with WB at boot-time. > | set_memory_*() allows us to change the type from WB, and put it back to > | WB. So there's two different uses here: - set_memory_x() to set the caching attribute - set_memory_x() to set any of the RWX access attributes I'd in fact suggest we split these two uses to avoid confusion: keep set_memory_x() APIs for the RWX access attributes uses, and introduce a new API that sets caching attributes: set_cache_attr_wc() set_cache_attr_uc() set_cache_attr_wb() set_cache_attr_wt() it has the same arguments, so it's basically just a rename initially. And at that point we could definitely argue that set_cache_attr_*() APIs should probably generate a warning for _RAM_, because they mostly make sense for MMIO type of physical addresses, right? Regular RAM should always be WB. Are there cases where we change the caching attribute of RAM for valid reasons, outside of legacy quirks? > > Basically if ioremap_uc() and ioremap_wc() is allowed on MMIO areas, then > > I see no reason in principle why it should be invalid to change the area > > from UC to WC after it has been ioremap()ed. > > The current implementation does not support MMIO. > - It does not track cache attribute correctly for MMIO since it uses > __pa(). Hm, so __pa() works on mmio addresses as well - at least on x86. The whole mmtype tree is physical address based - which too is MMIO compatible in principle. The only problem would be if it was fundamentally struct page * based - but it's not AFAICS. We have a few APIs that work on struct page * arrays, but those are just vectored helper APIs AFAICS. What am I missing? > - It only supports attribute transition of {WB -> NewType -> WB} for RAM. > RAM is tracked differently that WB is treated as "no map". So, this > transition does not cause a conflict on RAM. This will causes a conflict > on MMIO when it is tracked correctly. That looks like a bug? Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-09 9:15 ` Ingo Molnar @ 2016-03-11 22:13 ` Toshi Kani 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-21 17:38 ` Maciej W. Rozycki 1 sibling, 1 reply; 28+ messages in thread From: Toshi Kani @ 2016-03-11 22:13 UTC (permalink / raw) To: Ingo Molnar Cc: Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst Hi Ingo, My apology for the delay... On Wed, 2016-03-09 at 10:15 +0100, Ingo Molnar wrote: > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > > So where is the problem? The memtype implementation and hence > > > > > most ioremap() users are supposed to be safe. set_memory_*() APIs > > > > > are supposed to be safe as well, as they too go via the memtype > > > > > API. > > > > > > > > Let me try to summarize... > > > > > > > > The original issue Luis brought up was that drivers written to work > > > > with MTRR may create a single ioremap range covering multiple cache > > > > attributes since MTRR can overwrite cache attribute of a certain > > > > range. Converting such drivers with PAT-based ioremap interfaces, > > > > i.e. ioremap_wc() and ioremap_nocache(), requires a separate > > > > ioremap map for each cache attribute, which can be challenging as > > > > it may result in overlapping ioremap ranges (in his term) with > > > > different cache attributes. > > > > > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. > > > > Hence, I responded that aliasing mapping itself is supported, but > > > > alias with different cache attribute is not. We have checks in > > > > place to detect such condition. Overlapping ioremap calls with a > > > > different cache attribute either fails or gets redirected to the > > > > existing cache attribute on x86. > > > > > > Ok, fair enough! > > > > > > So to go back to the original suggestion from Luis, I've quoted it, > > > but with a s/overlapping/aliased substitution: > > > > > > > I had suggested long ago then that one possible resolution was for > > > > us to add an API that *enables* aliased ioremap() calls, and only > > > > use it on select locations in the kernel. This means we only have > > > > to convert a few users to that call to white list such semantics, > > > > and by default we'd disable aliased calls. To kick things off -- is > > > > this strategy agreeable for all other architectures? > > > > > > I'd say that since the overwhelming majority of ioremap() calls are > > > not aliased, ever, thus making it 'harder' to accidentally alias is > > > probably a good idea. > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? > > The former check might be too strict. > > I'd say even 'same attribute' aliasing is probably relatively rare. > > And 'different but compatible cache attribute' is in fact more of a sign > that the driver author does the aliasing for a valid _reason_: to have > two different types of access methods to the same piece of physical > address space... Right. So, if we change to fail ioremap() on aliased cases, it'd be easier to start with the different attribute case first. This case should be rare enough that we can manage to identify such callers and make them use a new API as necessary. If we go ahead to fail any aliased cases, it'd be challenging to manage without a regression or two. > > > The memtype infrastructure of phyisical memory ranges in that case > > > acts as a security measure, to avoid unintended (not just physically > > > incompatible) aliasing. I suspect it would be helpful during driver > > > development as well. > > > > The memtype infrastructure does not track caller interfaces. So, it > > will check against to any map, i.e. kernel & user map. I assume a > > kernel map is created before user map, though. > > I don't understand this, could you give me a specific example? Multiple pfn map interfaces use the memtype. So, when ioremap() detects an aliased map in the memtype, it may be created by ioremap(), remap_pfn_range(), or any pfn map that is tracked. So, strictly speaking, we cannot check against other ioremap(). > > > the other problem listed: > > > > > > > As another problem case, set_memor_*() will not fail on MMIO even > > > > though set_memor_*() is designed only for RAM. > > > > > > So what does this mean exactly? Having WB caching on MMIO area is not > > > good, but UC, WC and WB sure is still sensible in some cases, right? > > > > I responded to Luis in other email that: > > > Drivers use ioremap family with a right cache type when mapping MMIO > > > ranges, ex. ioremap_wc(). They do not need to change the type to > > > MMIO. RAM is different since it's already mapped with WB at boot- > > > time. set_memory_*() allows us to change the type from WB, and put it > > > back to WB. > > So there's two different uses here: > > - set_memory_x() to set the caching attribute > - set_memory_x() to set any of the RWX access attributes > > I'd in fact suggest we split these two uses to avoid confusion: keep > set_memory_x() APIs for the RWX access attributes uses, and introduce > a new API that sets caching attributes: > > set_cache_attr_wc() > set_cache_attr_uc() > set_cache_attr_wb() > set_cache_attr_wt() > > it has the same arguments, so it's basically just a rename initially. I think the "set_memory_" prefix implies that their target is regular memory only. > And at that point we could definitely argue that set_cache_attr_*() APIs > should probably generate a warning for _RAM_, because they mostly make > sense for MMIO type of physical addresses, right? Regular RAM should > always be WB. > > Are there cases where we change the caching attribute of RAM for valid > reasons, outside of legacy quirks? ati_create_page_map() is one example that it gets a RAM page by __get_free_page(), and changes it to UC by calling set_memory_uc(). Since RAM is already mapped with WB, the driver has to change its attribute when it needs a different cache type. For MMIO, drivers should be able to create a map with the right attribute since they own the ranges. > > > Basically if ioremap_uc() and ioremap_wc() is allowed on MMIO areas, > > > then I see no reason in principle why it should be invalid to change > > > the area from UC to WC after it has been ioremap()ed. > > > > The current implementation does not support MMIO. > > - It does not track cache attribute correctly for MMIO since it uses > > __pa(). > > Hm, so __pa() works on mmio addresses as well - at least on x86. The > whole mmtype tree is physical address based - which too is MMIO > compatible in principle. > > The only problem would be if it was fundamentally struct page * based - > but it's not AFAICS. We have a few APIs that work on struct page * > arrays, but those are just vectored helper APIs AFAICS. > > What am I missing? I do not think __pa() returns the right physical address for an ioremap'd virtual address since its virtual address is allocated from the vmalloc range. But we can simply use slow_virt_to_phys(), instead. > > - It only supports attribute transition of {WB -> NewType -> WB} for > > RAM. RAM is tracked differently that WB is treated as "no map". So, > > this transition does not cause a conflict on RAM. This will causes a > > conflict on MMIO when it is tracked correctly. > > That looks like a bug? This is by design since set_memory_xx was introduced for RAM only. If we extend it to MMIO, then we need to change how memtype manages MMIO. Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-11 22:13 ` Toshi Kani @ 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-17 22:44 ` Toshi Kani 0 siblings, 2 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-03-16 1:45 UTC (permalink / raw) To: Toshi Kani, Julia Lawall Cc: Ingo Molnar, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: > Hi Ingo, > > My apology for the delay... > > On Wed, 2016-03-09 at 10:15 +0100, Ingo Molnar wrote: > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > > > > So where is the problem? The memtype implementation and hence > > > > > > most ioremap() users are supposed to be safe. set_memory_*() APIs > > > > > > are supposed to be safe as well, as they too go via the memtype > > > > > > API. > > > > > > > > > > Let me try to summarize... > > > > > > > > > > The original issue Luis brought up was that drivers written to work > > > > > with MTRR may create a single ioremap range covering multiple cache > > > > > attributes since MTRR can overwrite cache attribute of a certain > > > > > range. Converting such drivers with PAT-based ioremap interfaces, > > > > > i.e. ioremap_wc() and ioremap_nocache(), requires a separate > > > > > ioremap map for each cache attribute, which can be challenging as > > > > > it may result in overlapping ioremap ranges (in his term) with > > > > > different cache attributes. > > > > > > > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. > > > > > Hence, I responded that aliasing mapping itself is supported, but > > > > > alias with different cache attribute is not. We have checks in > > > > > place to detect such condition. Overlapping ioremap calls with a > > > > > different cache attribute either fails or gets redirected to the > > > > > existing cache attribute on x86. > > > > > > > > Ok, fair enough! > > > > > > > > So to go back to the original suggestion from Luis, I've quoted it, > > > > but with a s/overlapping/aliased substitution: > > > > > > > > > I had suggested long ago then that one possible resolution was for > > > > > us to add an API that *enables* aliased ioremap() calls, and only > > > > > use it on select locations in the kernel. This means we only have > > > > > to convert a few users to that call to white list such semantics, > > > > > and by default we'd disable aliased calls. To kick things off -- is > > > > > this strategy agreeable for all other architectures? > > > > > > > > I'd say that since the overwhelming majority of ioremap() calls are > > > > not aliased, ever, thus making it 'harder' to accidentally alias is > > > > probably a good idea. > > > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? > > > The former check might be too strict. > > > > I'd say even 'same attribute' aliasing is probably relatively rare. > > > > And 'different but compatible cache attribute' is in fact more of a sign > > that the driver author does the aliasing for a valid _reason_: to have > > two different types of access methods to the same piece of physical > > address space... > > Right. So, if we change to fail ioremap() on aliased cases, it'd be easier > to start with the different attribute case first. This case should be rare > enough that we can manage to identify such callers and make them use a new > API as necessary. If we go ahead to fail any aliased cases, it'd be > challenging to manage without a regression or two. From my experience on the ioremap_wc() crusade, I found that the need for aliasing with different cache types would have been needed in only 3 drivers. For these 3, the atyfb driver I did the proper split in MMIO and framebuffer, but that was significant work. I did this work to demo and document such work. It wasn't easy. For other two, ivtv and ipath we left as requiring "nopat" to be used. The ipath driver is on its way out of the kenrel now through staging, and ivtv, well I am not aware of single human being claiming to use it. The architecture of ivtv actually prohibits us from ever using PAT for write-combining on the framebuffer as the firmware is the only one who knows the write-combining area and hides it from us. We might be able to use tools like Coccinelle to perhaps hunt for the use of aliasing on drivers with different cache attribute types to do a full assessment but I really think that will be really hard to accomplish. If we can learn anything from the ioremap_wc() crusade I'd say its that the need for aliasing with different cache types obviously implies we should disable such drivers with PAT as what we'd really need is a proper split in maps, but history shows the split can be really hard. It sounded like you guys were confirming we currently do not allow for aliasing with different attributes on x86, is that the case for all architectures? If aliasing with different cache attributes is not allowed for x86 and if its also rare for other architectures that just leaves the hunt for valid aliasing uses. That still may be hard to hunt for, but I also suspect it may be rare. > > > > The memtype infrastructure of phyisical memory ranges in that case > > > > acts as a security measure, to avoid unintended (not just physically > > > > incompatible) aliasing. I suspect it would be helpful during driver > > > > development as well. > > > > > > The memtype infrastructure does not track caller interfaces. So, it > > > will check against to any map, i.e. kernel & user map. I assume a > > > kernel map is created before user map, though. > > > > I don't understand this, could you give me a specific example? > > Multiple pfn map interfaces use the memtype. So, when ioremap() detects an > aliased map in the memtype, it may be created by ioremap(), > remap_pfn_range(), or any pfn map that is tracked. So, strictly speaking, > we cannot check against other ioremap(). > > > > > the other problem listed: > > > > > > > > > As another problem case, set_memor_*() will not fail on MMIO even > > > > > though set_memor_*() is designed only for RAM. > > > > > > > > So what does this mean exactly? Having WB caching on MMIO area is not > > > > good, but UC, WC and WB sure is still sensible in some cases, right? > > > > > > I responded to Luis in other email that: > > > > Drivers use ioremap family with a right cache type when mapping MMIO > > > > ranges, ex. ioremap_wc(). They do not need to change the type to > > > > MMIO. RAM is different since it's already mapped with WB at boot- > > > > time. set_memory_*() allows us to change the type from WB, and put it > > > > back to WB. > > > > So there's two different uses here: > > > > - set_memory_x() to set the caching attribute > > - set_memory_x() to set any of the RWX access attributes > > > > I'd in fact suggest we split these two uses to avoid confusion: keep > > set_memory_x() APIs for the RWX access attributes uses, and introduce > > a new API that sets caching attributes: > > > > set_cache_attr_wc() > > set_cache_attr_uc() > > set_cache_attr_wb() > > set_cache_attr_wt() > > > > it has the same arguments, so it's basically just a rename initially. > > I think the "set_memory_" prefix implies that their target is regular > memory only. I did not find any driver using set_memory_wc() on MMIO, its a good thing as that does not work it seems even if it returns no error. I'm not sure of the use of other set_memory_*() on MMIO but I would suspect its not used. A manual hunt may suffice to rule these out. I guess what I'm trying to say is I am not sure we have a need for set_cache_attr_*() APIs, unless of course we find such valid use. > > And at that point we could definitely argue that set_cache_attr_*() APIs > > should probably generate a warning for _RAM_, because they mostly make > > sense for MMIO type of physical addresses, right? Regular RAM should > > always be WB. > > > > Are there cases where we change the caching attribute of RAM for valid > > reasons, outside of legacy quirks? > > ati_create_page_map() is one example that it gets a RAM page > by __get_free_page(), and changes it to UC by calling set_memory_uc(). Should we instead have an API that lets it ask for RAM and of UC type? That would seem a bit cleaner. BTW do you happen to know *why* it needs UC RAM types? > Since RAM is already mapped with WB, the driver has to change its attribute > when it needs a different cache type. For MMIO, drivers should be able to > create a map with the right attribute since they own the ranges. From my own work so far I tend to agree here, but that's just because I've seen how drivers can suck badly, and they can quickly think they can use hacks for things which actually really need a full architectural proper solution in driver or firmware. In the worst case we have insane firmware, but I think that's probably really uncommon and likely and its design was flawed given that at that time PAT was probably rare. > > > > Basically if ioremap_uc() and ioremap_wc() is allowed on MMIO areas, > > > > then I see no reason in principle why it should be invalid to change > > > > the area from UC to WC after it has been ioremap()ed. > > > > > > The current implementation does not support MMIO. > > > - It does not track cache attribute correctly for MMIO since it uses > > > __pa(). > > > > Hm, so __pa() works on mmio addresses as well - at least on x86. The > > whole mmtype tree is physical address based - which too is MMIO > > compatible in principle. > > > > The only problem would be if it was fundamentally struct page * based - > > but it's not AFAICS. We have a few APIs that work on struct page * > > arrays, but those are just vectored helper APIs AFAICS. > > > > What am I missing? > > I do not think __pa() returns the right physical address for an ioremap'd > virtual address since its virtual address is allocated from the vmalloc > range. But we can simply use slow_virt_to_phys(), instead. > > > > - It only supports attribute transition of {WB -> NewType -> WB} for > > > RAM. RAM is tracked differently that WB is treated as "no map". So, > > > this transition does not cause a conflict on RAM. This will causes a > > > conflict on MMIO when it is tracked correctly. > > > > That looks like a bug? > > This is by design since set_memory_xx was introduced for RAM only. If we > extend it to MMIO, then we need to change how memtype manages MMIO. I'd be afraid to *want* to support this on MMIO as I would only expect hacks from drivers. Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-16 1:45 ` Luis R. Rodriguez @ 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-17 22:44 ` Toshi Kani 1 sibling, 0 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-03-16 1:45 UTC (permalink / raw) To: Toshi Kani, Julia Lawall Cc: Ingo Molnar, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: > Hi Ingo, > > My apology for the delay... > > On Wed, 2016-03-09 at 10:15 +0100, Ingo Molnar wrote: > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > > > > So where is the problem? The memtype implementation and hence > > > > > > most ioremap() users are supposed to be safe. set_memory_*() APIs > > > > > > are supposed to be safe as well, as they too go via the memtype > > > > > > API. > > > > > > > > > > Let me try to summarize... > > > > > > > > > > The original issue Luis brought up was that drivers written to work > > > > > with MTRR may create a single ioremap range covering multiple cache > > > > > attributes since MTRR can overwrite cache attribute of a certain > > > > > range. Converting such drivers with PAT-based ioremap interfaces, > > > > > i.e. ioremap_wc() and ioremap_nocache(), requires a separate > > > > > ioremap map for each cache attribute, which can be challenging as > > > > > it may result in overlapping ioremap ranges (in his term) with > > > > > different cache attributes. > > > > > > > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. > > > > > Hence, I responded that aliasing mapping itself is supported, but > > > > > alias with different cache attribute is not. We have checks in > > > > > place to detect such condition. Overlapping ioremap calls with a > > > > > different cache attribute either fails or gets redirected to the > > > > > existing cache attribute on x86. > > > > > > > > Ok, fair enough! > > > > > > > > So to go back to the original suggestion from Luis, I've quoted it, > > > > but with a s/overlapping/aliased substitution: > > > > > > > > > I had suggested long ago then that one possible resolution was for > > > > > us to add an API that *enables* aliased ioremap() calls, and only > > > > > use it on select locations in the kernel. This means we only have > > > > > to convert a few users to that call to white list such semantics, > > > > > and by default we'd disable aliased calls. To kick things off -- is > > > > > this strategy agreeable for all other architectures? > > > > > > > > I'd say that since the overwhelming majority of ioremap() calls are > > > > not aliased, ever, thus making it 'harder' to accidentally alias is > > > > probably a good idea. > > > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? > > > The former check might be too strict. > > > > I'd say even 'same attribute' aliasing is probably relatively rare. > > > > And 'different but compatible cache attribute' is in fact more of a sign > > that the driver author does the aliasing for a valid _reason_: to have > > two different types of access methods to the same piece of physical > > address space... > > Right. So, if we change to fail ioremap() on aliased cases, it'd be easier > to start with the different attribute case first. This case should be rare > enough that we can manage to identify such callers and make them use a new > API as necessary. If we go ahead to fail any aliased cases, it'd be > challenging to manage without a regression or two. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-16 1:45 ` Luis R. Rodriguez @ 2016-03-17 22:44 ` Toshi Kani 2016-04-13 21:16 ` Luis R. Rodriguez 1 sibling, 1 reply; 28+ messages in thread From: Toshi Kani @ 2016-03-17 22:44 UTC (permalink / raw) To: Luis R. Rodriguez, Julia Lawall Cc: Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Wed, 2016-03-16 at 02:45 +0100, Luis R. Rodriguez wrote: > On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: > > On Wed, 2016-03-09 at 10:15 +0100, Ingo Molnar wrote: > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > : > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? > > > > The former check might be too strict. > > > > > > I'd say even 'same attribute' aliasing is probably relatively rare. > > > > > > And 'different but compatible cache attribute' is in fact more of a > > > sign that the driver author does the aliasing for a valid _reason_: > > > to have two different types of access methods to the same piece of > > > physical address space... > > > > Right. So, if we change to fail ioremap() on aliased cases, it'd be > > easier to start with the different attribute case first. This case > > should be rare enough that we can manage to identify such callers and > > make them use a new API as necessary. If we go ahead to fail any > > aliased cases, it'd be challenging to manage without a regression or > > two. > > From my experience on the ioremap_wc() crusade, I found that the need for > aliasing with different cache types would have been needed in only 3 > drivers. For these 3, the atyfb driver I did the proper split in MMIO and > framebuffer, but that was significant work. I did this work to demo and > document such work. It wasn't easy. For other two, ivtv and ipath we left > as requiring "nopat" to be used. The ipath driver is on its way out of > the kenrel now through staging, and ivtv, well I am not aware of single > human being claiming to use it. The architecture of ivtv actually > prohibits us from ever using PAT for write-combining on the framebuffer > as the firmware is the only one who knows the write-combining area and > hides it from us. At glace, there are 863 references to ioremap(), 329 references to ioremap_nocache(), and only 68 references to ioremap_wc() on x86. There are many more ioremap callers with UC mappings than WC mappings, and it is hard to say that they never get aliased. > We might be able to use tools like Coccinelle to perhaps hunt for > the use of aliasing on drivers with different cache attribute types > to do a full assessment but I really think that will be really hard > to accomplish. > > If we can learn anything from the ioremap_wc() crusade I'd say its that > the need for aliasing with different cache types obviously implies we > should disable such drivers with PAT as what we'd really need is a proper > split in maps, but history shows the split can be really hard. It sounded > like you guys were confirming we currently do not allow for aliasing with > different attributes on x86, is that the case for all architectures? > > If aliasing with different cache attributes is not allowed for x86 and > if its also rare for other architectures that just leaves the hunt for > valid aliasing uses. That still may be hard to hunt for, but I also > suspect it may be rare. Yes, I'd fail the different cache attribute case if we are to place more strict check. : > > > > I think the "set_memory_" prefix implies that their target is regular > > memory only. > > I did not find any driver using set_memory_wc() on MMIO, its a good thing > as that does not work it seems even if it returns no error. I'm not sure > of the use of other set_memory_*() on MMIO but I would suspect its not > used. A manual hunt may suffice to rule these out. It's good to know that you did not find any case on MMIO. The thing is, set_memory_wc() actually works on MMIO today... This is because __pa() returns a bogus address, which skips the alias check in the memtype. > I guess what I'm trying to say is I am not sure we have a need for > set_cache_attr_*() APIs, unless of course we find such valid use. > > > > And at that point we could definitely argue that set_cache_attr_*() > > > APIs should probably generate a warning for _RAM_, because they > > > mostly make sense for MMIO type of physical addresses, right? Regular > > > RAM should always be WB. > > > > > > Are there cases where we change the caching attribute of RAM for > > > valid reasons, outside of legacy quirks? > > > > ati_create_page_map() is one example that it gets a RAM page > > by __get_free_page(), and changes it to UC by calling set_memory_uc(). > > Should we instead have an API that lets it ask for RAM and of UC type? > That would seem a bit cleaner. BTW do you happen to know *why* it needs > UC RAM types? This RAM page is then shared between graphic card and CPU. I think this is because graphic card cannot snoop the cache. > > : > > > > - It only supports attribute transition of {WB -> NewType -> WB} > > > > for RAM. RAM is tracked differently that WB is treated as "no > > > > map". So, this transition does not cause a conflict on RAM. This > > > > will causes a conflict on MMIO when it is tracked correctly. > > > > > > That looks like a bug? > > > > This is by design since set_memory_xx was introduced for RAM only. If > > we extend it to MMIO, then we need to change how memtype manages MMIO. > > I'd be afraid to *want* to support this on MMIO as I would only expect > hacks from drivers. Agreed, with the hope that they are not used on MMIO already... Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-17 22:44 ` Toshi Kani @ 2016-04-13 21:16 ` Luis R. Rodriguez 2016-04-15 14:47 ` Toshi Kani 0 siblings, 1 reply; 28+ messages in thread From: Luis R. Rodriguez @ 2016-04-13 21:16 UTC (permalink / raw) To: Toshi Kani, Maciej W. Rozycki Cc: Luis R. Rodriguez, Julia Lawall, Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Thu, Mar 17, 2016 at 04:44:53PM -0600, Toshi Kani wrote: > On Wed, 2016-03-16 at 02:45 +0100, Luis R. Rodriguez wrote: > > On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: > > > On Wed, 2016-03-09 at 10:15 +0100, Ingo Molnar wrote: > > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > > > On Tue, 2016-03-08 at 13:16 +0100, Ingo Molnar wrote: > > > > > > * Toshi Kani <toshi.kani@hpe.com> wrote: > > > > > > > : > > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? > > > > > The former check might be too strict. > > > > > > > > I'd say even 'same attribute' aliasing is probably relatively rare. > > > > > > > > And 'different but compatible cache attribute' is in fact more of a > > > > sign that the driver author does the aliasing for a valid _reason_: > > > > to have two different types of access methods to the same piece of > > > > physical address space... > > > > > > Right. So, if we change to fail ioremap() on aliased cases, it'd be > > > easier to start with the different attribute case first. This case > > > should be rare enough that we can manage to identify such callers and > > > make them use a new API as necessary. If we go ahead to fail any > > > aliased cases, it'd be challenging to manage without a regression or > > > two. > > > > From my experience on the ioremap_wc() crusade, I found that the need for > > aliasing with different cache types would have been needed in only 3 > > drivers. For these 3, the atyfb driver I did the proper split in MMIO and > > framebuffer, but that was significant work. I did this work to demo and > > document such work. It wasn't easy. For other two, ivtv and ipath we left > > as requiring "nopat" to be used. The ipath driver is on its way out of > > the kenrel now through staging, and ivtv, well I am not aware of single > > human being claiming to use it. The architecture of ivtv actually > > prohibits us from ever using PAT for write-combining on the framebuffer > > as the firmware is the only one who knows the write-combining area and > > hides it from us. > > At glace, there are 863 references to ioremap(), 329 references to > ioremap_nocache(), and only 68 references to ioremap_wc() on x86. There > are many more ioremap callers with UC mappings than WC mappings, and it is > hard to say that they never get aliased. We need to start somewhere. If we really want to vet / white list aliasing we probably will need both semantic analysis but perhaps also manual vetting, and finally a phase where we help WARN on uses we did not white-list. > > We might be able to use tools like Coccinelle to perhaps hunt for > > the use of aliasing on drivers with different cache attribute types > > to do a full assessment but I really think that will be really hard > > to accomplish. > > > > If we can learn anything from the ioremap_wc() crusade I'd say its that > > the need for aliasing with different cache types obviously implies we > > should disable such drivers with PAT as what we'd really need is a proper > > split in maps, but history shows the split can be really hard. It sounded > > like you guys were confirming we currently do not allow for aliasing with > > different attributes on x86, is that the case for all architectures? > > > > If aliasing with different cache attributes is not allowed for x86 and > > if its also rare for other architectures that just leaves the hunt for > > valid aliasing uses. That still may be hard to hunt for, but I also > > suspect it may be rare. > > Yes, I'd fail the different cache attribute case if we are to place more > strict check. OK it seems this is a good starting point. How can we get a general architecture consensus aliasing with different cache attributes is a terrible idea ? Perhaps a patch to WARN/error out and let architectures opt in to this piece of code? > > > I think the "set_memory_" prefix implies that their target is regular > > > memory only. > > > > I did not find any driver using set_memory_wc() on MMIO, its a good thing > > as that does not work it seems even if it returns no error. I'm not sure > > of the use of other set_memory_*() on MMIO but I would suspect its not > > used. A manual hunt may suffice to rule these out. > > It's good to know that you did not find any case on MMIO. The thing is, > set_memory_wc() actually works on MMIO today... This is because __pa() > returns a bogus address, which skips the alias check in the memtype. Ingo, are you happy with that ? I honestly do not see the need for use of set_memory_wc() for the cases I reviewed, I think the case for write-combining can simply be addressed currently with ioremap_wc(). > > I guess what I'm trying to say is I am not sure we have a need for > > set_cache_attr_*() APIs, unless of course we find such valid use. > > > > > > And at that point we could definitely argue that set_cache_attr_*() > > > > APIs should probably generate a warning for _RAM_, because they > > > > mostly make sense for MMIO type of physical addresses, right? Regular > > > > RAM should always be WB. > > > > > > > > Are there cases where we change the caching attribute of RAM for > > > > valid reasons, outside of legacy quirks? > > > > > > ati_create_page_map() is one example that it gets a RAM page > > > by __get_free_page(), and changes it to UC by calling set_memory_uc(). > > > > Should we instead have an API that lets it ask for RAM and of UC type? > > That would seem a bit cleaner. BTW do you happen to know *why* it needs > > UC RAM types? > > This RAM page is then shared between graphic card and CPU. I think this is > because graphic card cannot snoop the cache. Was this reason alone sufficient to open such APIs broadly for RAM? > > > > > - It only supports attribute transition of {WB -> NewType -> WB} > > > > > for RAM. RAM is tracked differently that WB is treated as "no > > > > > map". So, this transition does not cause a conflict on RAM. This > > > > > will causes a conflict on MMIO when it is tracked correctly. > > > > > > > > That looks like a bug? > > > > > > This is by design since set_memory_xx was introduced for RAM only. If > > > we extend it to MMIO, then we need to change how memtype manages MMIO. > > > > I'd be afraid to *want* to support this on MMIO as I would only expect > > hacks from drivers. > > Agreed, with the hope that they are not used on MMIO already... OK we'll need to review this. Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-04-13 21:16 ` Luis R. Rodriguez @ 2016-04-15 14:47 ` Toshi Kani 2016-04-15 14:47 ` Toshi Kani 2016-04-16 9:20 ` Ingo Molnar 0 siblings, 2 replies; 28+ messages in thread From: Toshi Kani @ 2016-04-15 14:47 UTC (permalink / raw) To: Luis R. Rodriguez, Maciej W. Rozycki Cc: Julia Lawall, Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Wed, 2016-04-13 at 23:16 +0200, Luis R. Rodriguez wrote: > On Thu, Mar 17, 2016 at 04:44:53PM -0600, Toshi Kani wrote: > > On Wed, 2016-03-16 at 02:45 +0100, Luis R. Rodriguez wrote: > > > On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: : > > > > > > If aliasing with different cache attributes is not allowed for x86 > > > and if its also rare for other architectures that just leaves the > > > hunt for valid aliasing uses. That still may be hard to hunt for, but > > > I also suspect it may be rare. > > > > Yes, I'd fail the different cache attribute case if we are to place > > more strict check. > > OK it seems this is a good starting point. How can we get a general > architecture consensus aliasing with different cache attributes is a > terrible idea ? Perhaps a patch to WARN/error out and let architectures > opt in to this piece of code? I expect aliasing with different cache attributes is a bad idea on most architectures. Given the fact that track_pfn_remap(), track_pfn_insert(), etc. are only implemented on x86, I suspect that other architectures would not be able to implement such check easily, though. On x86, ioremap() and remap_pfn_range() already fail on a conflicting cache type if it is not allowed by the rule defined in is_new_memtype_allowed(). This exception handling is necessary for remap_pfn_range() called by /dev/mem, but I do not think it's necessary for ioremap(). I think we can start from adding a warning message to ioremap(). > > > > > Are there cases where we change the caching attribute of RAM for > > > > > valid reasons, outside of legacy quirks? > > > > > > > > ati_create_page_map() is one example that it gets a RAM page > > > > by __get_free_page(), and changes it to UC by calling set_memory_uc > > > > (). > > > > > > Should we instead have an API that lets it ask for RAM and of UC > > > type? That would seem a bit cleaner. BTW do you happen to know *why* > > > it needs UC RAM types? > > > > This RAM page is then shared between graphic card and CPU. I think > > this is because graphic card cannot snoop the cache. > > Was this reason alone sufficient to open such APIs broadly for RAM? According to commit 75cbade8ea3, such APIs were introduced because drivers previously had to deal with low-level staff. So, I think we need to keep them as long as we have such drivers... Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-04-15 14:47 ` Toshi Kani @ 2016-04-15 14:47 ` Toshi Kani 2016-04-16 9:20 ` Ingo Molnar 1 sibling, 0 replies; 28+ messages in thread From: Toshi Kani @ 2016-04-15 14:47 UTC (permalink / raw) To: Luis R. Rodriguez, Maciej W. Rozycki Cc: Julia Lawall, Ingo Molnar, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Wed, 2016-04-13 at 23:16 +0200, Luis R. Rodriguez wrote: > On Thu, Mar 17, 2016 at 04:44:53PM -0600, Toshi Kani wrote: > > On Wed, 2016-03-16 at 02:45 +0100, Luis R. Rodriguez wrote: > > > On Fri, Mar 11, 2016 at 03:13:52PM -0700, Toshi Kani wrote: : > > > > > > If aliasing with different cache attributes is not allowed for x86 > > > and if its also rare for other architectures that just leaves the > > > hunt for valid aliasing uses. That still may be hard to hunt for, but > > > I also suspect it may be rare. > > > > Yes, I'd fail the different cache attribute case if we are to place > > more strict check. > > OK it seems this is a good starting point. How can we get a general > architecture consensus aliasing with different cache attributes is a > terrible idea ? Perhaps a patch to WARN/error out and let architectures > opt in to this piece of code? I expect aliasing with different cache attributes is a bad idea on most architectures. Given the fact that track_pfn_remap(), track_pfn_insert(), etc. are only implemented on x86, I suspect that other architectures would not be able to implement such check easily, though. On x86, ioremap() and remap_pfn_range() already fail on a conflicting cache type if it is not allowed by the rule defined in is_new_memtype_allowed(). This exception handling is necessary for remap_pfn_range() called by /dev/mem, but I do not think it's necessary for ioremap(). I think we can start from adding a warning message to ioremap(). > > > > > Are there cases where we change the caching attribute of RAM for > > > > > valid reasons, outside of legacy quirks? > > > > > > > > ati_create_page_map() is one example that it gets a RAM page > > > > by __get_free_page(), and changes it to UC by calling set_memory_uc > > > > (). > > > > > > Should we instead have an API that lets it ask for RAM and of UC > > > type? That would seem a bit cleaner. BTW do you happen to know *why* > > > it needs UC RAM types? > > > > This RAM page is then shared between graphic card and CPU. I think > > this is because graphic card cannot snoop the cache. > > Was this reason alone sufficient to open such APIs broadly for RAM? According to commit 75cbade8ea3, such APIs were introduced because drivers previously had to deal with low-level staff. So, I think we need to keep them as long as we have such drivers... Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-04-15 14:47 ` Toshi Kani 2016-04-15 14:47 ` Toshi Kani @ 2016-04-16 9:20 ` Ingo Molnar 2016-04-16 9:20 ` Ingo Molnar 1 sibling, 1 reply; 28+ messages in thread From: Ingo Molnar @ 2016-04-16 9:20 UTC (permalink / raw) To: Toshi Kani Cc: Luis R. Rodriguez, Maciej W. Rozycki, Julia Lawall, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Toshi Kani <toshi.kani@hpe.com> wrote: > On x86, ioremap() and remap_pfn_range() already fail on a conflicting cache > type if it is not allowed by the rule defined in is_new_memtype_allowed(). > This exception handling is necessary for remap_pfn_range() called by > /dev/mem, but I do not think it's necessary for ioremap(). I think we can > start from adding a warning message to ioremap(). Agreed, we should warn and map the situation before doing any behavioral change that isn't a fix. Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-04-16 9:20 ` Ingo Molnar @ 2016-04-16 9:20 ` Ingo Molnar 0 siblings, 0 replies; 28+ messages in thread From: Ingo Molnar @ 2016-04-16 9:20 UTC (permalink / raw) To: Toshi Kani Cc: Luis R. Rodriguez, Maciej W. Rozycki, Julia Lawall, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst * Toshi Kani <toshi.kani@hpe.com> wrote: > On x86, ioremap() and remap_pfn_range() already fail on a conflicting cache > type if it is not allowed by the rule defined in is_new_memtype_allowed(). > This exception handling is necessary for remap_pfn_range() called by > /dev/mem, but I do not think it's necessary for ioremap(). I think we can > start from adding a warning message to ioremap(). Agreed, we should warn and map the situation before doing any behavioral change that isn't a fix. Thanks, Ingo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-09 9:15 ` Ingo Molnar 2016-03-11 22:13 ` Toshi Kani @ 2016-03-21 17:38 ` Maciej W. Rozycki 2016-04-13 21:03 ` Luis R. Rodriguez 1 sibling, 1 reply; 28+ messages in thread From: Maciej W. Rozycki @ 2016-03-21 17:38 UTC (permalink / raw) To: Ingo Molnar Cc: Toshi Kani, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Wed, 9 Mar 2016, Ingo Molnar wrote: > > > So to go back to the original suggestion from Luis, I've quoted it, but > > > with a s/overlapping/aliased substitution: > > > > > > > I had suggested long ago then that one possible resolution was for us > > > > to add an API that *enables* aliased ioremap() calls, and only use it > > > > on select locations in the kernel. This means we only have to convert a > > > > few users to that call to white list such semantics, and by default > > > > we'd disable aliased calls. To kick things off -- is this strategy > > > > agreeable for all other architectures? > > > > > > I'd say that since the overwhelming majority of ioremap() calls are not > > > aliased, ever, thus making it 'harder' to accidentally alias is probably > > > a good idea. > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? The former > > check might be too strict. > > I'd say even 'same attribute' aliasing is probably relatively rare. Please note that aliased cached mappings (any kinds of, not necessarily from `ioremap') cause a lot of headache (read: handling trouble) with architectures such as MIPS which support virtually indexed caches which suffer from cache aliasing. There is a risk of data corruption if the same physical memory address space location is accessed through different virtual mappings as not all hardware catches duplicate cache entries created in such a case. We handle it in software for user mappings (although I keep having a feeling something always keeps escaping, due to the vast diversity of cache configurations possible), however I don't think we do for `ioremap', so disallowing aliased `ioremap' mappings by default sounds like a good idea to me. FWIW, Maciej ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-21 17:38 ` Maciej W. Rozycki @ 2016-04-13 21:03 ` Luis R. Rodriguez 0 siblings, 0 replies; 28+ messages in thread From: Luis R. Rodriguez @ 2016-04-13 21:03 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ingo Molnar, Toshi Kani, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Mon, Mar 21, 2016 at 05:38:40PM +0000, Maciej W. Rozycki wrote: > On Wed, 9 Mar 2016, Ingo Molnar wrote: > > > > > So to go back to the original suggestion from Luis, I've quoted it, but > > > > with a s/overlapping/aliased substitution: > > > > > > > > > I had suggested long ago then that one possible resolution was for us > > > > > to add an API that *enables* aliased ioremap() calls, and only use it > > > > > on select locations in the kernel. This means we only have to convert a > > > > > few users to that call to white list such semantics, and by default > > > > > we'd disable aliased calls. To kick things off -- is this strategy > > > > > agreeable for all other architectures? > > > > > > > > I'd say that since the overwhelming majority of ioremap() calls are not > > > > aliased, ever, thus making it 'harder' to accidentally alias is probably > > > > a good idea. > > > > > > Did you mean 'aliased' or 'aliased with different cache attribute'? The former > > > check might be too strict. > > > > I'd say even 'same attribute' aliasing is probably relatively rare. > > Please note that aliased cached mappings (any kinds of, not necessarily > from `ioremap') cause a lot of headache (read: handling trouble) with > architectures such as MIPS which support virtually indexed caches which > suffer from cache aliasing. There is a risk of data corruption if the > same physical memory address space location is accessed through different > virtual mappings as not all hardware catches duplicate cache entries > created in such a case. > > We handle it in software for user mappings (although I keep having a > feeling something always keeps escaping, due to the vast diversity of > cache configurations possible), however I don't think we do for `ioremap', > so disallowing aliased `ioremap' mappings by default sounds like a good > idea to me. Great, well lets do the work then. Luis ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-07 17:03 ` Toshi Kani 2016-03-07 17:03 ` Toshi Kani 2016-03-08 12:16 ` Ingo Molnar @ 2016-03-11 6:47 ` Andy Lutomirski 2016-03-11 22:36 ` Toshi Kani 2 siblings, 1 reply; 28+ messages in thread From: Andy Lutomirski @ 2016-03-11 6:47 UTC (permalink / raw) To: Toshi Kani Cc: Ingo Molnar, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Mon, Mar 7, 2016 at 9:03 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > Let me try to summarize... > > The original issue Luis brought up was that drivers written to work with > MTRR may create a single ioremap range covering multiple cache attributes > since MTRR can overwrite cache attribute of a certain range. Converting > such drivers with PAT-based ioremap interfaces, i.e. ioremap_wc() and > ioremap_nocache(), requires a separate ioremap map for each cache > attribute, which can be challenging as it may result in overlapping ioremap > ranges (in his term) with different cache attributes. > > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, I > responded that aliasing mapping itself is supported, but alias with > different cache attribute is not. We have checks in place to detect such > condition. Overlapping ioremap calls with a different cache attribute > either fails or gets redirected to the existing cache attribute on x86. A little off-topic, but someone reminded me recently: most recent CPUs have self-snoop. It's poorly documented, but on self-snooping CPUs, I think that a lot of the aliasing issues go away. We may be able to optimize the code quite a bit on these CPUs. I also wonder whether we can drop a bunch of the memtype tracking. After all, if we have aliases of different types on a self-snooping CPU and /dev/mem is locked down hard enough, we could maybe get away with letting self-snoop handle all the conflicts. (We could also make /dev/mem always do UC if it would help.) --Andy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-11 6:47 ` Andy Lutomirski @ 2016-03-11 22:36 ` Toshi Kani 2016-03-13 1:02 ` Andy Lutomirski 0 siblings, 1 reply; 28+ messages in thread From: Toshi Kani @ 2016-03-11 22:36 UTC (permalink / raw) To: Andy Lutomirski Cc: Ingo Molnar, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Thu, 2016-03-10 at 22:47 -0800, Andy Lutomirski wrote: > On Mon, Mar 7, 2016 at 9:03 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > > Let me try to summarize... > > > > The original issue Luis brought up was that drivers written to work > > with MTRR may create a single ioremap range covering multiple cache > > attributes since MTRR can overwrite cache attribute of a certain > > range. Converting such drivers with PAT-based ioremap interfaces, i.e. > > ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for > > each cache attribute, which can be challenging as it may result in > > overlapping ioremap ranges (in his term) with different cache > > attributes. > > > > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, > > I responded that aliasing mapping itself is supported, but alias with > > different cache attribute is not. We have checks in place to detect > > such condition. Overlapping ioremap calls with a different cache > > attribute either fails or gets redirected to the existing cache > > attribute on x86. > > A little off-topic, but someone reminded me recently: most recent CPUs > have self-snoop. It's poorly documented, but on self-snooping CPUs, I > think that a lot of the aliasing issues go away. We may be able to > optimize the code quite a bit on these CPUs. Interesting. I wonder how much we can rely on this feature. Yes, by looking at Intel SDM, it is indeed poorly documented. :-( > I also wonder whether we can drop a bunch of the memtype tracking. > After all, if we have aliases of different types on a self-snooping > CPU and /dev/mem is locked down hard enough, we could maybe get away > with letting self-snoop handle all the conflicts. > > (We could also make /dev/mem always do UC if it would help.) It'd be interesting to know how it performs on an aliased map when it works correctly... Thanks, -Toshi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Overlapping ioremap() calls, set_memory_*() semantics 2016-03-11 22:36 ` Toshi Kani @ 2016-03-13 1:02 ` Andy Lutomirski 0 siblings, 0 replies; 28+ messages in thread From: Andy Lutomirski @ 2016-03-13 1:02 UTC (permalink / raw) To: Toshi Kani Cc: Ingo Molnar, Luis R. Rodriguez, Toshi Kani, Paul McKenney, Dave Airlie, Benjamin Herrenschmidt, linux-kernel, linux-arch, X86 ML, Daniel Vetter, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Borislav Petkov, Linus Torvalds, Andrew Morton, Andy Lutomirski, Brian Gerst On Fri, Mar 11, 2016 at 2:36 PM, Toshi Kani <toshi.kani@hpe.com> wrote: > On Thu, 2016-03-10 at 22:47 -0800, Andy Lutomirski wrote: >> On Mon, Mar 7, 2016 at 9:03 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >> > Let me try to summarize... >> > >> > The original issue Luis brought up was that drivers written to work >> > with MTRR may create a single ioremap range covering multiple cache >> > attributes since MTRR can overwrite cache attribute of a certain >> > range. Converting such drivers with PAT-based ioremap interfaces, i.e. >> > ioremap_wc() and ioremap_nocache(), requires a separate ioremap map for >> > each cache attribute, which can be challenging as it may result in >> > overlapping ioremap ranges (in his term) with different cache >> > attributes. >> > >> > So, Luis asked about 'sematics of overlapping ioremap()' calls. Hence, >> > I responded that aliasing mapping itself is supported, but alias with >> > different cache attribute is not. We have checks in place to detect >> > such condition. Overlapping ioremap calls with a different cache >> > attribute either fails or gets redirected to the existing cache >> > attribute on x86. >> >> A little off-topic, but someone reminded me recently: most recent CPUs >> have self-snoop. It's poorly documented, but on self-snooping CPUs, I >> think that a lot of the aliasing issues go away. We may be able to >> optimize the code quite a bit on these CPUs. > > Interesting. I wonder how much we can rely on this feature. Yes, by > looking at Intel SDM, it is indeed poorly documented. :-( Any Intel people want to give us a hint? --Andy ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2016-04-16 9:20 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-03-03 21:28 Overlapping ioremap() calls, set_memory_*() semantics Luis R. Rodriguez 2016-03-03 21:28 ` Luis R. Rodriguez 2016-03-04 9:44 ` Ingo Molnar 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:18 ` Toshi Kani 2016-03-04 18:51 ` Luis R. Rodriguez 2016-03-04 21:39 ` Toshi Kani 2016-03-05 11:42 ` Ingo Molnar 2016-03-05 11:40 ` Ingo Molnar 2016-03-07 17:03 ` Toshi Kani 2016-03-07 17:03 ` Toshi Kani 2016-03-08 12:16 ` Ingo Molnar 2016-03-09 0:29 ` Toshi Kani 2016-03-09 9:15 ` Ingo Molnar 2016-03-11 22:13 ` Toshi Kani 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-16 1:45 ` Luis R. Rodriguez 2016-03-17 22:44 ` Toshi Kani 2016-04-13 21:16 ` Luis R. Rodriguez 2016-04-15 14:47 ` Toshi Kani 2016-04-15 14:47 ` Toshi Kani 2016-04-16 9:20 ` Ingo Molnar 2016-04-16 9:20 ` Ingo Molnar 2016-03-21 17:38 ` Maciej W. Rozycki 2016-04-13 21:03 ` Luis R. Rodriguez 2016-03-11 6:47 ` Andy Lutomirski 2016-03-11 22:36 ` Toshi Kani 2016-03-13 1:02 ` Andy Lutomirski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).