On Thu, 23 Jul 2015, Vlastimil Babka wrote: > On 07/22/2015 08:43 PM, Eric B Munson wrote: > > On Wed, 22 Jul 2015, Vlastimil Babka wrote: > > > >> > >> Hi, > >> > >> I think you should include a complete description of which > >> transitions for vma states and mlock2/munlock2 flags applied on them > >> are valid and what they do. It will also help with the manpages. > >> You explained some to Jon in the last thread, but I think there > >> should be a canonical description in changelog (if not also > >> Documentation, if mlock is covered there). > >> > >> For example the scenario Jon asked, what happens after a > >> mlock2(MLOCK_ONFAULT) followed by mlock2(MLOCK_LOCKED), and that the > >> answer is "nothing". Your promised code comment for > >> apply_vma_flags() doesn't suffice IMHO (and I'm not sure it's there, > >> anyway?). > > > > I missed adding that comment to the code, will be there in V5 along with > > the description in the changelog. > > Thanks! > > >> > >> But the more I think about the scenario and your new VM_LOCKONFAULT > >> vma flag, it seems awkward to me. Why should munlocking at all care > >> if the vma was mlocked with MLOCK_LOCKED or MLOCK_ONFAULT? In either > >> case the result is that all pages currently populated are munlocked. > >> So the flags for munlock2 should be unnecessary. > > > > Say a user has a large area of interleaved MLOCK_LOCK and MLOCK_ONFAULT > > mappings and they want to unlock only the ones with MLOCK_LOCK. With > > the current implementation, this is possible in a single system call > > that spans the entire region. With your suggestion, the user would have > > to know what regions where locked with MLOCK_LOCK and call munlock() on > > each of them. IMO, the way munlock2() works better mirrors the way > > munlock() currently works when called on a large area of interleaved > > locked and unlocked areas. > > Um OK, that scenario is possible in theory. But I have a hard time imagining > that somebody would really want to do that. I think much more people would > benefit from a simpler API. It wasn't about imagining a scenario, more about keeping parity with something that currently works (unlocking a large area of interleaved locked and unlocked regions). However, there is no reason we can't add the new munlock2 later if it is desired. > > > > >> > >> I also think VM_LOCKONFAULT is unnecessary. VM_LOCKED should be > >> enough - see how you had to handle the new flag in all places that > >> had to handle the old flag? I think the information whether mlock > >> was supposed to fault the whole vma is obsolete at the moment mlock > >> returns. VM_LOCKED should be enough for both modes, and the flag to > >> mlock2 could just control whether the pre-faulting is done. > >> > >> So what should be IMHO enough: > >> - munlock can stay without flags > >> - mlock2 has only one new flag MLOCK_ONFAULT. If specified, > >> pre-faulting is not done, just set VM_LOCKED and mlock pages already > >> present. > >> - same with mmap(MAP_LOCKONFAULT) (need to define what happens when > >> both MAP_LOCKED and MAP_LOCKONFAULT are specified). > >> > >> Now mlockall(MCL_FUTURE) muddles the situation in that it stores the > >> information for future VMA's in current->mm->def_flags, and this > >> def_flags would need to distinguish VM_LOCKED with population and > >> without. But that could be still solvable without introducing a new > >> vma flag everywhere. > > > > With you right up until that last paragraph. I have been staring at > > this a while and I cannot come up a way to handle the > > mlockall(MCL_ONFAULT) without introducing a new vm flag. It doesn't > > have to be VM_LOCKONFAULT, we could use the model that Michal Hocko > > suggested with something like VM_FAULTPOPULATE. However, we can't > > really use this flag anywhere except the mlock code becuase we have to > > be able to distinguish a caller that wants to use MLOCK_LOCK with > > whatever control VM_FAULTPOPULATE might grant outside of mlock and a > > caller that wants MLOCK_ONFAULT. That was a long way of saying we need > > an extra vma flag regardless. However, if that flag only controls if > > mlock pre-populates it would work and it would do away with most of the > > places I had to touch to handle VM_LOCKONFAULT properly. > > Yes, it would be a good way. Adding a new vma flag is probably cleanest after > all, but the flag would be set *in addition* to VM_LOCKED, *just* to prevent > pre-faulting. The places that check VM_LOCKED for the actual page mlocking (i.e. > try_to_unmap_one) would just keep checking VM_LOCKED. The places where VM_LOCKED > is checked to trigger prepopulation, would skip that if VM_LOCKONFAULT is also > set. Having VM_LOCKONFAULT set without also VM_LOCKED itself would be invalid state. > > This should work fine with the simplified API as I proposed so let me reiterate > and try fill in the blanks: > > - mlock2 has only one new flag MLOCK_ONFAULT. If specified, VM_LOCKONFAULT is > set in addition to VM_LOCKED and no prefaulting is done > - old mlock syscall naturally behaves as mlock2 without MLOCK_ONFAULT > - calling mlock/mlock2 on an already-mlocked area (if that's permitted > already?) will add/remove VM_LOCKONFAULT as needed. If it's removing, > prepopulate whole range. Of course adding VM_LOCKONFAULT to a vma that was > already prefaulted doesn't make any difference, but it's consistent with the rest. > - munlock removes both VM_LOCKED and VM_LOCKONFAULT > - mmap could treat MAP_LOCKONFAULT as a modifier to MAP_LOCKED to be consistent? > or not? I'm not sure here, either way subtly differs from mlock API anyway, I > just wish MAP_LOCKED never existed... > - mlockall(MCL_CURRENT) sets or clears VM_LOCKONFAULT depending on > MCL_LOCKONFAULT, mlockall(MCL_FUTURE) does the same on mm->def_flags > - munlockall2 removes both, like munlock. munlockall2(MCL_FUTURE) does that to > def_flags > > > I picked VM_LOCKONFAULT because it is explicit about what it is for and > > there is little risk of someone coming along in 5 years and saying "why > > not overload this flag to do this other thing completely unrelated to > > mlock?". A flag for controling speculative population is more likely to > > be overloaded outside of mlock(). > > Sure, let's make clear the name is related to mlock, but the behavior could > still be additive to MAP_LOCKED. > > > If you have a sane way of handling mlockall(MCL_ONFAULT) without a new > > VMA flag, I am happy to give it a try, but I haven't been able to come > > up with one that doesn't have its own gremlins. > > Well we could store the MCL_FUTURE | MCL_ONFAULT bit elsewhere in mm_struct than > the def_flags field. The VM_LOCKED field is already evaluated specially from all > the other def_flags. We are nearing the full 32bit space for vma flags. I think > all I've proposed above wouldn't change much if we removed per-vma > VM_LOCKONFAULT flag from the equation. Just that re-mlocking area already > mlocked *withouth* MLOCK_ONFAULT wouldn't know that it was alread prepopulated, > and would have to re-populate in either case (I'm not sure, maybe it's already > done by current implementation anyway so it's not a potential performance > regression). > Only mlockall(MCL_FUTURE | MCL_ONFAULT) should really need the ONFAULT info to > "stick" somewhere in mm_struct, but it doesn't have to be def_flags? This all sounds fine and should still cover the usecase that started this adventure. I will include this change in the V5 spin.