On Wed, 2008-10-29 at 16:17 +0900, KOSAKI Motohiro wrote: > > I believe that we still have the lru_drain_all() called from the fault > > path [with mmap_sem held] in clear_page_mlock(). We call > > clear_page_mlock() on COW of an mlocked page in a VM_LOCKED vma to > > ensure that we don't end up with an mlocked page in some other task's > > non-VM_LOCKED vma where we'd then fail to munlock it later. During > > development testing, Rik encountered scenarios where a page would > > encounter a COW fault while it was still making its way to the LRU via > > the pagevecs. So, he added the 'drain_all() and that seemed to avoid > > this scenario. > > Agreed. > > > > Now, in the current upstream version of the unevictable mlocked pages > > patches, we just count any mlocked pages [vmstat] that make their way to > > free*page() instead of BUGging out, as we were doing earlier during > > development. So, maybe we can drop the lru_drain_add()s in the > > unevictable mlocked pages work and live with the occasional freed > > mlocked page, or mlocked page on the active/inactive lists to be dealt > > with by vmscan. > > hm, okey. > maybe, I was wrong. > > I'll make "dropping lru_add_drain_all()" patch soon. > I expect I need few days. > make the patch: 1 day > confirm by stress workload: 2-3 days > > because rik's original problem only happend on heavy wokload, I think. Indeed. It was an ad hoc test program [2 versions attached] written specifically to beat on COW of shared pages mlocked by parent then COWed by parent or child and unmapped explicitly or via exit. We were trying to find all the ways the we could end up freeing mlocked pages--and there were several. Most of these turned out to be genuine coding/design defects [as difficult as that may be to believe :-)], so tracking them down was worthwhile. And, I think that, in general, clearing a page's mlocked state and rescuing from the unevictable lru list on COW--to prevent the mlocked page from ending up mapped into some task's non-VM_LOCKED vma--is a good thing to strive for. Now, looking at the current code [28-rc1] in [__]clear_page_mlock(): We've already cleared the PG_mlocked flag, we've decremented the mlocked pages stats, and we're just trying to rescue the page from the unevictable list to the in/active list. If we fail to isolate the page, then either some other task has it isolated and will return it to an appropriate lru or it resides in a pagevec heading for an in/active lru list. We don't use pagevec for unevictable list. Any other cases? If not, then we can probably dispense with the "try harder" logic--the lru_add_drain()--in __clear_page_mlock(). Do you agree? Or have I missed something? Lee