From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f198.google.com (mail-wr0-f198.google.com [209.85.128.198]) by kanga.kvack.org (Postfix) with ESMTP id 7F5666B0033 for ; Tue, 21 Nov 2017 13:22:27 -0500 (EST) Received: by mail-wr0-f198.google.com with SMTP id k100so8568677wrc.9 for ; Tue, 21 Nov 2017 10:22:27 -0800 (PST) Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id m66sor463226wmi.11.2017.11.21.10.22.25 for (Google Transport Security); Tue, 21 Nov 2017 10:22:25 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20171121150632.GA23460@cmpxchg.org> References: <20171104224312.145616-1-shakeelb@google.com> <577ab7e8-b079-125b-80ca-6168dd24720a@suse.cz> <20171121150632.GA23460@cmpxchg.org> From: Shakeel Butt Date: Tue, 21 Nov 2017 10:22:23 -0800 Message-ID: Subject: Re: [PATCH] mm, mlock, vmscan: no more skipping pagevecs Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: Vlastimil Babka , Huang Ying , Tim Chen , Michal Hocko , Greg Thelen , Andrew Morton , Balbir Singh , Minchan Kim , Shaohua Li , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Jan Kara , Nicholas Piggin , Dan Williams , Mel Gorman , Hugh Dickins , Linux MM , LKML On Tue, Nov 21, 2017 at 7:06 AM, Johannes Weiner wrote: > On Tue, Nov 21, 2017 at 01:39:57PM +0100, Vlastimil Babka wrote: >> On 11/04/2017 11:43 PM, Shakeel Butt wrote: >> > When a thread mlocks an address space backed by file, a new >> > page is allocated (assuming file page is not in memory), added >> > to the local pagevec (lru_add_pvec), I/O is triggered and the >> > thread then sleeps on the page. On I/O completion, the thread >> > can wake on a different CPU, the mlock syscall will then sets >> > the PageMlocked() bit of the page but will not be able to put >> > that page in unevictable LRU as the page is on the pagevec of >> > a different CPU. Even on drain, that page will go to evictable >> > LRU because the PageMlocked() bit is not checked on pagevec >> > drain. >> > >> > The page will eventually go to right LRU on reclaim but the >> > LRU stats will remain skewed for a long time. >> > >> > However, this issue does not happen for anon pages on swap >> > because unlike file pages, anon pages are not added to pagevec >> > until they have been fully swapped in. Also the fault handler >> > uses vm_flags to set the PageMlocked() bit of such anon pages >> > even before returning to mlock() syscall and mlocked pages will >> > skip pagevecs and directly be put into unevictable LRU. No such >> > luck for file pages. >> > >> > One way to resolve this issue, is to somehow plumb vm_flags from >> > filemap_fault() to add_to_page_cache_lru() which will then skip >> > the pagevec for pages of VM_LOCKED vma and directly put them to >> > unevictable LRU. However this patch took a different approach. >> > >> > All the pages, even unevictable, will be added to the pagevecs >> > and on the drain, the pages will be added on their LRUs correctly >> > by checking their evictability. This resolves the mlocked file >> > pages on pagevec of other CPUs issue because when those pagevecs >> > will be drained, the mlocked file pages will go to unevictable >> > LRU. Also this makes the race with munlock easier to resolve >> > because the pagevec drains happen in LRU lock. >> > >> > There is one (good) side effect though. Without this patch, the >> > pages allocated for System V shared memory segment are added to >> > evictable LRUs even after shmctl(SHM_LOCK) on that segment. This >> > patch will correctly put such pages to unevictable LRU. >> > >> > Signed-off-by: Shakeel Butt >> >> I like the approach in general, as it seems to make the code simpler, >> and the diffstats support that. I found no bugs, but I can't say that >> with certainty that there aren't any, though. This code is rather >> tricky. But it should be enough for an ack, so. >> >> Acked-by: Vlastimil Babka >> >> A question below, though. >> >> ... >> >> > @@ -883,15 +855,41 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, >> > static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, >> > void *arg) >> > { >> > - int file = page_is_file_cache(page); >> > - int active = PageActive(page); >> > - enum lru_list lru = page_lru(page); >> > + enum lru_list lru; >> > + int was_unevictable = TestClearPageUnevictable(page); >> > >> > VM_BUG_ON_PAGE(PageLRU(page), page); >> > >> > SetPageLRU(page); >> > + /* >> > + * Page becomes evictable in two ways: >> > + * 1) Within LRU lock [munlock_vma_pages() and __munlock_pagevec()]. >> > + * 2) Before acquiring LRU lock to put the page to correct LRU and then >> > + * a) do PageLRU check with lock [check_move_unevictable_pages] >> > + * b) do PageLRU check before lock [isolate_lru_page] >> > + * >> > + * (1) & (2a) are ok as LRU lock will serialize them. For (2b), if the >> > + * other thread does not observe our setting of PG_lru and fails >> > + * isolation, the following page_evictable() check will make us put >> > + * the page in correct LRU. >> > + */ >> > + smp_mb(); >> >> Could you elaborate on the purpose of smp_mb() here? Previously there >> was "The other side is TestClearPageMlocked() or shmem_lock()" in >> putback_lru_page(), which seems rather unclear to me (neither has an >> explicit barrier?). > > The TestClearPageMlocked() is an RMW operation with return value, and > thus an implicit full barrier (see Documentation/atomic_bitops.txt). > > The ordering is between putback and munlock: > > #0 #1 > list_add(&page->lru,...) if (TestClearPageMlock()) > SetPageLRU() __munlock_isolate_lru_page() > smp_mb() > if (page_evictable()) > rescue > > The scenario that the barrier prevents from happening is: > > list_add(&page->lru,...) > if (page_evictable()) > rescue > if (TestClearPageMlock()) > __munlock_isolate_lru_page() // FAILS on !PageLRU > SetPageLRU() > > and now an evictable page is stranded on the unevictable LRU. > > The barrier guarantees that if #0 doesn't see the page evictable yet, > #1 WILL see the PageLRU and succeed in isolation and rescue. > > Shakeel, please don't drop that "the other side" comment. You mention > the places that make the page evictable - which is great, and please > keep that as well - but for barriers it's always good to know exactly > which operation guarantees the ordering on the other side. In fact, it > would be great if you could add comments to the TestClearPageMlocked() > sites that mention how they order against the smp_mb() in LRU putback. Johannes, I have a question. The example you presented is valid before this patch as '#0' was happening outside LRU lock. This patch moves '#0' inside LRU lock and '#1' was already in LRU lock therefore no issue for this particular scenario. However there is still a TestClearPageMlocked() in clear_page_mlock() which happens outside LRU lock and same issue which you have explained can happen even with this patch (but without smp_mb()). So, "the other side" for smp_mb() after this patch will only be the TestClearPageMlock() in clear_page_mlock() because all other TestClearPageMlocked() instances are serialized by LRU lock. Please let me know if I missed something. Thanks, Shakeel -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org