Date: Tue, 15 Dec 2020 12:34:16 -0800
From: Andrew Morton
To: aarcange@redhat.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, alexander.duyck@gmail.com, aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com, jannh@google.com, khlebnikov@yandex-team.ru, kirill.shutemov@linux.intel.com, kirill@shutemov.name, linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com, minchan@kernel.org, mm-commits@vger.kernel.org, richard.weiyang@gmail.com, shakeelb@google.com, tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org, yang.shi@linux.alibaba.com, ying.huang@intel.com
Subject: [patch 14/19] mm/lru: introduce TestClearPageLRU()
Message-ID: <20201215203416.oJg1652MT%akpm@linux-foundation.org>
In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org>

From: Alex Shi
Subject: mm/lru: introduce TestClearPageLRU()

Currently lru_lock still guards both the lru list and the page's lru bit,
which is fine.
But if we want to use a specific lruvec lock for the page, we need to pin
down the page's lruvec/memcg while locking.  Just taking the lruvec lock
first may be undermined by the page's memcg charge/migration.  To fix this
problem, we clear the lru bit outside of the lock and use that as the
pin-down action, blocking page isolation while the page's memcg is being
changed.

So the standard steps of page isolation are now:

  1. get_page();              # pin the page so it cannot be freed
  2. TestClearPageLRU();      # block other isolation, e.g. a memcg change
  3. spin_lock on lru_lock;   # serialize lru list access
  4. delete the page from the lru list;

This patch starts with the first part: TestClearPageLRU(), which combines
the PageLRU() check and ClearPageLRU() into one helper generated by the
TESTCLEARFLAG macro.  It will be used as a precondition of page isolation,
to prevent the page from being isolated somewhere else at the same time.
As a consequence, there may now be !PageLRU pages on an lru list, so the
corresponding VM_BUG_ON_PAGE() check has to be removed.

There are now two rules for the lru bit:

  1. The lru bit still indicates whether a page is on an lru list, except
     that for a short time (while it is being isolated) a page on an lru
     list may have no lru bit.  But a page must be on an lru list whenever
     its lru bit is set.
  2. The lru bit has to be cleared before the page is deleted from the lru
     list.

As Andrew Morton mentioned, this change would dirty the cacheline of a
page that isn't on the LRU.  But according to Rong Chen's report, the loss
is acceptable:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner
Signed-off-by: Alex Shi
Acked-by: Hugh Dickins
Acked-by: Johannes Weiner
Acked-by: Vlastimil Babka
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Alexander Duyck
Cc: Andrea Arcangeli
Cc: Andrey Ryabinin
Cc: Daniel Jordan
Cc: "Huang, Ying"
Cc: Jann Horn
Cc: Joonsoo Kim
Cc: Kirill A. Shutemov
Cc: Kirill A. Shutemov
Cc: Konstantin Khlebnikov
Cc: Matthew Wilcox (Oracle)
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Mika Penttilä
Cc: Minchan Kim
Cc: Shakeel Butt
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yang Shi
Signed-off-by: Andrew Morton
---

 include/linux/page-flags.h |    1
 mm/mlock.c                 |    3 --
 mm/vmscan.c                |   39 +++++++++++++++++------------------
 3 files changed, 21 insertions(+), 22 deletions(-)

--- a/include/linux/page-flags.h~mm-lru-introduce-testclearpagelru
+++ a/include/linux/page-flags.h
@@ -334,6 +334,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
--- a/mm/mlock.c~mm-lru-introduce-testclearpagelru
+++ a/mm/mlock.c
@@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pag
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (PageLRU(page)) {
+			if (TestClearPageLRU(page)) {
 				struct lruvec *lruvec;
 
-				ClearPageLRU(page);
 				lruvec = mem_cgroup_page_lruvec(page,
 							page_pgdat(page));
 				del_page_from_lru_list(page, lruvec,
--- a/mm/vmscan.c~mm-lru-introduce-testclearpagelru
+++ a/mm/vmscan.c
@@ -1541,7 +1541,7 @@ unsigned int reclaim_clean_pages_from_li
  */
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	int ret = -EINVAL;
+	int ret = -EBUSY;
 
 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
@@ -1551,8 +1551,6 @@ int __isolate_lru_page(struct page *page
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
 
-	ret = -EBUSY;
-
 	/*
 	 * To minimise LRU disruption, the caller can indicate that it only
 	 * wants to isolate pages it will be able to operate on without
@@ -1599,8 +1597,10 @@ int __isolate_lru_page(struct page *page
 		 * sure the page is not being freed elsewhere -- the
 		 * page release code relies on it.
 		 */
-		ClearPageLRU(page);
-		ret = 0;
+		if (TestClearPageLRU(page))
+			ret = 0;
+		else
+			put_page(page);
 	}
 
 	return ret;
@@ -1666,8 +1666,6 @@ static unsigned long isolate_lru_pages(u
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
 
-		VM_BUG_ON_PAGE(!PageLRU(page), page);
-
 		nr_pages = compound_nr(page);
 		total_scan += nr_pages;
 
@@ -1764,21 +1762,18 @@ int isolate_lru_page(struct page *page)
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
 
-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irq(&pgdat->lru_lock);
+		get_page(page);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		if (PageLRU(page)) {
-			int lru = page_lru(page);
-			get_page(page);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, lru);
-			ret = 0;
-		}
+		spin_lock_irq(&pgdat->lru_lock);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
 		spin_unlock_irq(&pgdat->lru_lock);
+		ret = 0;
 	}
+
 	return ret;
 }
 
@@ -4289,6 +4284,10 @@ void check_move_unevictable_pages(struct
 		nr_pages = thp_nr_pages(page);
 		pgscanned += nr_pages;
 
+		/* block memcg migration during page moving between lru */
+		if (!TestClearPageLRU(page))
+			continue;
+
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4297,10 +4296,7 @@ void check_move_unevictable_pages(struct
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
-		if (!PageLRU(page) || !PageUnevictable(page))
-			continue;
-
-		if (page_evictable(page)) {
+		if (page_evictable(page) && PageUnevictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);
 
 			VM_BUG_ON_PAGE(PageActive(page), page);
@@ -4309,12 +4305,15 @@ void check_move_unevictable_pages(struct
 			add_page_to_lru_list(page, lruvec, lru);
 			pgrescued += nr_pages;
 		}
+		SetPageLRU(page);
 	}
 
 	if (pgdat) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 		spin_unlock_irq(&pgdat->lru_lock);
+	} else if (pgscanned) {
+		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
_
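The isolation ordering and the two lru bit rules described in the commit message can be illustrated outside the kernel. What follows is a minimal, simplified user-space sketch in C11, not kernel code: struct model_page, PG_lru_bit, model_page_init(), test_clear_lru(), model_isolate() and model_putback() are hypothetical names, a pthread mutex stands in for lru_lock, and the lruvec lookup, get_page_unless_zero() and isolate-mode checks are left out. test_clear_lru() uses atomic_fetch_and() so that testing and clearing the bit is one atomic step, which is the property TestClearPageLRU() relies on; model_isolate() follows the four isolation steps listed above, and model_putback() sets the bit only after the page is back on the list (rule 1).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a page; not the kernel's struct page. */
#define PG_lru_bit (1UL << 0)

struct model_page {
	atomic_ulong flags;   /* PG_lru_bit models PageLRU() */
	atomic_int refcount;  /* models page_count() */
	bool on_list;         /* lru list membership, protected by lru_lock */
};

/* Stands in for pgdat->lru_lock in this model. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;

/* A page sitting on the lru list, LRU bit set, holding one reference. */
static void model_page_init(struct model_page *p)
{
	atomic_init(&p->flags, PG_lru_bit);
	atomic_init(&p->refcount, 1);
	p->on_list = true;
}

/*
 * Clear the LRU bit and report whether it was set, as a single atomic
 * read-modify-write, so only one of several racing callers sees "true".
 */
static bool test_clear_lru(struct model_page *p)
{
	unsigned long old = atomic_fetch_and(&p->flags, ~PG_lru_bit);

	return (old & PG_lru_bit) != 0;
}

/* Isolation in the order listed in the commit message. */
static int model_isolate(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);          /* 1. get_page() */
	if (!test_clear_lru(p)) {                   /* 2. TestClearPageLRU() */
		atomic_fetch_sub(&p->refcount, 1);  /* lost the race: put_page() */
		return -1;                          /* the kernel returns -EBUSY */
	}
	pthread_mutex_lock(&lru_lock);              /* 3. take lru_lock */
	p->on_list = false;                         /* 4. delete from the list */
	pthread_mutex_unlock(&lru_lock);
	return 0;                                   /* caller keeps the reference */
}

/* Put an isolated page back: add it to the list, then set the bit (rule 1). */
static void model_putback(struct model_page *p)
{
	assert(!(atomic_load(&p->flags) & PG_lru_bit)); /* must be isolated */
	pthread_mutex_lock(&lru_lock);
	p->on_list = true;
	atomic_fetch_or(&p->flags, PG_lru_bit);
	pthread_mutex_unlock(&lru_lock);
	atomic_fetch_sub(&p->refcount, 1);          /* drop the isolation reference */
}
```

Calling model_isolate() twice on a page prepared by model_page_init() returns 0 and then -1: the second caller finds the bit already clear and drops its extra reference, much like the back-off that the patched __isolate_lru_page() performs with put_page().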