linux-kernel.vger.kernel.org archive mirror
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Michal Hocko <mhocko@kernel.org>
Subject: [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU
Date: Thu,  5 Nov 2020 16:55:44 +0800
Message-ID: <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com>

Currently lru_lock still guards both the lru list and the page's lru bit,
and that is fine. But if we want to use a specific lruvec lock for a
page, we need to pin down the page's lruvec/memcg while locking: just
taking the lruvec lock first may be undermined by the page's memcg
charge/migration. To fix this problem, we will clear the lru bit outside
of the lock and use that as the pin-down action, blocking page isolation
while the memcg is changing.

So the standard steps of page isolation are now:
	1, get_page();            #pin the page so it cannot be freed
	2, TestClearPageLRU();    #block other isolation, e.g. memcg change
	3, spin_lock on lru_lock; #serialize lru list access
	4, delete page from lru list;
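The four steps above can be modelled in user space roughly as follows.
This is a minimal sketch under stated assumptions: struct model_page,
struct model_lruvec, model_isolate_page() and the spin flag standing in
for lru_lock are all hypothetical, not kernel API. The sketch also skips
the lruvec lookup (mem_cgroup_page_lruvec()) between steps 2 and 3,
which is exactly what clearing the bit is meant to keep stable. Note
that isolate_lru_page() in the diff below calls TestClearPageLRU()
before get_page(), because its caller already holds a reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct page/lruvec. */
#define MODEL_PG_LRU 0x1UL

struct lru_node { struct lru_node *prev, *next; };

struct model_page {
	atomic_int refcount;     /* stands in for page_count() */
	atomic_ulong flags;      /* bit 0 stands in for PG_lru */
	struct lru_node lru;     /* linkage on the LRU list */
};

struct model_lruvec {
	atomic_bool locked;      /* stands in for lru_lock */
	struct lru_node head;    /* circular list head */
};

static void lruvec_init(struct model_lruvec *v)
{
	atomic_init(&v->locked, false);
	v->head.prev = v->head.next = &v->head;
}

static void lruvec_lock(struct model_lruvec *v)
{
	while (atomic_exchange_explicit(&v->locked, true, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void lruvec_unlock(struct model_lruvec *v)
{
	atomic_store_explicit(&v->locked, false, memory_order_release);
}

static void lru_add(struct lru_node *n, struct lru_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void lru_del(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Atomic test-and-clear: true only for the caller that cleared it. */
static bool model_test_clear_lru(struct model_page *p)
{
	return atomic_fetch_and(&p->flags, ~MODEL_PG_LRU) & MODEL_PG_LRU;
}

/* Hypothetical helper following steps 1-4 of the commit message. */
static int model_isolate_page(struct model_page *p, struct model_lruvec *v)
{
	atomic_fetch_add(&p->refcount, 1);          /* 1: pin the page */
	if (!model_test_clear_lru(p)) {             /* 2: claim PG_lru */
		atomic_fetch_sub(&p->refcount, 1);  /* lost: drop the pin */
		return -EBUSY;
	}
	lruvec_lock(v);                             /* 3: serialize list */
	lru_del(&p->lru);                           /* 4: unlink the page */
	lruvec_unlock(v);
	assert(!(atomic_load(&p->flags) & MODEL_PG_LRU));
	return 0;                                   /* caller keeps the pin */
}
```

A second isolation attempt on the same page fails with -EBUSY at step
2 and gives back the reference it took, so the page's refcount only
reflects the winning isolator.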

This patch starts with the first part, TestClearPageLRU, which combines
the PageLRU check and ClearPageLRU into a single atomic macro-generated
function, TestClearPageLRU. This function will be used as the
precondition of page isolation, to prevent the same page from being
isolated somewhere else. As a result there may be !PageLRU pages on an
lru list, so the BUG() checks that assumed otherwise are removed
accordingly.
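To see why the check and the clear must be one atomic operation once
the bit is claimed outside lru_lock, here is a small sketch. It mimics
how the PAGEFLAG()/TESTCLEARFLAG() macros generate PageLRU(),
ClearPageLRU() and TestClearPageLRU(), but the macro bodies, the PG_lru
value and struct model_page are simplified assumptions: the real
TESTCLEARFLAG(LRU, lru, PF_HEAD) also applies the PF_HEAD policy and is
built on test_and_clear_bit(). The two helpers replay an interleaving
of two isolators, A and B, deterministically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number; the kernel's enum pageflags is different. */
enum { PG_lru = 0 };

struct model_page { atomic_ulong flags; };

/* Separate test and clear, as callers used them before this patch. */
#define PAGEFLAG(uname, lname)                                        \
static inline bool Page##uname(struct model_page *p)                  \
{ return atomic_load(&p->flags) & (1UL << PG_##lname); }              \
static inline void ClearPage##uname(struct model_page *p)             \
{ atomic_fetch_and(&p->flags, ~(1UL << PG_##lname)); }

/* One atomic read-modify-write that returns the old bit value. */
#define TESTCLEARFLAG(uname, lname)                                   \
static inline bool TestClearPage##uname(struct model_page *p)         \
{ return atomic_fetch_and(&p->flags, ~(1UL << PG_##lname))            \
	 & (1UL << PG_##lname); }

PAGEFLAG(LRU, lru)
TESTCLEARFLAG(LRU, lru)

/* Both isolators check before either clears: both believe they won. */
static int winners_check_then_clear(struct model_page *p)
{
	bool a = PageLRU(p);      /* A checks */
	bool b = PageLRU(p);      /* B checks before A clears */
	if (a)
		ClearPageLRU(p);
	if (b)
		ClearPageLRU(p);
	return a + b;
}

/* Each isolator does one atomic RMW: only one can see the bit set. */
static int winners_test_and_clear(struct model_page *p)
{
	bool a = TestClearPageLRU(p);
	bool b = TestClearPageLRU(p);
	return a + b;
}
```

The check-then-clear pattern was only safe while lru_lock serialized
every caller; with TestClearPageLRU() exactly one isolator wins no
matter how the two are interleaved.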

There are 2 rules for the lru bit now:
1, the lru bit still indicates whether a page is on an lru list; only
   transiently (while isolating) may a page that is on an lru list lack
   the lru bit. But a page with the lru bit set must still be on an lru
   list.
2, the lru bit has to be cleared before the page is deleted from the
   lru list.
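The two rules can be checked mechanically with a tiny state model. In
this sketch the names (struct lru_state, model_remove(),
model_putback()) are hypothetical and not kernel API; the state records
only the lru bit and list membership, and each helper reports whether
rule 1 held after every intermediate step. The putback order, linking
the page before setting the bit, follows what the patched
check_move_unevictable_pages() does with add_page_to_lru_list()
followed by SetPageLRU().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state: is PG_lru set, is the page linked? */
struct lru_state {
	bool lru_bit;
	bool on_list;
};

/* Rule 1: lru bit set => page is on an lru list.  The converse may
 * fail transiently: a page being isolated is listed without the bit. */
static bool rule1_holds(struct lru_state s)
{
	return !s.lru_bit || s.on_list;
}

/* Removal.  Rule 2 says clear the bit first; the other order is kept
 * only to show which intermediate state it would expose. */
static bool model_remove(struct lru_state *s, bool clear_bit_first)
{
	bool ok = rule1_holds(*s);

	if (clear_bit_first) {
		s->lru_bit = false;
		ok = ok && rule1_holds(*s);
		s->on_list = false;
	} else {
		s->on_list = false;
		ok = ok && rule1_holds(*s);   /* bit set, not listed: broken */
		s->lru_bit = false;
	}
	return ok && rule1_holds(*s);
}

/* Putback in the opposite order: link the page, then set the bit. */
static bool model_putback(struct lru_state *s)
{
	bool ok = rule1_holds(*s);

	s->on_list = true;
	ok = ok && rule1_holds(*s);
	s->lru_bit = true;
	return ok && rule1_holds(*s);
}
```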

As Andrew Morton mentioned, this change would dirty the cacheline of a
page that isn't on the LRU. But the loss is acceptable according to the
report by Rong Chen <rong.a.chen@intel.com>:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
---
 include/linux/page-flags.h |  1 +
 mm/mlock.c                 |  3 +--
 mm/vmscan.c                | 39 +++++++++++++++++++--------------------
 3 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 291dc247dc79..6426f2f03611 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -335,6 +335,7 @@ static inline void page_init_poison(struct page *page, size_t size)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/mm/mlock.c b/mm/mlock.c
index d487aa864e86..7b0e6334be6f 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (PageLRU(page)) {
+			if (TestClearPageLRU(page)) {
 				struct lruvec *lruvec;
 
-				ClearPageLRU(page);
 				lruvec = mem_cgroup_page_lruvec(page,
 							page_pgdat(page));
 				del_page_from_lru_list(page, lruvec,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cb2f6256a7d6..ab7a0104d1e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1542,7 +1542,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
  */
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	int ret = -EINVAL;
+	int ret = -EBUSY;
 
 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
@@ -1552,8 +1552,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
 
-	ret = -EBUSY;
-
 	/*
 	 * To minimise LRU disruption, the caller can indicate that it only
 	 * wants to isolate pages it will be able to operate on without
@@ -1600,8 +1598,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 		 * sure the page is not being freed elsewhere -- the
 		 * page release code relies on it.
 		 */
-		ClearPageLRU(page);
-		ret = 0;
+		if (TestClearPageLRU(page))
+			ret = 0;
+		else
+			put_page(page);
 	}
 
 	return ret;
@@ -1667,8 +1667,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
 
-		VM_BUG_ON_PAGE(!PageLRU(page), page);
-
 		nr_pages = compound_nr(page);
 		total_scan += nr_pages;
 
@@ -1765,21 +1763,18 @@ int isolate_lru_page(struct page *page)
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
 
-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irq(&pgdat->lru_lock);
+		get_page(page);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		if (PageLRU(page)) {
-			int lru = page_lru(page);
-			get_page(page);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, lru);
-			ret = 0;
-		}
+		spin_lock_irq(&pgdat->lru_lock);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
 		spin_unlock_irq(&pgdat->lru_lock);
+		ret = 0;
 	}
+
 	return ret;
 }
 
@@ -4293,6 +4288,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 		nr_pages = thp_nr_pages(page);
 		pgscanned += nr_pages;
 
+		/* block memcg migration during page moving between lru */
+		if (!TestClearPageLRU(page))
+			continue;
+
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4301,10 +4300,7 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
-		if (!PageLRU(page) || !PageUnevictable(page))
-			continue;
-
-		if (page_evictable(page)) {
+		if (page_evictable(page) && PageUnevictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);
 
 			VM_BUG_ON_PAGE(PageActive(page), page);
@@ -4313,12 +4309,15 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 			add_page_to_lru_list(page, lruvec, lru);
 			pgrescued += nr_pages;
 		}
+		SetPageLRU(page);
 	}
 
 	if (pgdat) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 		spin_unlock_irq(&pgdat->lru_lock);
+	} else if (pgscanned) {
+		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
-- 
1.8.3.1
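The __isolate_lru_page() hunk above changes two things: -EBUSY becomes
the default return value, and a failed TestClearPageLRU() after taking
a reference now drops that reference again with put_page(). The sketch
below models that flow in user space. It assumes, from the surrounding
kernel code rather than from this diff, that the reference is taken
with get_page_unless_zero(); the names model_isolate_lru_page() and
model_get_unless_zero() are hypothetical, and the isolate_mode_t
filters are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_PG_LRU 0x1UL

/* Hypothetical user-space stand-in for struct page. */
struct model_page {
	atomic_int refcount;
	atomic_ulong flags;
};

/* Models get_page_unless_zero(): never resurrect a page whose refcount
 * has already dropped to zero (it is being freed). */
static bool model_get_unless_zero(struct model_page *p)
{
	int c = atomic_load(&p->refcount);

	while (c != 0)
		if (atomic_compare_exchange_weak(&p->refcount, &c, c + 1))
			return true;
	return false;
}

static void model_put(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);  /* the kernel may free at 0 */
}

static bool model_test_clear_lru(struct model_page *p)
{
	return atomic_fetch_and(&p->flags, ~MODEL_PG_LRU) & MODEL_PG_LRU;
}

/* Hypothetical reduction of the patched __isolate_lru_page(). */
static int model_isolate_lru_page(struct model_page *p)
{
	int ret = -EBUSY;                   /* now the default error */

	if (!(atomic_load(&p->flags) & MODEL_PG_LRU))
		return ret;                 /* early check, may already be stale */

	if (model_get_unless_zero(p)) {
		if (model_test_clear_lru(p))
			ret = 0;            /* we own this isolation */
		else
			model_put(p);       /* lost the race: undo the ref */
	}
	return ret;
}
```

A page whose refcount is already zero is left untouched, including its
lru bit, which matches the existing comment about not clearing PageLRU
until the page is known not to be freed elsewhere.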


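The check_move_unevictable_pages() hunk follows the same pattern inside
a batched loop: every page is counted as scanned, pages whose lru bit
cannot be claimed are skipped before any lock is taken, claimed pages
get the bit set again after they are (possibly) moved, and the new
else-branch still reports the scanned count when no lock was ever
taken. This is a simplified user-space sketch, assuming hypothetical
names (model_rescue(), struct scan_result), treating every page as
order-0, and only recording lock switches instead of taking real locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PG_LRU 0x1UL

/* Hypothetical page: node stands in for page_pgdat(). */
struct model_page {
	int node;
	atomic_ulong flags;
	bool unevictable;     /* stands in for PageUnevictable() */
	bool evictable_now;   /* stands in for page_evictable() */
};

struct scan_result {
	int scanned;
	int rescued;
	int lock_switches;        /* times the per-node lock would change */
	bool counted_under_lock;  /* were events counted with a lock held? */
};

static bool model_test_clear_lru(struct model_page *p)
{
	return atomic_fetch_and(&p->flags, ~MODEL_PG_LRU) & MODEL_PG_LRU;
}

/* Hypothetical reduction of the patched check_move_unevictable_pages(). */
static struct scan_result model_rescue(struct model_page **pages, size_t n)
{
	struct scan_result r = { 0, 0, 0, false };
	int locked_node = -1;     /* -1: no node lock taken yet */

	for (size_t i = 0; i < n; i++) {
		struct model_page *p = pages[i];

		r.scanned++;              /* counted even if skipped below */
		if (!model_test_clear_lru(p))
			continue;         /* isolated elsewhere: leave it alone */

		if (p->node != locked_node) {
			/* Model only: record the unlock/relock, no real lock. */
			locked_node = p->node;
			r.lock_switches++;
		}
		if (p->evictable_now && p->unevictable) {
			p->unevictable = false;   /* moved to an evictable list */
			r.rescued++;
		}
		atomic_fetch_or(&p->flags, MODEL_PG_LRU);  /* SetPageLRU() */
	}
	/* No lock ever taken: scanned events are reported without it. */
	r.counted_under_lock = (locked_node != -1);
	return r;
}
```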

Thread overview: 59+ messages
2020-11-05  8:55 [PATCH v21 00/19] per memcg lru lock Alex Shi
2020-11-05  8:55 ` [PATCH v21 01/19] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-11-05  8:55 ` [PATCH v21 02/19] mm/thp: use head for head page in lru_add_page_tail Alex Shi
2020-11-05  8:55 ` [PATCH v21 03/19] mm/thp: Simplify lru_add_page_tail() Alex Shi
2020-11-05  8:55 ` [PATCH v21 04/19] mm/thp: narrow lru locking Alex Shi
2020-11-05  8:55 ` [PATCH v21 05/19] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-11-11 12:36   ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 06/19] mm/rmap: stop store reordering issue on page->mapping Alex Shi
2020-11-06  1:20   ` Alex Shi
2020-11-10 19:06     ` Johannes Weiner
2020-11-11  7:41     ` Hugh Dickins
2020-11-05  8:55 ` [PATCH v21 07/19] mm: page_idle_get_page() does not need lru_lock Alex Shi
2020-11-10 19:01   ` Johannes Weiner
2020-11-11  8:17   ` huang ying
2020-11-11 12:52     ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 08/19] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-11-05  8:55 ` [PATCH v21 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-11-05  8:55 ` [PATCH v21 10/19] mm/lru: move lock into lru_note_cost Alex Shi
2020-11-05  8:55 ` [PATCH v21 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Alex Shi
2020-11-05  8:55 ` [PATCH v21 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Alex Shi
2020-11-11 13:03   ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 13/19] mm/mlock: remove __munlock_isolate_lru_page Alex Shi
2020-11-11 13:07   ` Vlastimil Babka
2020-11-05  8:55 ` Alex Shi [this message]
2020-11-11 13:36   ` [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU Vlastimil Babka
2020-11-12  2:03     ` Hugh Dickins
2020-11-12 11:24       ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 15/19] mm/compaction: do page isolation first in compaction Alex Shi
2020-11-11 17:12   ` Vlastimil Babka
2020-11-12  2:28     ` Hugh Dickins
2020-11-12  3:35       ` Alex Shi
2020-11-12 11:25       ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Alex Shi
2020-11-11 18:00   ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-11-05 13:43   ` Alex Shi
2020-11-06  7:48     ` Alex Shi
2020-11-10 18:54       ` Johannes Weiner
2020-11-11 17:46   ` Vlastimil Babka
2020-11-11 17:59     ` Vlastimil Babka
2020-11-12 12:19   ` Vlastimil Babka
2020-11-12 14:19     ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 18/19] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-11-06  7:50   ` Alex Shi
2020-11-10 18:59     ` Johannes Weiner
2020-11-12 12:31   ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 19/19] mm/lru: revise the comments of lru_lock Alex Shi
2020-11-12 12:37   ` Vlastimil Babka
2020-11-10 12:14 ` [PATCH v21 00/19] per memcg lru lock Alex Shi
2020-11-16  3:45 ` Alex Shi
2020-12-15  0:47 ` Andrew Morton
2020-12-15  2:16   ` Hugh Dickins
2020-12-15  2:28     ` Andrew Morton
2021-01-05 19:30 ` Qian Cai
2021-01-05 19:42   ` Shakeel Butt
2021-01-05 20:11     ` Qian Cai
2021-01-05 21:35       ` Hugh Dickins
2021-01-05 22:01         ` Qian Cai
2021-01-06  3:10           ` Hugh Dickins
