From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
	alex.shi@linux.alibaba.com, alexander.duyck@gmail.com,
	aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com,
	hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com,
	jannh@google.com, khlebnikov@yandex-team.ru,
	kirill.shutemov@linux.intel.com, kirill@shutemov.name,
	linux-mm@kvack.org, mgorman@techsingularity.net,
	mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com,
	minchan@kernel.org, mm-commits@vger.kernel.org,
	richard.weiyang@gmail.com, shakeelb@google.com,
	tglx@linutronix.de, tj@kernel.org, torvalds@linux-foundation.org,
	vbabka@suse.cz, vdavydov.dev@gmail.com, willy@infradead.org,
	yang.shi@linux.alibaba.com, ying.huang@intel.com
Subject: [patch 14/19] mm/lru: introduce TestClearPageLRU()
Date: Tue, 15 Dec 2020 14:21:09 -0800	[thread overview]
Message-ID: <20201215222109.KtvHQsOCd%akpm@linux-foundation.org> (raw)
In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org>

From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/lru: introduce TestClearPageLRU()

Currently lru_lock still guards both the lru list and the page's lru bit,
which is fine.  But if we want to use a specific lruvec lock for the page,
we need to pin down the page's lruvec/memcg while locking.  Just taking the
lruvec lock first may be undermined by the page's memcg charge/migration.
To fix this problem, we clear the lru bit outside of the lock and use that
as a pin-down action that blocks page isolation during a memcg change.

So the standard page isolation steps are now:
	1, get_page();             #pin the page to keep it from being freed
	2, TestClearPageLRU();     #block other isolation, like memcg change
	3, spin_lock on lru_lock;  #serialize lru list access
	4, delete page from lru list;
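The four steps above can be sketched as a userspace C11 model.  Everything
in it is hypothetical: struct model_page, struct model_lru, model_isolate()
and the atomic_flag spinlock only stand in for the page refcount,
page->flags, the lru list and pgdat->lru_lock.  It is an illustration of
the ordering, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_LRU (1u << 0)

/* Hypothetical stand-ins, not kernel structures. */
struct model_page {
	atomic_int refcount;		/* ~ page refcount */
	atomic_uint flags;		/* ~ page->flags, PG_LRU only */
	struct model_page *prev, *next;	/* ~ page->lru */
};

struct model_lru {
	atomic_flag lock;		/* ~ pgdat->lru_lock */
	struct model_page head;		/* circular list head */
};

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait until the holder releases it */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void lru_init(struct model_lru *lru)
{
	atomic_flag_clear(&lru->lock);
	lru->head.prev = lru->head.next = &lru->head;
}

/* Atomically clear PG_LRU and report whether it was set before. */
static bool test_clear_lru(struct model_page *p)
{
	return atomic_fetch_and(&p->flags, ~PG_LRU) & PG_LRU;
}

/* Link the page first, then set PG_LRU: a set bit implies "on the list". */
static void lru_add(struct model_lru *lru, struct model_page *p)
{
	spin_lock(&lru->lock);
	p->next = lru->head.next;
	p->prev = &lru->head;
	lru->head.next->prev = p;
	lru->head.next = p;
	atomic_fetch_or(&p->flags, PG_LRU);
	spin_unlock(&lru->lock);
}

/* Steps 1-4 from the commit message. Returns 0 or -EBUSY. */
static int model_isolate(struct model_lru *lru, struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);	/* 1: pin the page */
	if (!test_clear_lru(p)) {		/* 2: lost to another isolator */
		atomic_fetch_sub(&p->refcount, 1);
		return -EBUSY;
	}
	spin_lock(&lru->lock);			/* 3: serialize list access */
	p->prev->next = p->next;		/* 4: unlink the page */
	p->next->prev = p->prev;
	p->prev = p->next = NULL;
	spin_unlock(&lru->lock);
	return 0;
}
```

A second model_isolate() on the same page fails with -EBUSY and leaves the
refcount unchanged.  Note that isolate_lru_page() in the diff below takes
its extra reference after TestClearPageLRU(), relying on the reference the
caller already holds; the model simply follows the order listed above.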

This patch starts with the first part: TestClearPageLRU, which combines
the PageLRU check and ClearPageLRU into one atomic macro function,
TestClearPageLRU.  This function will be used as a precondition for page
isolation, to prevent other isolations elsewhere.  As a result there may
be !PageLRU pages on an lru list, so the corresponding BUG() checks need
to be removed.
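Why the check and the clear must be a single operation can be shown with a
small userspace C11/pthreads sketch (hypothetical names, not kernel code).
test_clear_lru() uses atomic_fetch_and() as a stand-in for
test_and_clear_bit(), so among any number of concurrent isolators exactly
one sees the bit set.  racy_check_then_clear() is the old PageLRU() plus
ClearPageLRU() pair as it would behave without lru_lock held: two callers
could both pass the check before either clears the bit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LRU (1u << 0)
#define MAX_ISOLATORS 64

static atomic_uint page_flags;	/* hypothetical stand-in for page->flags */
static atomic_int winners;	/* how many isolators saw PG_LRU set */

/* Old shape without lru_lock: two separate steps, so two callers can
 * both pass the check before either one clears the bit. */
static bool racy_check_then_clear(atomic_uint *flags)
{
	if (!(atomic_load(flags) & PG_LRU))
		return false;
	atomic_fetch_and(flags, ~PG_LRU);
	return true;
}

/* New shape: one atomic read-modify-write, like TestClearPageLRU(). */
static bool test_clear_lru(atomic_uint *flags)
{
	return atomic_fetch_and(flags, ~PG_LRU) & PG_LRU;
}

static void *isolator(void *arg)
{
	(void)arg;
	if (test_clear_lru(&page_flags))
		atomic_fetch_add(&winners, 1);
	return NULL;
}

/* Start up to MAX_ISOLATORS concurrent isolators on one page that is
 * on the LRU; return how many of them won the isolation. */
static int race_isolators(int nthreads)
{
	pthread_t tids[MAX_ISOLATORS];
	int created = 0;

	if (nthreads > MAX_ISOLATORS)
		nthreads = MAX_ISOLATORS;
	atomic_store(&page_flags, PG_LRU);
	atomic_store(&winners, 0);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[created], NULL, isolator, NULL) == 0)
			created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&winners);
}
```

Run single-threaded, racy_check_then_clear() looks correct; its window only
opens when two callers interleave between the load and the clear, which is
what lru_lock used to close.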

There are now 2 rules for the lru bit:
1, the lru bit still indicates whether a page is on an lru list; only
   at some temporary moments (while isolating) may a page that is on an
   lru list lack the lru bit.  But a page must still be on an lru list
   whenever its lru bit is set.
2, the lru bit has to be cleared before the page is deleted from the
   lru list.
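The two rules, and the reason the VM_BUG_ON_PAGE(!PageLRU(page), page) in
isolate_lru_pages() is dropped, can be modelled with plain booleans.  This
is a simplified, single-threaded sketch with hypothetical names; in
particular the real isolate_lru_pages() handles -EBUSY pages in code not
shown in this patch, rather than simply skipping them as model_scan() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a page is or isn't linked on an lru list, and
 * has or hasn't got its lru bit. */
struct model_page {
	bool on_list;
	bool lru_bit;
};

/* Rule 1: the bit may be clear while the page is still listed
 * (mid-isolation), but a set bit always implies the page is listed. */
static bool lru_invariant_holds(const struct model_page *p)
{
	return !p->lru_bit || p->on_list;
}

/* Rule 2: the bit must already be clear when the page is deleted. */
static void model_del_from_lru(struct model_page *p)
{
	assert(!p->lru_bit);
	p->on_list = false;
}

/* A scanner must now tolerate listed pages without the bit instead of
 * asserting on them.  Returns the number of pages it took. */
static size_t model_scan(struct model_page *pages, size_t n)
{
	size_t taken = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_page *p = &pages[i];

		if (!p->on_list || !p->lru_bit)
			continue;		/* someone else is isolating it */
		p->lru_bit = false;		/* clear the bit first ... */
		model_del_from_lru(p);		/* ... then delete (rule 2) */
		taken++;
	}
	return taken;
}
```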

As Andrew Morton mentioned, this change would dirty the cacheline for a
page which isn't on the LRU.  But the loss would be acceptable according
to the report from Rong Chen <rong.a.chen@intel.com>:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/page-flags.h |    1 
 mm/mlock.c                 |    3 --
 mm/vmscan.c                |   39 +++++++++++++++++------------------
 3 files changed, 21 insertions(+), 22 deletions(-)

--- a/include/linux/page-flags.h~mm-lru-introduce-testclearpagelru
+++ a/include/linux/page-flags.h
@@ -334,6 +334,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
--- a/mm/mlock.c~mm-lru-introduce-testclearpagelru
+++ a/mm/mlock.c
@@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pag
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (PageLRU(page)) {
+			if (TestClearPageLRU(page)) {
 				struct lruvec *lruvec;
 
-				ClearPageLRU(page);
 				lruvec = mem_cgroup_page_lruvec(page,
 							page_pgdat(page));
 				del_page_from_lru_list(page, lruvec,
--- a/mm/vmscan.c~mm-lru-introduce-testclearpagelru
+++ a/mm/vmscan.c
@@ -1541,7 +1541,7 @@ unsigned int reclaim_clean_pages_from_li
  */
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	int ret = -EINVAL;
+	int ret = -EBUSY;
 
 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
@@ -1551,8 +1551,6 @@ int __isolate_lru_page(struct page *page
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
 
-	ret = -EBUSY;
-
 	/*
 	 * To minimise LRU disruption, the caller can indicate that it only
 	 * wants to isolate pages it will be able to operate on without
@@ -1599,8 +1597,10 @@ int __isolate_lru_page(struct page *page
 		 * sure the page is not being freed elsewhere -- the
 		 * page release code relies on it.
 		 */
-		ClearPageLRU(page);
-		ret = 0;
+		if (TestClearPageLRU(page))
+			ret = 0;
+		else
+			put_page(page);
 	}
 
 	return ret;
@@ -1666,8 +1666,6 @@ static unsigned long isolate_lru_pages(u
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
 
-		VM_BUG_ON_PAGE(!PageLRU(page), page);
-
 		nr_pages = compound_nr(page);
 		total_scan += nr_pages;
 
@@ -1764,21 +1762,18 @@ int isolate_lru_page(struct page *page)
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
 
-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irq(&pgdat->lru_lock);
+		get_page(page);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		if (PageLRU(page)) {
-			int lru = page_lru(page);
-			get_page(page);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, lru);
-			ret = 0;
-		}
+		spin_lock_irq(&pgdat->lru_lock);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
 		spin_unlock_irq(&pgdat->lru_lock);
+		ret = 0;
 	}
+
 	return ret;
 }
 
@@ -4289,6 +4284,10 @@ void check_move_unevictable_pages(struct
 		nr_pages = thp_nr_pages(page);
 		pgscanned += nr_pages;
 
+		/* block memcg migration during page moving between lru */
+		if (!TestClearPageLRU(page))
+			continue;
+
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4297,10 +4296,7 @@ void check_move_unevictable_pages(struct
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
-		if (!PageLRU(page) || !PageUnevictable(page))
-			continue;
-
-		if (page_evictable(page)) {
+		if (page_evictable(page) && PageUnevictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);
 
 			VM_BUG_ON_PAGE(PageActive(page), page);
@@ -4309,12 +4305,15 @@ void check_move_unevictable_pages(struct
 			add_page_to_lru_list(page, lruvec, lru);
 			pgrescued += nr_pages;
 		}
+		SetPageLRU(page);
 	}
 
 	if (pgdat) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 		spin_unlock_irq(&pgdat->lru_lock);
+	} else if (pgscanned) {
+		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
_
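The reworked tail of __isolate_lru_page() has a take-a-reference,
test-and-clear, drop-on-failure shape, and after this patch every failure
returns -EBUSY instead of starting from -EINVAL.  Below is a hypothetical
userspace sketch of that shape.  model_get_unless_zero() only approximates
get_page_unless_zero(); the call that takes the reference lies outside the
hunk context shown above, so treat that part as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LRU (1u << 0)

/* Hypothetical stand-in, not struct page. */
struct model_page {
	atomic_int refcount;
	atomic_uint flags;
};

/* Take a reference only if the page is not already being freed
 * (refcount 0), in the spirit of get_page_unless_zero(). */
static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static bool model_test_clear_lru(struct model_page *p)
{
	return atomic_fetch_and(&p->flags, ~PG_LRU) & PG_LRU;
}

/* Every failure returns -EBUSY; a reference taken for a page whose
 * lru bit was already lost to another isolator is dropped again. */
static int model_isolate_tail(struct model_page *p)
{
	int ret = -EBUSY;

	if (model_get_unless_zero(p)) {
		if (model_test_clear_lru(p))
			ret = 0;				/* we own it */
		else
			atomic_fetch_sub(&p->refcount, 1);	/* ~ put_page() */
	}
	return ret;
}
```

Without the drop on the failure path, each lost race would leak one page
reference, which is why the patch adds put_page() in the else branch.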
