All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
	alex.shi@linux.alibaba.com, alexander.duyck@gmail.com,
	aryabinin@virtuozzo.com, daniel.m.jordan@oracle.com,
	hannes@cmpxchg.org, hughd@google.com, iamjoonsoo.kim@lge.com,
	jannh@google.com, khlebnikov@yandex-team.ru,
	kirill.shutemov@linux.intel.com, kirill@shutemov.name,
	linux-mm@kvack.org, mgorman@techsingularity.net,
	mhocko@kernel.org, mhocko@suse.com, mika.penttila@nextfour.com,
	minchan@kernel.org, mm-commits@vger.kernel.org,
	richard.weiyang@gmail.com, rong.a.chen@intel.com,
	shakeelb@google.com, tglx@linutronix.de, tj@kernel.org,
	torvalds@linux-foundation.org, vbabka@suse.cz,
	vdavydov.dev@gmail.com, willy@infradead.org,
	yang.shi@linux.alibaba.com, ying.huang@intel.com
Subject: [patch 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
Date: Tue, 15 Dec 2020 12:34:25 -0800	[thread overview]
Message-ID: <20201215203425._MFaJyUPh%akpm@linux-foundation.org> (raw)
In-Reply-To: <20201215123253.954eca9a5ef4c0d52fd381fa@linux-foundation.org>

From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/swap.c: serialize memcg changes in pagevec_lru_move_fn

Hugh Dickins' found a memcg change bug on original version: If we want to
change the pgdat->lru_lock to memcg's lruvec lock, we have to serialize
mem_cgroup_move_account during pagevec_lru_move_fn.  The possible bad
scenario would like:

	cpu 0					cpu 1
lruvec = mem_cgroup_page_lruvec()
					if (!isolate_lru_page())
						mem_cgroup_move_account

spin_lock_irqsave(&lruvec->lru_lock <== wrong lock.

So we need TestClearPageLRU to block isolate_lru_page(), that serializes
the memcg change.  and then removing the PageLRU check in move_fn callee
as the consequence.

__pagevec_lru_add_fn() is different from the others, because the pages it
deals with are, by definition, not yet on the lru.  TestClearPageLRU is
not needed and would not work, so __pagevec_lru_add() goes its own way.

Link: https://lkml.kernel.org/r/1604566549-62481-17-git-send-email-alex.shi@linux.alibaba.com
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swap.c |   44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

--- a/mm/swap.c~mm-swapc-serialize-memcg-changes-in-pagevec_lru_move_fn
+++ a/mm/swap.c
@@ -222,8 +222,14 @@ static void pagevec_lru_move_fn(struct p
 			spin_lock_irqsave(&pgdat->lru_lock, flags);
 		}
 
+		/* block memcg migration during page moving between lru */
+		if (!TestClearPageLRU(page))
+			continue;
+
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		(*move_fn)(page, lruvec);
+
+		SetPageLRU(page);
 	}
 	if (pgdat)
 		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
@@ -233,7 +239,7 @@ static void pagevec_lru_move_fn(struct p
 
 static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 {
-	if (PageLRU(page) && !PageUnevictable(page)) {
+	if (!PageUnevictable(page)) {
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		ClearPageActive(page);
 		add_page_to_lru_list_tail(page, lruvec, page_lru(page));
@@ -306,7 +312,7 @@ void lru_note_cost_page(struct page *pag
 
 static void __activate_page(struct page *page, struct lruvec *lruvec)
 {
-	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
+	if (!PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
 		int nr_pages = thp_nr_pages(page);
 
@@ -362,7 +368,8 @@ static void activate_page(struct page *p
 
 	page = compound_head(page);
 	spin_lock_irq(&pgdat->lru_lock);
-	__activate_page(page, mem_cgroup_page_lruvec(page, pgdat));
+	if (PageLRU(page))
+		__activate_page(page, mem_cgroup_page_lruvec(page, pgdat));
 	spin_unlock_irq(&pgdat->lru_lock);
 }
 #endif
@@ -519,9 +526,6 @@ static void lru_deactivate_file_fn(struc
 	bool active;
 	int nr_pages = thp_nr_pages(page);
 
-	if (!PageLRU(page))
-		return;
-
 	if (PageUnevictable(page))
 		return;
 
@@ -562,7 +566,7 @@ static void lru_deactivate_file_fn(struc
 
 static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec)
 {
-	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+	if (PageActive(page) && !PageUnevictable(page)) {
 		int lru = page_lru_base_type(page);
 		int nr_pages = thp_nr_pages(page);
 
@@ -579,7 +583,7 @@ static void lru_deactivate_fn(struct pag
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec)
 {
-	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
+	if (PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		bool active = PageActive(page);
 		int nr_pages = thp_nr_pages(page);
@@ -1021,7 +1025,29 @@ static void __pagevec_lru_add_fn(struct
  */
 void __pagevec_lru_add(struct pagevec *pvec)
 {
-	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn);
+	int i;
+	struct pglist_data *pgdat = NULL;
+	struct lruvec *lruvec;
+	unsigned long flags = 0;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+		struct pglist_data *pagepgdat = page_pgdat(page);
+
+		if (pagepgdat != pgdat) {
+			if (pgdat)
+				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+			pgdat = pagepgdat;
+			spin_lock_irqsave(&pgdat->lru_lock, flags);
+		}
+
+		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+		__pagevec_lru_add_fn(page, lruvec);
+	}
+	if (pgdat)
+		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	release_pages(pvec->pages, pvec->nr);
+	pagevec_reinit(pvec);
 }
 
 /**
_

  parent reply	other threads:[~2020-12-15 20:35 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 20:32 incoming Andrew Morton
2020-12-15 20:33 ` [patch 01/19] mm/thp: move lru_add_page_tail() to huge_memory.c Andrew Morton
2020-12-15 20:33 ` [patch 02/19] mm/thp: use head for head page in lru_add_page_tail() Andrew Morton
2020-12-15 20:33 ` [patch 03/19] mm/thp: simplify lru_add_page_tail() Andrew Morton
2020-12-15 20:33 ` [patch 04/19] mm/thp: narrow lru locking Andrew Morton
2020-12-15 20:33 ` [patch 05/19] mm/vmscan: remove unnecessary lruvec adding Andrew Morton
2020-12-15 20:33 ` [patch 06/19] mm/rmap: stop store reordering issue on page->mapping Andrew Morton
2020-12-15 20:33 ` [patch 07/19] mm: page_idle_get_page() does not need lru_lock Andrew Morton
2020-12-15 20:33 ` [patch 08/19] mm/memcg: add debug checking in lock_page_memcg Andrew Morton
2020-12-15 20:33 ` [patch 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Andrew Morton
2020-12-15 20:34 ` [patch 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Andrew Morton
2020-12-15 20:34 ` [patch 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Andrew Morton
2020-12-15 20:34 ` [patch 13/19] mm/mlock: remove __munlock_isolate_lru_page() Andrew Morton
2020-12-15 20:34 ` [patch 14/19] mm/lru: introduce TestClearPageLRU() Andrew Morton
2020-12-15 20:34 ` [patch 15/19] mm/compaction: do page isolation first in compaction Andrew Morton
2020-12-15 20:34 ` Andrew Morton [this message]
2020-12-15 20:34 ` [patch 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Andrew Morton
2020-12-15 20:34 ` [patch 18/19] mm/lru: introduce relock_page_lruvec() Andrew Morton
2020-12-15 20:34 ` [patch 19/19] mm/lru: revise the comments of lru_lock Andrew Morton
2020-12-15 21:00 ` incoming Linus Torvalds
2020-12-15 22:20 ` [patch 01/19] mm/thp: move lru_add_page_tail() to huge_memory.c Andrew Morton
2020-12-15 22:20 ` [patch 02/19] mm/thp: use head for head page in lru_add_page_tail() Andrew Morton
2020-12-15 22:20 ` [patch 03/19] mm/thp: simplify lru_add_page_tail() Andrew Morton
2020-12-15 22:20 ` [patch 04/19] mm/thp: narrow lru locking Andrew Morton
2020-12-15 22:20 ` [patch 05/19] mm/vmscan: remove unnecessary lruvec adding Andrew Morton
2020-12-15 22:20 ` [patch 06/19] mm/rmap: stop store reordering issue on page->mapping Andrew Morton
2020-12-15 22:20 ` [patch 07/19] mm: page_idle_get_page() does not need lru_lock Andrew Morton
2020-12-15 22:20 ` [patch 08/19] mm/memcg: add debug checking in lock_page_memcg Andrew Morton
2020-12-15 22:20 ` [patch 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Andrew Morton
2020-12-15 22:20 ` [patch 10/19] mm/lru: move lock into lru_note_cost Andrew Morton
2020-12-15 22:20 ` [patch 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Andrew Morton
2020-12-15 22:20 ` [patch 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Andrew Morton
2020-12-15 22:21 ` [patch 13/19] mm/mlock: remove __munlock_isolate_lru_page() Andrew Morton
2020-12-15 22:21 ` [patch 14/19] mm/lru: introduce TestClearPageLRU() Andrew Morton
2020-12-15 22:21 ` [patch 15/19] mm/compaction: do page isolation first in compaction Andrew Morton
2020-12-15 22:21 ` [patch 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Andrew Morton
2020-12-15 22:21 ` [patch 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Andrew Morton
2020-12-15 22:21 ` [patch 18/19] mm/lru: introduce relock_page_lruvec() Andrew Morton
2020-12-15 22:21 ` [patch 19/19] mm/lru: revise the comments of lru_lock Andrew Morton
2020-12-15 22:48 ` incoming Linus Torvalds
2020-12-15 22:49   ` incoming Linus Torvalds
2020-12-15 22:55     ` incoming Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201215203425._MFaJyUPh%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=aarcange@redhat.com \
    --cc=alex.shi@linux.alibaba.com \
    --cc=alexander.duyck@gmail.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jannh@google.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mika.penttila@nextfour.com \
    --cc=minchan@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=rong.a.chen@intel.com \
    --cc=shakeelb@google.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.