From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tejun Heo <tj@kernel.org>
Cc: "Alex Shi" <alex.shi@linux.alibaba.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Vladimir Davydov" <vdavydov.dev@gmail.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Wei Yang" <richard.weiyang@gmail.com>,
	"Pavel Tatashin" <pasha.tatashin@oracle.com>,
	"Arun KS" <arunks@codeaurora.org>, "Qian Cai" <cai@lca.pw>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	swkhack <swkhack@gmail.com>,
	"Potyra, Stefan" <Stefan.Potyra@elektrobit.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Stephen Rothwell" <sfr@canb.auug.org.au>,
	"Colin Ian King" <colin.king@canonical.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Mauro Carvalho Chehab" <mchehab+samsung@kernel.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Peng Fan" <peng.fan@nxp.com>, "Ira Weiny" <ira.weiny@intel.com>,
	"Kirill Tkhai" <ktkhai@virtuozzo.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Yafang Shao" <laoar.shao@gmail.com>,
	"Yang Shi" <yang.shi@linux.alibaba.com>
Subject: [PATCH 01/14] mm/lru: move pgdat lru_lock into lruvec
Date: Tue, 20 Aug 2019 17:48:24 +0800	[thread overview]
Message-ID: <1566294517-86418-2-git-send-email-alex.shi@linux.alibaba.com> (raw)
In-Reply-To: <1566294517-86418-1-git-send-email-alex.shi@linux.alibaba.com>

This patch moves lru_lock into struct lruvec, giving each lruvec its own
lru_lock and thus each memcg its own lru_lock.

A per-memcg lru_lock eases lru_lock contention considerably over the
course of this patch series.

In some data centers, containers are widely used to deploy different kinds
of services, so multiple memcgs share the per-node pgdat->lru_lock, which
causes heavy lock contention during lru operations.
On my 2-socket, 6-cores-per-socket E5-2630 platform, 24 containers ran aim9
simultaneously with mmtests' config:
	# AIM9
	export AIM9_TESTTIME=180
	export AIM9_TESTLIST=page_test,brk_test

perf lock report shows heavy contention on lru_lock in a 20 second snapshot:
        	        Name   acquired  contended   avg wait (ns) total wait (ns)   max wait (ns)   min wait (ns)
	&(ptlock_ptr(pag...         22          0               0	0               0               0
	...
	&(&pgdat->lru_lo...          9          7           12728	89096           26656            1597

With this patch series applied, lruvec->lru_lock shows no contention:
	&(&lruvec->lru_l...          8          0               0	0               0               0

and aim9 page_test/brk_test performance increased by 5%~50%.

For now this patch still uses a single per-pgdat lru_lock (the node
lruvec's lock); there are no functional changes yet.
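
For context, later patches in the series convert the individual call sites
from the node-wide lock to the per-lruvec lock. A minimal sketch of the
intended end-state locking pattern (illustrative only, not part of this
patch's diff):

	struct lruvec *lruvec;

	/*
	 * After this patch callers still take the node-wide lock:
	 *	spin_lock_irq(&pgdat->lruvec.lru_lock);
	 * After the full series they take only the page's own lruvec lock:
	 */
	lruvec = mem_cgroup_page_lruvec(page, pgdat);
	spin_lock_irq(&lruvec->lru_lock);
	del_page_from_lru_list(page, lruvec, page_lru(page));
	spin_unlock_irq(&lruvec->lru_lock);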

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: swkhack <swkhack@gmail.com>
Cc: "Potyra, Stefan" <Stefan.Potyra@elektrobit.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h |  4 +++-
 mm/compaction.c        | 10 +++++-----
 mm/huge_memory.c       |  6 +++---
 mm/memcontrol.c        |  6 +++---
 mm/mlock.c             | 10 +++++-----
 mm/mmzone.c            |  1 +
 mm/page_alloc.c        |  1 -
 mm/page_idle.c         |  4 ++--
 mm/swap.c              | 28 ++++++++++++++--------------
 mm/vmscan.c            | 38 +++++++++++++++++++-------------------
 10 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..8d0076d084be 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -295,6 +295,9 @@ struct zone_reclaim_stat {
 
 struct lruvec {
 	struct list_head		lists[NR_LRU_LISTS];
+	/* per-lruvec lru_lock: gives each memcg its own lock */
+	spinlock_t			lru_lock;
+
 	struct zone_reclaim_stat	reclaim_stat;
 	/* Evictions & activations on the inactive file list */
 	atomic_long_t			inactive_age;
@@ -744,7 +747,6 @@ struct zonelist {
 
 	/* Write-intensive fields used by page reclaim */
 	ZONE_PADDING(_pad1_)
-	spinlock_t		lru_lock;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 	/*
diff --git a/mm/compaction.c b/mm/compaction.c
index 952dc2fb24e5..9a737f343183 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -846,7 +846,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 		 * a fatal signal is pending.
 		 */
 		if (!(low_pfn % SWAP_CLUSTER_MAX)
-		    && compact_unlock_should_abort(&pgdat->lru_lock,
+		    && compact_unlock_should_abort(&pgdat->lruvec.lru_lock,
 					    flags, &locked, cc)) {
 			low_pfn = 0;
 			goto fatal_pending;
@@ -919,7 +919,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 			if (unlikely(__PageMovable(page)) &&
 					!PageIsolated(page)) {
 				if (locked) {
-					spin_unlock_irqrestore(&pgdat->lru_lock,
+					spin_unlock_irqrestore(&pgdat->lruvec.lru_lock,
 									flags);
 					locked = false;
 				}
@@ -949,7 +949,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 
 		/* If we already hold the lock, we can skip some rechecking */
 		if (!locked) {
-			locked = compact_lock_irqsave(&pgdat->lru_lock,
+			locked = compact_lock_irqsave(&pgdat->lruvec.lru_lock,
 								&flags, cc);
 
 			/* Try get exclusive access under lock */
@@ -1016,7 +1016,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 		 */
 		if (nr_isolated) {
 			if (locked) {
-				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+				spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 				locked = false;
 			}
 			putback_movable_pages(&cc->migratepages);
@@ -1043,7 +1043,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 
 isolate_abort:
 	if (locked)
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+		spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 
 	/*
 	 * Updated the cached scanner pfn once the pageblock has been scanned
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 738065f765ab..3a483deee807 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2529,7 +2529,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_unlock(&head->mapping->i_pages);
 	}
 
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 
 	remap_page(head);
 
@@ -2740,7 +2740,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		lru_add_drain();
 
 	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
+	spin_lock_irqsave(&pgdata->lruvec.lru_lock, flags);
 
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
@@ -2785,7 +2785,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		spin_unlock(&pgdata->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		spin_unlock_irqrestore(&pgdata->lruvec.lru_lock, flags);
 		remap_page(head);
 		ret = -EBUSY;
 	}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6f5c0c517c49..2792b8ed405f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2557,7 +2557,7 @@ static void lock_page_lru(struct page *page, int *isolated)
 {
 	pg_data_t *pgdat = page_pgdat(page);
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 	if (PageLRU(page)) {
 		struct lruvec *lruvec;
 
@@ -2581,7 +2581,7 @@ static void unlock_page_lru(struct page *page, int isolated)
 		SetPageLRU(page);
 		add_page_to_lru_list(page, lruvec, page_lru(page));
 	}
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 }
 
 static void commit_charge(struct page *page, struct mem_cgroup *memcg,
@@ -2901,7 +2901,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
 
 /*
  * Because tail pages are not marked as "used", set it. We're under
- * pgdat->lru_lock and migration entries setup in all page mappings.
+ * pgdat->lruvec.lru_lock and migration entries setup in all page mappings.
  */
 void mem_cgroup_split_huge_fixup(struct page *head)
 {
diff --git a/mm/mlock.c b/mm/mlock.c
index a90099da4fb4..1279684bada0 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -194,7 +194,7 @@ unsigned int munlock_vma_page(struct page *page)
 	 * might otherwise copy PageMlocked to part of the tail pages before
 	 * we clear it in the head page. It also stabilizes hpage_nr_pages().
 	 */
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 
 	if (!TestClearPageMlocked(page)) {
 		/* Potentially, PTE-mapped THP: do not skip the rest PTEs */
@@ -206,14 +206,14 @@ unsigned int munlock_vma_page(struct page *page)
 	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
 	if (__munlock_isolate_lru_page(page, true)) {
-		spin_unlock_irq(&pgdat->lru_lock);
+		spin_unlock_irq(&pgdat->lruvec.lru_lock);
 		__munlock_isolated_page(page);
 		goto out;
 	}
 	__munlock_isolation_failed(page);
 
 unlock_out:
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 out:
 	return nr_pages - 1;
@@ -298,7 +298,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 	pagevec_init(&pvec_putback);
 
 	/* Phase 1: page isolation */
-	spin_lock_irq(&zone->zone_pgdat->lru_lock);
+	spin_lock_irq(&zone->zone_pgdat->lruvec.lru_lock);
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
 
@@ -325,7 +325,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		pvec->pages[i] = NULL;
 	}
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
-	spin_unlock_irq(&zone->zone_pgdat->lru_lock);
+	spin_unlock_irq(&zone->zone_pgdat->lruvec.lru_lock);
 
 	/* Now we can release pins of pages that we are not munlocking */
 	pagevec_release(&pvec_putback);
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 4686fdc23bb9..3750a90ed4a0 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -91,6 +91,7 @@ void lruvec_init(struct lruvec *lruvec)
 	enum lru_list lru;
 
 	memset(lruvec, 0, sizeof(struct lruvec));
+	spin_lock_init(&lruvec->lru_lock);
 
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->lists[lru]);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..1b07dcaabbd7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6677,7 +6677,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
 
 	pgdat_page_ext_init(pgdat);
-	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(node_lruvec(pgdat));
 }
 
diff --git a/mm/page_idle.c b/mm/page_idle.c
index 295512465065..420bc0ac8c1e 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -42,12 +42,12 @@ static struct page *page_idle_get_page(unsigned long pfn)
 		return NULL;
 
 	pgdat = page_pgdat(page);
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 	if (unlikely(!PageLRU(page))) {
 		put_page(page);
 		page = NULL;
 	}
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 	return page;
 }
 
diff --git a/mm/swap.c b/mm/swap.c
index ae300397dfda..63f4782af57a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -63,12 +63,12 @@ static void __page_cache_release(struct page *page)
 		struct lruvec *lruvec;
 		unsigned long flags;
 
-		spin_lock_irqsave(&pgdat->lru_lock, flags);
+		spin_lock_irqsave(&pgdat->lruvec.lru_lock, flags);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		VM_BUG_ON_PAGE(!PageLRU(page), page);
 		__ClearPageLRU(page);
 		del_page_from_lru_list(page, lruvec, page_off_lru(page));
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+		spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 	}
 	__ClearPageWaiters(page);
 	mem_cgroup_uncharge(page);
@@ -201,16 +201,16 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 
 		if (pagepgdat != pgdat) {
 			if (pgdat)
-				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+				spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 			pgdat = pagepgdat;
-			spin_lock_irqsave(&pgdat->lru_lock, flags);
+			spin_lock_irqsave(&pgdat->lruvec.lru_lock, flags);
 		}
 
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		(*move_fn)(page, lruvec, arg);
 	}
 	if (pgdat)
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+		spin_unlock_irqrestore(&pgdat->lruvec.lru_lock, flags);
 	release_pages(pvec->pages, pvec->nr);
 	pagevec_reinit(pvec);
 }
@@ -326,9 +326,9 @@ void activate_page(struct page *page)
 	pg_data_t *pgdat = page_pgdat(page);
 
 	page = compound_head(page);
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 	__activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 }
 #endif
 
@@ -733,7 +733,7 @@ void release_pages(struct page **pages, int nr)
 		 * same pgdat. The lock is held only if pgdat != NULL.
 		 */
 		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
-			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
+			spin_unlock_irqrestore(&locked_pgdat->lruvec.lru_lock, flags);
 			locked_pgdat = NULL;
 		}
 
@@ -742,7 +742,7 @@ void release_pages(struct page **pages, int nr)
 
 		if (is_zone_device_page(page)) {
 			if (locked_pgdat) {
-				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
+				spin_unlock_irqrestore(&locked_pgdat->lruvec.lru_lock,
 						       flags);
 				locked_pgdat = NULL;
 			}
@@ -762,7 +762,7 @@ void release_pages(struct page **pages, int nr)
 
 		if (PageCompound(page)) {
 			if (locked_pgdat) {
-				spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
+				spin_unlock_irqrestore(&locked_pgdat->lruvec.lru_lock, flags);
 				locked_pgdat = NULL;
 			}
 			__put_compound_page(page);
@@ -774,11 +774,11 @@ void release_pages(struct page **pages, int nr)
 
 			if (pgdat != locked_pgdat) {
 				if (locked_pgdat)
-					spin_unlock_irqrestore(&locked_pgdat->lru_lock,
+					spin_unlock_irqrestore(&locked_pgdat->lruvec.lru_lock,
 									flags);
 				lock_batch = 0;
 				locked_pgdat = pgdat;
-				spin_lock_irqsave(&locked_pgdat->lru_lock, flags);
+				spin_lock_irqsave(&locked_pgdat->lruvec.lru_lock, flags);
 			}
 
 			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
@@ -794,7 +794,7 @@ void release_pages(struct page **pages, int nr)
 		list_add(&page->lru, &pages_to_free);
 	}
 	if (locked_pgdat)
-		spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
+		spin_unlock_irqrestore(&locked_pgdat->lruvec.lru_lock, flags);
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
@@ -832,7 +832,7 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
 	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
-	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
+	lockdep_assert_held(&lruvec->lru_lock);
 
 	if (!list)
 		SetPageLRU(page_tail);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c77d1e3761a7..c7a228525df0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1805,7 +1805,7 @@ int isolate_lru_page(struct page *page)
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 
-		spin_lock_irq(&pgdat->lru_lock);
+		spin_lock_irq(&pgdat->lruvec.lru_lock);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		if (PageLRU(page)) {
 			int lru = page_lru(page);
@@ -1814,7 +1814,7 @@ int isolate_lru_page(struct page *page)
 			del_page_from_lru_list(page, lruvec, lru);
 			ret = 0;
 		}
-		spin_unlock_irq(&pgdat->lru_lock);
+		spin_unlock_irq(&pgdat->lruvec.lru_lock);
 	}
 	return ret;
 }
@@ -1890,9 +1890,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		if (unlikely(!page_evictable(page))) {
 			list_del(&page->lru);
-			spin_unlock_irq(&pgdat->lru_lock);
+			spin_unlock_irq(&pgdat->lruvec.lru_lock);
 			putback_lru_page(page);
-			spin_lock_irq(&pgdat->lru_lock);
+			spin_lock_irq(&pgdat->lruvec.lru_lock);
 			continue;
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
@@ -1910,10 +1910,10 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 			del_page_from_lru_list(page, lruvec, lru);
 
 			if (unlikely(PageCompound(page))) {
-				spin_unlock_irq(&pgdat->lru_lock);
+				spin_unlock_irq(&pgdat->lruvec.lru_lock);
 				mem_cgroup_uncharge(page);
 				(*get_compound_page_dtor(page))(page);
-				spin_lock_irq(&pgdat->lru_lock);
+				spin_lock_irq(&pgdat->lruvec.lru_lock);
 			} else
 				list_add(&page->lru, &pages_to_free);
 		} else {
@@ -1976,7 +1976,7 @@ static int current_may_throttle(void)
 
 	lru_add_drain();
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, lru);
@@ -1988,7 +1988,7 @@ static int current_may_throttle(void)
 	if (global_reclaim(sc))
 		__count_vm_events(item, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 	if (nr_taken == 0)
 		return 0;
@@ -1996,7 +1996,7 @@ static int current_may_throttle(void)
 	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
 				&stat, false);
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (global_reclaim(sc))
@@ -2009,7 +2009,7 @@ static int current_may_throttle(void)
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 	mem_cgroup_uncharge_list(&page_list);
 	free_unref_page_list(&page_list);
@@ -2062,7 +2062,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, lru);
@@ -2073,7 +2073,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	__count_vm_events(PGREFILL, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
 
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 	while (!list_empty(&l_hold)) {
 		cond_resched();
@@ -2119,7 +2119,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	/*
 	 * Move pages back to the lru list.
 	 */
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
 	 * even though only some of them are actually re-activated.  This
@@ -2137,7 +2137,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 	mem_cgroup_uncharge_list(&l_active);
 	free_unref_page_list(&l_active);
@@ -2373,7 +2373,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
 		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&pgdat->lruvec.lru_lock);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		reclaim_stat->recent_scanned[0] /= 2;
 		reclaim_stat->recent_rotated[0] /= 2;
@@ -2394,7 +2394,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 
 	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&pgdat->lruvec.lru_lock);
 
 	fraction[0] = ap;
 	fraction[1] = fp;
@@ -4263,9 +4263,9 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 		pgscanned++;
 		if (pagepgdat != pgdat) {
 			if (pgdat)
-				spin_unlock_irq(&pgdat->lru_lock);
+				spin_unlock_irq(&pgdat->lruvec.lru_lock);
 			pgdat = pagepgdat;
-			spin_lock_irq(&pgdat->lru_lock);
+			spin_lock_irq(&pgdat->lruvec.lru_lock);
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
@@ -4286,7 +4286,7 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 	if (pgdat) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
-		spin_unlock_irq(&pgdat->lru_lock);
+		spin_unlock_irq(&pgdat->lruvec.lru_lock);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
-- 
1.8.3.1


Thread overview: 43+ messages
2019-08-20  9:48 [PATCH 00/14] per memcg lru_lock Alex Shi
2019-08-20  9:48 ` Alex Shi [this message]
2019-08-20  9:48   ` [PATCH 01/14] mm/lru: move pgdat lru_lock into lruvec Alex Shi
2019-08-20 13:40   ` Matthew Wilcox
2019-08-20 13:40     ` Matthew Wilcox
2019-08-20 14:11     ` Alex Shi
2019-08-20 14:11       ` Alex Shi
2019-08-20  9:48 ` [PATCH 02/14] lru/memcg: move the lruvec->pgdat sync out lru_lock Alex Shi
2019-08-20  9:48 ` [PATCH 03/14] lru/memcg: using per lruvec lock in un/lock_page_lru Alex Shi
2019-08-26  8:30   ` Konstantin Khlebnikov
2019-08-26 14:16     ` Alex Shi
2019-08-20  9:48 ` [PATCH 04/14] lru/compaction: use per lruvec lock in isolate_migratepages_block Alex Shi
2019-08-20  9:48 ` [PATCH 05/14] lru/huge_page: use per lruvec lock in __split_huge_page Alex Shi
2019-08-20  9:48 ` [PATCH 06/14] lru/mlock: using per lruvec lock in munlock Alex Shi
2019-08-20  9:48 ` [PATCH 07/14] lru/swap: using per lruvec lock in page_cache_release Alex Shi
2019-08-20  9:48 ` [PATCH 08/14] lru/swap: uer lruvec lock in activate_page Alex Shi
2019-08-20  9:48 ` [PATCH 09/14] lru/swap: uer per lruvec lock in pagevec_lru_move_fn Alex Shi
2019-08-20  9:48 ` [PATCH 10/14] lru/swap: use per lruvec lock in release_pages Alex Shi
2019-08-20  9:48 ` [PATCH 11/14] lru/vmscan: using per lruvec lock in lists shrinking Alex Shi
2019-08-20  9:48 ` [PATCH 12/14] lru/vmscan: use pre lruvec lock in check_move_unevictable_pages Alex Shi
2019-08-20  9:48 ` [PATCH 13/14] lru/vmscan: using per lruvec lru_lock in get_scan_count Alex Shi
2019-08-20  9:48 ` [PATCH 14/14] mm/lru: fix the comments of lru_lock Alex Shi
2019-08-20  9:48   ` Alex Shi
2019-08-20 14:00   ` Matthew Wilcox
2019-08-20 14:00     ` Matthew Wilcox
2019-08-20 14:21     ` Alex Shi
2019-08-20 14:21       ` Alex Shi
2019-08-20 10:45 ` [PATCH 00/14] per memcg lru_lock Michal Hocko
2019-08-20 16:48   ` Shakeel Butt
2019-08-20 16:48     ` Shakeel Butt
2019-08-20 18:24     ` Hugh Dickins
2019-08-20 18:24       ` Hugh Dickins
2019-08-21  1:21       ` Alex Shi
2019-08-21  2:00       ` Alex Shi
2019-08-24  1:59         ` Hugh Dickins
2019-08-24  1:59           ` Hugh Dickins
2019-08-26 14:35           ` Alex Shi
2019-08-21 18:00 ` Daniel Jordan
2019-08-22 11:56   ` Alex Shi
2019-08-22 15:20     ` Daniel Jordan
2019-08-26  8:39       ` Konstantin Khlebnikov
2019-08-26 14:22         ` Alex Shi
2019-08-26 14:25       ` Alex Shi
