linux-mm.kvack.org archive mirror
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Subject: [PATCH v18 32/32] mm: Split release_pages work into 3 passes
Date: Mon, 24 Aug 2020 20:55:05 +0800
Message-ID: <1598273705-69124-33-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1598273705-69124-1-git-send-email-alex.shi@linux.alibaba.com>

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

The release_pages function has a number of paths that end up releasing and
reacquiring the LRU lock. One example is the freeing of THP pages, which
requires releasing the LRU lock so that __put_compound_page can potentially
reacquire it.

To avoid that, we can split the work into 3 passes. The first pass runs
without the LRU lock and separates the pages that are not on the LRU, which
can be freed immediately, from those that are. The second pass then removes
the remaining pages from the LRU in batches as large as a pagevec can hold
before releasing the LRU lock. Once the pages have been removed from the
LRU, the third pass frees them without needing to check whether they are
still on the LRU.
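
For readers who want to see the shape of the three passes outside the
kernel, here is a minimal userspace sketch. Everything in it is hypothetical
and only models the control flow: struct toy_page, struct toy_batch, the
toy_* functions and the lock_acquisitions counter are not kernel names,
BATCH_SIZE is an assumed stand-in for the pagevec capacity, and a single
counter stands in for what may be several per-lruvec locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 15	/* assumption: stands in for the pagevec capacity */

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
	int refcount;
	bool on_lru;
	bool freed;
};

struct toy_batch {
	size_t nr;
	struct toy_page *pages[BATCH_SIZE];
};

/* Counts how often the (single, simulated) LRU lock was taken. */
static unsigned int lock_acquisitions;

/* Pass 3 helper: the page is off the LRU, so no lock is needed to free it. */
static void toy_free(struct toy_page *page)
{
	assert(!page->on_lru);
	page->freed = true;
}

/* Passes 2 and 3 for one batch: one lock hold covers the whole batch. */
static void toy_release_batch(struct toy_batch *batch)
{
	size_t i;

	lock_acquisitions++;		/* simulated lock */
	for (i = batch->nr; i--;) {
		assert(batch->pages[i]->on_lru);
		batch->pages[i]->on_lru = false;
	}
	/* simulated unlock: nothing has been freed yet */

	for (i = batch->nr; i--;)
		toy_free(batch->pages[i]);
	batch->nr = 0;
}

/* Pass 1: drop references, free non-LRU pages at once, defer LRU pages. */
static void toy_release_pages(struct toy_page **pages, size_t nr)
{
	struct toy_batch batch = { .nr = 0 };
	size_t i;

	for (i = 0; i < nr; i++) {
		struct toy_page *page = pages[i];

		if (--page->refcount > 0)
			continue;	/* still referenced elsewhere */

		if (!page->on_lru) {
			toy_free(page);
			continue;
		}

		batch.pages[batch.nr++] = page;
		if (batch.nr == BATCH_SIZE)
			toy_release_batch(&batch);
	}

	/* flush whatever is left over */
	if (batch.nr)
		toy_release_batch(&batch);
}
```

Calling toy_release_pages() on an array of toy_page pointers and then
reading lock_acquisitions shows one simulated lock hold per full or partial
batch of LRU pages, regardless of how many non-LRU pages are interleaved.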

The general idea is to avoid bouncing the LRU lock between pages and,
ideally, to hold it once for up to a full pagevec worth of pages.
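
To put rough numbers on the bouncing, the following is a small counting
model, not a measurement. It assumes every page is on the LRU and belongs to
the same lruvec, ignores the SWAP_CLUSTER_MAX cap that the per-page loop
applies, and assumes each compound page takes the lock once on its own
after the caller has dropped it. The names toy_kind, toy_locks_per_page and
toy_locks_batched are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_SMALL, TOY_COMPOUND };

/*
 * Per-page model: the lock is dropped ahead of every compound page, and
 * the compound release path is assumed to take it again by itself.
 */
static unsigned int toy_locks_per_page(const enum toy_kind *kinds, size_t nr)
{
	unsigned int taken = 0;
	int held = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (kinds[i] == TOY_COMPOUND) {
			held = 0;	/* caller drops the lock if held */
			taken++;	/* compound path takes it itself */
		} else if (!held) {
			held = 1;	/* caller takes the lock */
			taken++;
		}
	}
	return taken;
}

/* Batched model: one lock hold per full or partial batch of LRU pages. */
static unsigned int toy_locks_batched(size_t nr, size_t batch)
{
	assert(batch > 0);
	return (unsigned int)((nr + batch - 1) / batch);
}
```

Under these assumptions the benefit shows up for mixed input: a run that
alternates small and compound pages takes the lock once per page in the
per-page model, while the batched model depends only on the page count and
the batch size. For a run of small pages alone, the per-page model already
keeps the lock across the run, so batching is not expected to help there.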

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 mm/swap.c | 109 ++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index fe53449fa1b8..b405f81b2c60 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -795,6 +795,54 @@ void lru_add_drain_all(void)
 }
 #endif
 
+static void __release_page(struct page *page, struct list_head *pages_to_free)
+{
+	if (PageCompound(page)) {
+		__put_compound_page(page);
+	} else {
+		/* Clear Active bit in case of parallel mark_page_accessed */
+		__ClearPageActive(page);
+		__ClearPageWaiters(page);
+
+		list_add(&page->lru, pages_to_free);
+	}
+}
+
+static void __release_lru_pages(struct pagevec *pvec,
+				struct list_head *pages_to_free)
+{
+	struct lruvec *lruvec = NULL;
+	unsigned long flags = 0;
+	int i;
+
+	/*
+	 * The pagevec at this point should contain a set of pages with
+	 * their reference count at 0 and the LRU flag set. We will now
+	 * need to pull the pages from their LRU lists.
+	 *
+	 * We walk the list backwards here since that way we are starting at
+	 * the pages that should be warmest in the cache.
+	 */
+	for (i = pagevec_count(pvec); i--;) {
+		struct page *page = pvec->pages[i];
+
+		lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
+		VM_BUG_ON_PAGE(!PageLRU(page), page);
+		__ClearPageLRU(page);
+		del_page_from_lru_list(page, lruvec, page_off_lru(page));
+	}
+
+	unlock_page_lruvec_irqrestore(lruvec, flags);
+
+	/*
+	 * A batch of pages are no longer on the LRU list. Go through and
+	 * start the final process of returning the deferred pages to their
+	 * appropriate freelists.
+	 */
+	for (i = pagevec_count(pvec); i--;)
+		__release_page(pvec->pages[i], pages_to_free);
+}
+
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release
@@ -806,32 +854,24 @@ void lru_add_drain_all(void)
 void release_pages(struct page **pages, int nr)
 {
 	int i;
+	struct pagevec pvec;
 	LIST_HEAD(pages_to_free);
-	struct lruvec *lruvec = NULL;
-	unsigned long flags;
-	unsigned int lock_batch;
 
+	pagevec_init(&pvec);
+
+	/*
+	 * We need to first walk through the list cleaning up the low hanging
+	 * fruit and clearing those pages that either cannot be freed or that
+	 * are non-LRU. We will store the LRU pages in a pagevec so that we
+	 * can get to them in the next pass.
+	 */
 	for (i = 0; i < nr; i++) {
 		struct page *page = pages[i];
 
-		/*
-		 * Make sure the IRQ-safe lock-holding time does not get
-		 * excessive with a continuous string of pages from the
-		 * same lruvec. The lock is held only if lruvec != NULL.
-		 */
-		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
-			unlock_page_lruvec_irqrestore(lruvec, flags);
-			lruvec = NULL;
-		}
-
 		if (is_huge_zero_page(page))
 			continue;
 
 		if (is_zone_device_page(page)) {
-			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
-				lruvec = NULL;
-			}
 			/*
 			 * ZONE_DEVICE pages that return 'false' from
 			 * put_devmap_managed_page() do not require special
@@ -848,36 +888,21 @@ void release_pages(struct page **pages, int nr)
 		if (!put_page_testzero(page))
 			continue;
 
-		if (PageCompound(page)) {
-			if (lruvec) {
-				unlock_page_lruvec_irqrestore(lruvec, flags);
-				lruvec = NULL;
-			}
-			__put_compound_page(page);
+		if (!PageLRU(page)) {
+			__release_page(page, &pages_to_free);
 			continue;
 		}
 
-		if (PageLRU(page)) {
-			struct lruvec *prev_lruvec = lruvec;
-
-			lruvec = relock_page_lruvec_irqsave(page, lruvec,
-									&flags);
-			if (prev_lruvec != lruvec)
-				lock_batch = 0;
-
-			VM_BUG_ON_PAGE(!PageLRU(page), page);
-			__ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, page_off_lru(page));
+		/* record page so we can get it in the next pass */
+		if (!pagevec_add(&pvec, page)) {
+			__release_lru_pages(&pvec, &pages_to_free);
+			pagevec_reinit(&pvec);
 		}
-
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
-
-		list_add(&page->lru, &pages_to_free);
 	}
-	if (lruvec)
-		unlock_page_lruvec_irqrestore(lruvec, flags);
+
+	/* flush any remaining LRU pages that need to be processed */
+	if (pagevec_count(&pvec))
+		__release_lru_pages(&pvec, &pages_to_free);
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
-- 
1.8.3.1



