From: Yang Shi <yang.shi@linux.alibaba.com>
To: hughd@google.com, kirill.shutemov@linux.intel.com,
	aarcange@redhat.com, akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Wed,  4 Dec 2019 08:42:54 +0800
Message-ID: <1575420174-19171-1-git-send-email-yang.shi@linux.alibaba.com>

Currently, when truncating a shmem file, if the range only partially
covers a THP (its start or end falls in the middle of the THP), the
subpages within the range just get cleared rather than freed; the THP
is freed only if the range covers it entirely.  Even once all the
subpages have been truncated this way (whether randomly or
sequentially), the THP may still be kept in the page cache.  This
might be fine for some use cases that prefer to preserve THPs.
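
For illustration only (not part of the patch), the "partial" condition in
terms of base-page indices, assuming 4KB base pages and 2MB THPs so that
HPAGE_PMD_NR is 512; thp_partially_covered() is a hypothetical helper
used just to make the condition explicit:

	/* Illustration only: a THP whose head page has index 'head' spans
	 * base-page indices [head, head + HPAGE_PMD_NR).
	 */
	#define HPAGE_PMD_NR	512	/* assumed: 2MB THP / 4KB base pages */

	static int thp_partially_covered(unsigned long head,
					 unsigned long start,
					 unsigned long end)
	{
		/* [start, end) starts or ends inside the THP */
		return start > head || end < head + HPAGE_PMD_NR;
	}

	/* e.g. head == 0: [128, 512) and [0, 256) are partial,
	 * while [0, 512) covers the whole THP.
	 */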

However, when inflating the balloon, QEMU punches holes or calls
MADV_DONTNEED at base-page-size granularity if hugetlbfs is not used.
So, when shmem THP is used as the memory backend, balloon inflation
doesn't work as expected since no memory is actually freed, even
though the inflation use case really needs the memory to be freed.
An anonymous THP is not freed right away either, but it is freed
eventually once all of its subpages are unmapped; a shmem THP, by
contrast, would stay in the page cache indefinitely.
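
For reference, a minimal userspace sketch (not from QEMU or from this
patch) of the balloon-style behavior described above: it punches a single
base-page-sized hole in a shmem (memfd) region.  Assuming shmem THP is
enabled (e.g. via /sys/kernel/mm/transparent_hugepage/shmem_enabled), such
a partial punch currently only clears the subpage while the 2MB compound
page stays in the page cache:

	/* Illustration only: balloon-style 4KB hole punch on shmem. */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <linux/falloc.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		const size_t len = 2UL << 20;	/* one PMD-sized (2MB) extent */
		const long psize = sysconf(_SC_PAGESIZE);
		int fd = memfd_create("balloon-demo", 0);
		char *p;

		if (fd < 0 || ftruncate(fd, len))
			return 1;

		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		memset(p, 0xaa, len);	/* fault in, possibly as one THP */

		/*
		 * Release one base page the way a balloon would: a hole
		 * punch at 4KB granularity.  Without splitting the THP,
		 * this only clears the subpage.
		 */
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      0, psize))
			return 1;

		munmap(p, len);
		close(fd);
		return 0;
	}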

Split the THP right away when doing a partial hole punch, and if the
split fails, just clear the page so that reads from the hole-punched
area return zero.
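
A simplified sketch of the resulting per-page flow (illustration only:
'whole_thp_in_range' stands in for the round_down() check in the real
code, which also drops the extra pagevec pins before calling
split_huge_page() and restarts the lookup when the split succeeds):

	if (PageTransCompound(page) && !unfalloc) {
		if (whole_thp_in_range) {
			/* Whole THP is punched: fall through, truncate it. */
		} else if (split_huge_page(page) == 0) {
			/* Split worked: re-lookup and truncate base pages. */
		} else {
			/* Split failed: clear the subpage, reads see zero. */
			clear_highpage(page);
		}
	}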

Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: * Adopted the comments from Kirill.
    * Dropped the fallocate mode flag; THP split is now the default behavior.
    * Blended Hugh's implementation with my v1 patch. TBH I'm not very keen on
      Hugh's find_get_entries() hack (basically neutral), but without that hack
      we have to rely on pagevec_release() to release the extra pins and play
      with goto, which is what this version does. The patch is bigger than
      Hugh's due to the extra comments that make the flow clear.

 mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 37 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 220be9f..1ae0c7f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	long nr_swaps_freed = 0;
 	pgoff_t index;
 	int i;
+	bool split = false;
+	struct page *page = NULL;
 
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
 
 	pagevec_init(&pvec);
 	index = start;
+retry:
 	while (index < end) {
 		pvec.nr = find_get_entries(mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE),
@@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		if (!pvec.nr)
 			break;
 		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
+			split = false;
+			page = pvec.pages[i];
 
 			index = indices[i];
 			if (index >= end)
@@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			if (!trylock_page(page))
 				continue;
 
-			if (PageTransTail(page)) {
-				/* Middle of THP: zero out the page */
-				clear_highpage(page);
-				unlock_page(page);
-				continue;
-			} else if (PageTransHuge(page)) {
-				if (index == round_down(end, HPAGE_PMD_NR)) {
+			if (PageTransCompound(page) && !unfalloc) {
+				if (PageHead(page) &&
+				    index != round_down(end, HPAGE_PMD_NR)) {
 					/*
-					 * Range ends in the middle of THP:
-					 * zero out the page
+					 * Fall through when punching whole
+					 * THP.
 					 */
-					clear_highpage(page);
-					unlock_page(page);
-					continue;
+					index += HPAGE_PMD_NR - 1;
+					i += HPAGE_PMD_NR - 1;
+				} else {
+					/*
+					 * Split THP for any partial hole
+					 * punch.
+					 */
+					get_page(page);
+					split = true;
+					goto split;
 				}
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
 			}
 
 			if (!unfalloc || !PageUptodate(page)) {
@@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			}
 			unlock_page(page);
 		}
+split:
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
+
+		if (split) {
+			/*
+			 * The pagevec_release() released all extra pins
+			 * from pagevec lookup.  And we hold an extra pin
+			 * and still have the page locked under us.
+			 */
+			if (!split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				/* Re-lookup page cache from current index */
+				goto retry;
+			}
+
+			/* Failed to split THP, move to next index */
+			unlock_page(page);
+			put_page(page);
+		}
+
 		index++;
 	}
 
@@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		return;
 
 	index = start;
+again:
 	while (index < end) {
 		cond_resched();
 
@@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			continue;
 		}
 		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
+			split = false;
+			page = pvec.pages[i];
 
 			index = indices[i];
 			if (index >= end)
@@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 
 			lock_page(page);
 
-			if (PageTransTail(page)) {
-				/* Middle of THP: zero out the page */
-				clear_highpage(page);
-				unlock_page(page);
-				/*
-				 * Partial thp truncate due 'start' in middle
-				 * of THP: don't need to look on these pages
-				 * again on !pvec.nr restart.
-				 */
-				if (index != round_down(end, HPAGE_PMD_NR))
-					start++;
-				continue;
-			} else if (PageTransHuge(page)) {
-				if (index == round_down(end, HPAGE_PMD_NR)) {
+			if (PageTransCompound(page) && !unfalloc) {
+				if (PageHead(page) &&
+				    index != round_down(end, HPAGE_PMD_NR)) {
 					/*
-					 * Range ends in the middle of THP:
-					 * zero out the page
+					 * Fall through when punching whole
+					 * THP.
 					 */
-					clear_highpage(page);
-					unlock_page(page);
-					continue;
+					index += HPAGE_PMD_NR - 1;
+					i += HPAGE_PMD_NR - 1;
+				} else {
+					/*
+					 * Split THP for any partial hole
+					 * punch.
+					 */
+					get_page(page);
+					split = true;
+					goto rescan_split;
 				}
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
 			}
 
 			if (!unfalloc || !PageUptodate(page)) {
@@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			}
 			unlock_page(page);
 		}
+rescan_split:
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
+
+		if (split) {
+			/*
+			 * The pagevec_release() released all extra pins
+			 * from pagevec lookup.  And we hold an extra pin
+			 * and still have the page locked under us.
+			 */
+			if (!split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				/* Re-lookup page cache from current index */
+				goto again;
+			}
+
+			/*
+			 * Split failed: clear the page, then move to
+			 * the next index.
+			 */
+			clear_highpage(page);
+
+			unlock_page(page);
+			put_page(page);
+		}
+
 		index++;
 	}
 
-- 
1.8.3.1



Thread overview: 30+ messages
2019-12-04  0:42 Yang Shi [this message]
2019-12-05  0:15 ` [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Hugh Dickins
2019-12-05  0:50   ` Yang Shi
2020-01-14 19:28   ` Yang Shi
2020-02-04 23:27     ` Yang Shi
2020-02-14  0:38       ` Yang Shi
2020-02-14 15:40         ` Kirill A. Shutemov
2020-02-14 17:17           ` Yang Shi
2020-02-25  3:46     ` Hugh Dickins
2020-02-25 18:02       ` David Hildenbrand
2020-02-25 20:31         ` Hugh Dickins
2020-02-26 17:43       ` Yang Shi
2020-02-27  1:16         ` Matthew Wilcox
2020-02-27  1:47           ` Hugh Dickins
2020-02-27  1:37         ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21  9:07   ` Michael S. Tsirkin
2020-02-21  9:36     ` David Hildenbrand
2020-02-22  0:39       ` Alexander Duyck
2020-02-24 10:22         ` David Hildenbrand
2020-02-25  0:13           ` Alexander Duyck
2020-02-25  8:09             ` David Hildenbrand
2020-02-25 16:42               ` Alexander Duyck
2020-02-21 18:24   ` Yang Shi
2020-02-22  0:24     ` Alexander Duyck
2020-02-26 17:31       ` Yang Shi
2020-02-26 17:45         ` David Hildenbrand
2020-02-26 18:00           ` Yang Shi
2020-02-27  0:56         ` Hugh Dickins
2020-02-27  1:14           ` Yang Shi
