linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: kirill.shutemov@linux.intel.com, aarcange@redhat.com,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Tue, 14 Jan 2020 11:28:41 -0800	[thread overview]
Message-ID: <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1912041601270.12930@eggly.anvils>



On 12/4/19 4:15 PM, Hugh Dickins wrote:
> On Wed, 4 Dec 2019, Yang Shi wrote:
>
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache.  This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory.  But, the inflation
>> usecase really needs get the memory freed.  Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> Split THP right away when doing partial hole punch, and if split fails
>> just clear the page so that read to the hole punched area would return
>> zero.
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>> v2: * Adopted the comment from Kirill.
>>      * Dropped fallocate mode flag, THP split is the default behavior.
>>      * Blended Huge's implementation with my v1 patch. TBH I'm not very keen to
>>        Hugh's find_get_entries() hack (basically neutral), but without that hack
> Thanks for giving it a try.  I'm not neutral about my find_get_entries()
> hack: it surely had to go (without it, I'd have just pushed my own patch).
> I've not noticed anything wrong with your patch, and it's in the right
> direction, but I'm still not thrilled with it.  I also remember that I
> got the looping wrong in my first internal attempt (fixed in what I sent),
> and need to be very sure of the try-again-versus-move-on-to-next conditions
> before agreeing to anything.  No rush, I'll come back to this in days or
> month ahead: I'll try to find a less gotoey blend of yours and mine.

Hi Hugh,

Any update on this one?

Thanks,
Yang

>
> Hugh
>
>>        we have to rely on pagevec_release() to release extra pins and play with
>>        goto. This version does in this way. The patch is bigger than Hugh's due
>>        to extra comments to make the flow clear.
>>
>>   mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>   1 file changed, 83 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..1ae0c7f 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   	long nr_swaps_freed = 0;
>>   	pgoff_t index;
>>   	int i;
>> +	bool split = false;
>> +	struct page *page = NULL;
>>   
>>   	if (lend == -1)
>>   		end = -1;	/* unsigned, so actually very big */
>>   
>>   	pagevec_init(&pvec);
>>   	index = start;
>> +retry:
>>   	while (index < end) {
>>   		pvec.nr = find_get_entries(mapping, index,
>>   			min(end - index, (pgoff_t)PAGEVEC_SIZE),
>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   		if (!pvec.nr)
>>   			break;
>>   		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>   
>>   			index = indices[i];
>>   			if (index >= end)
>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			if (!trylock_page(page))
>>   				continue;
>>   
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>   					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>   					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto split;
>>   				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>   			}
>>   
>>   			if (!unfalloc || !PageUptodate(page)) {
>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			}
>>   			unlock_page(page);
>>   		}
>> +split:
>>   		pagevec_remove_exceptionals(&pvec);
>>   		pagevec_release(&pvec);
>>   		cond_resched();
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from pagevec lookup.  And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto retry;
>> +			}
>> +
>> +			/* Fail to split THP, move to next index */
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>   		index++;
>>   	}
>>   
>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   		return;
>>   
>>   	index = start;
>> +again:
>>   	while (index < end) {
>>   		cond_resched();
>>   
>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			continue;
>>   		}
>>   		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>   
>>   			index = indices[i];
>>   			if (index >= end)
>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   
>>   			lock_page(page);
>>   
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				/*
>> -				 * Partial thp truncate due 'start' in middle
>> -				 * of THP: don't need to look on these pages
>> -				 * again on !pvec.nr restart.
>> -				 */
>> -				if (index != round_down(end, HPAGE_PMD_NR))
>> -					start++;
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>   					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>   					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto rescan_split;
>>   				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>   			}
>>   
>>   			if (!unfalloc || !PageUptodate(page)) {
>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			}
>>   			unlock_page(page);
>>   		}
>> +rescan_split:
>>   		pagevec_remove_exceptionals(&pvec);
>>   		pagevec_release(&pvec);
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from pagevec lookup.  And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto again;
>> +			}
>> +
>> +			/*
>> +			 * Split fail, clear the page then move to next
>> +			 * index.
>> +			 */
>> +			clear_highpage(page);
>> +
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>   		index++;
>>   	}
>>   
>> -- 
>> 1.8.3.1
>>
>>



  parent reply	other threads:[~2020-01-14 19:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-04  0:42 [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Yang Shi
2019-12-05  0:15 ` Hugh Dickins
2019-12-05  0:50   ` Yang Shi
2020-01-14 19:28   ` Yang Shi [this message]
2020-02-04 23:27     ` Yang Shi
2020-02-14  0:38       ` Yang Shi
2020-02-14 15:40         ` Kirill A. Shutemov
2020-02-14 17:17           ` Yang Shi
2020-02-25  3:46     ` Hugh Dickins
2020-02-25 18:02       ` David Hildenbrand
2020-02-25 20:31         ` Hugh Dickins
2020-02-26 17:43       ` Yang Shi
2020-02-27  1:16         ` Matthew Wilcox
2020-02-27  1:47           ` Hugh Dickins
2020-02-27  1:37         ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21  9:07   ` Michael S. Tsirkin
2020-02-21  9:36     ` David Hildenbrand
2020-02-22  0:39       ` Alexander Duyck
2020-02-24 10:22         ` David Hildenbrand
2020-02-25  0:13           ` Alexander Duyck
2020-02-25  8:09             ` David Hildenbrand
2020-02-25 16:42               ` Alexander Duyck
2020-02-21 18:24   ` Yang Shi
2020-02-22  0:24     ` Alexander Duyck
2020-02-26 17:31       ` Yang Shi
2020-02-26 17:45         ` David Hildenbrand
2020-02-26 18:00           ` Yang Shi
2020-02-27  0:56         ` Hugh Dickins
2020-02-27  1:14           ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).