From: Yang Shi <yang.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: kirill.shutemov@linux.intel.com, aarcange@redhat.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Wed, 4 Dec 2019 16:50:49 -0800 [thread overview]
Message-ID: <c0f134f1-caf8-0baa-5f0e-87f2c530c631@linux.alibaba.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1912041601270.12930@eggly.anvils>
On 12/4/19 4:15 PM, Hugh Dickins wrote:
> On Wed, 4 Dec 2019, Yang Shi wrote:
>
>> Currently when truncating a shmem file, if the range covers only part of
>> a THP (start or end is in the middle of the THP), the pages actually just
>> get cleared rather than freed, unless the range covers the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in the page cache. This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granularity if hugetlbfs is not used.
>> So, when using shmem THP as the memory backend, QEMU inflation doesn't
>> work as expected since it doesn't free memory. But the inflation
>> usecase really needs to get the memory freed. Anonymous THP will not get
>> freed right away either, but it will be freed eventually once all subpages
>> are unmapped; shmem THP, however, would stay in the page cache.
>>
>> Split the THP right away when doing a partial hole punch, and if the
>> split fails just clear the page so that reads of the hole-punched area
>> return zero.
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>> v2: * Adopted the comment from Kirill.
>> * Dropped fallocate mode flag, THP split is the default behavior.
>> * Blended Hugh's implementation with my v1 patch. TBH I'm not very keen on
>> Hugh's find_get_entries() hack (basically neutral), but without that hack
> Thanks for giving it a try. I'm not neutral about my find_get_entries()
> hack: it surely had to go (without it, I'd have just pushed my own patch).
We are on the same page :-)
> I've not noticed anything wrong with your patch, and it's in the right
> direction, but I'm still not thrilled with it. I also remember that I
> got the looping wrong in my first internal attempt (fixed in what I sent),
> and need to be very sure of the try-again-versus-move-on-to-next conditions
> before agreeing to anything. No rush, I'll come back to this in days or
> month ahead: I'll try to find a less gotoey blend of yours and mine.
Yes, those gotos look a little convoluted, so I added a lot of comments
to improve the readability. Thanks for your time.
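For anyone following along, the split-and-retry flow in the patch quoted below reduces to roughly the sketch here. This is kernel-style pseudocode, heavily simplified; helper names like lookup_pagevec() and partial_punch() are made up for illustration and are not real kernel APIs:

```
/* Pseudocode sketch of the hole-punch loop (simplified, not compilable). */
while (index < end) {
        lookup_pagevec(&pvec, index);            /* pins every page found */
        for each (page, index) in pvec {
                if (is_thp(page) && partial_punch(page, start, end)) {
                        get_page(page);          /* extra pin survives release */
                        split = true;
                        goto release;            /* drop pagevec pins first */
                }
                truncate_or_clear(page);
        }
release:
        pagevec_release(&pvec);                  /* only our extra pin remains */
        if (split) {
                if (!split_huge_page(page)) {    /* needs page locked + pinned */
                        unlock_page(page);
                        put_page(page);
                        continue;                /* re-lookup from same index */
                }
                /* split failed: clear or skip this THP, then move on */
                unlock_page(page);
                put_page(page);
        }
        index++;
}
```

The key invariant is that split_huge_page() is only attempted after pagevec_release() has dropped all lookup pins, leaving exactly the one explicitly taken pin plus the page lock.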
>
> Hugh
>
>> we have to rely on pagevec_release() to release extra pins and play with
>> goto. This version does it that way. The patch is bigger than Hugh's due
>> to extra comments to make the flow clear.
>>
>> mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>> 1 file changed, 83 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..1ae0c7f 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> long nr_swaps_freed = 0;
>> pgoff_t index;
>> int i;
>> + bool split = false;
>> + struct page *page = NULL;
>>
>> if (lend == -1)
>> end = -1; /* unsigned, so actually very big */
>>
>> pagevec_init(&pvec);
>> index = start;
>> +retry:
>> while (index < end) {
>> pvec.nr = find_get_entries(mapping, index,
>> min(end - index, (pgoff_t)PAGEVEC_SIZE),
>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> if (!pvec.nr)
>> break;
>> for (i = 0; i < pagevec_count(&pvec); i++) {
>> - struct page *page = pvec.pages[i];
>> + split = false;
>> + page = pvec.pages[i];
>>
>> index = indices[i];
>> if (index >= end)
>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> if (!trylock_page(page))
>> continue;
>>
>> - if (PageTransTail(page)) {
>> - /* Middle of THP: zero out the page */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> - } else if (PageTransHuge(page)) {
>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>> + if (PageTransCompound(page) && !unfalloc) {
>> + if (PageHead(page) &&
>> + index != round_down(end, HPAGE_PMD_NR)) {
>> /*
>> - * Range ends in the middle of THP:
>> - * zero out the page
>> + * Fall through when punching whole
>> + * THP.
>> */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> + index += HPAGE_PMD_NR - 1;
>> + i += HPAGE_PMD_NR - 1;
>> + } else {
>> + /*
>> + * Split THP for any partial hole
>> + * punch.
>> + */
>> + get_page(page);
>> + split = true;
>> + goto split;
>> }
>> - index += HPAGE_PMD_NR - 1;
>> - i += HPAGE_PMD_NR - 1;
>> }
>>
>> if (!unfalloc || !PageUptodate(page)) {
>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> }
>> unlock_page(page);
>> }
>> +split:
>> pagevec_remove_exceptionals(&pvec);
>> pagevec_release(&pvec);
>> cond_resched();
>> +
>> + if (split) {
>> + /*
>> + * The pagevec_release() released all extra pins
>> + * from pagevec lookup. And we hold an extra pin
>> + * and still have the page locked under us.
>> + */
>> + if (!split_huge_page(page)) {
>> + unlock_page(page);
>> + put_page(page);
>> + /* Re-lookup page cache from current index */
>> + goto retry;
>> + }
>> +
>> + /* Failed to split THP; move to next index */
>> + unlock_page(page);
>> + put_page(page);
>> + }
>> +
>> index++;
>> }
>>
>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> return;
>>
>> index = start;
>> +again:
>> while (index < end) {
>> cond_resched();
>>
>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> continue;
>> }
>> for (i = 0; i < pagevec_count(&pvec); i++) {
>> - struct page *page = pvec.pages[i];
>> + split = false;
>> + page = pvec.pages[i];
>>
>> index = indices[i];
>> if (index >= end)
>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>
>> lock_page(page);
>>
>> - if (PageTransTail(page)) {
>> - /* Middle of THP: zero out the page */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - /*
>> - * Partial thp truncate due 'start' in middle
>> - * of THP: don't need to look on these pages
>> - * again on !pvec.nr restart.
>> - */
>> - if (index != round_down(end, HPAGE_PMD_NR))
>> - start++;
>> - continue;
>> - } else if (PageTransHuge(page)) {
>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>> + if (PageTransCompound(page) && !unfalloc) {
>> + if (PageHead(page) &&
>> + index != round_down(end, HPAGE_PMD_NR)) {
>> /*
>> - * Range ends in the middle of THP:
>> - * zero out the page
>> + * Fall through when punching whole
>> + * THP.
>> */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> + index += HPAGE_PMD_NR - 1;
>> + i += HPAGE_PMD_NR - 1;
>> + } else {
>> + /*
>> + * Split THP for any partial hole
>> + * punch.
>> + */
>> + get_page(page);
>> + split = true;
>> + goto rescan_split;
>> }
>> - index += HPAGE_PMD_NR - 1;
>> - i += HPAGE_PMD_NR - 1;
>> }
>>
>> if (!unfalloc || !PageUptodate(page)) {
>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> }
>> unlock_page(page);
>> }
>> +rescan_split:
>> pagevec_remove_exceptionals(&pvec);
>> pagevec_release(&pvec);
>> +
>> + if (split) {
>> + /*
>> + * The pagevec_release() released all extra pins
>> + * from pagevec lookup. And we hold an extra pin
>> + * and still have the page locked under us.
>> + */
>> + if (!split_huge_page(page)) {
>> + unlock_page(page);
>> + put_page(page);
>> + /* Re-lookup page cache from current index */
>> + goto again;
>> + }
>> +
>> + /*
>> + * Split failed; clear the page, then move
>> + * to the next index.
>> + */
>> + clear_highpage(page);
>> +
>> + unlock_page(page);
>> + put_page(page);
>> + }
>> +
>> index++;
>> }
>>
>> --
>> 1.8.3.1
>>
>>
2019-12-04 0:42 [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Yang Shi
2019-12-05 0:15 ` Hugh Dickins
2019-12-05 0:50 ` Yang Shi [this message]
2020-01-14 19:28 ` Yang Shi
2020-02-04 23:27 ` Yang Shi
2020-02-14 0:38 ` Yang Shi
2020-02-14 15:40 ` Kirill A. Shutemov
2020-02-14 17:17 ` Yang Shi
2020-02-25 3:46 ` Hugh Dickins
2020-02-25 18:02 ` David Hildenbrand
2020-02-25 20:31 ` Hugh Dickins
2020-02-26 17:43 ` Yang Shi
2020-02-27 1:16 ` Matthew Wilcox
2020-02-27 1:47 ` Hugh Dickins
2020-02-27 1:37 ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21 9:07 ` Michael S. Tsirkin
2020-02-21 9:36 ` David Hildenbrand
2020-02-22 0:39 ` Alexander Duyck
2020-02-24 10:22 ` David Hildenbrand
2020-02-25 0:13 ` Alexander Duyck
2020-02-25 8:09 ` David Hildenbrand
2020-02-25 16:42 ` Alexander Duyck
2020-02-21 18:24 ` Yang Shi
2020-02-22 0:24 ` Alexander Duyck
2020-02-26 17:31 ` Yang Shi
2020-02-26 17:45 ` David Hildenbrand
2020-02-26 18:00 ` Yang Shi
2020-02-27 0:56 ` Hugh Dickins
2020-02-27 1:14 ` Yang Shi