linux-mm.kvack.org archive mirror
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: kirill.shutemov@linux.intel.com, aarcange@redhat.com,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Tue, 4 Feb 2020 15:27:25 -0800	[thread overview]
Message-ID: <33768a7e-837d-3bcd-fb98-19727921d6fd@linux.alibaba.com> (raw)
In-Reply-To: <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com>



On 1/14/20 11:28 AM, Yang Shi wrote:
>
>
> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>
>>> Currently when truncating a shmem file, if the range covers only
>>> part of a THP (start or end falls in the middle of the THP), the
>>> subpages are just cleared rather than freed; the THP is freed only
>>> when the range covers it entirely.  Even after all subpages have
>>> been truncated (randomly or sequentially), the THP may still be
>>> kept in the page cache.  This might be fine for some usecases which
>>> prefer preserving THP.
>>>
>>> But when doing balloon inflation in QEMU, QEMU actually does hole
>>> punch or MADV_DONTNEED in base page size granularity if hugetlbfs
>>> is not used.  So when using shmem THP as the memory backend, QEMU
>>> inflation doesn't work as expected since it doesn't free memory.
>>> But the inflation usecase really needs the memory to be freed.
>>> Anonymous THP is not freed right away either, but it is freed
>>> eventually once all subpages are unmapped; a shmem THP would stay
>>> in the page cache indefinitely.
>>>
>>> Split the THP right away when doing a partial hole punch, and if
>>> the split fails just clear the page so that reads from the
>>> hole-punched area return zero.
>>>
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>>> ---
>>> v2: * Addressed the comments from Kirill.
>>>      * Dropped the fallocate mode flag; THP split is the default
>>>        behavior.
>>>      * Blended Hugh's implementation with my v1 patch. TBH I'm not
>>>        very keen on Hugh's find_get_entries() hack (basically
>>>        neutral), but without that hack
>> Thanks for giving it a try.  I'm not neutral about my find_get_entries()
>> hack: it surely had to go (without it, I'd have just pushed my own 
>> patch).
>> I've not noticed anything wrong with your patch, and it's in the right
>> direction, but I'm still not thrilled with it.  I also remember that I
>> got the looping wrong in my first internal attempt (fixed in what I 
>> sent),
>> and need to be very sure of the try-again-versus-move-on-to-next
>> conditions before agreeing to anything.  No rush, I'll come back to
>> this in the days or months ahead: I'll try to find a less goto-heavy
>> blend of yours and mine.
>
> Hi Hugh,
>
> Any update on this one?
>
> Thanks,
> Yang

Hi Hugh,

Ping. Any comment on this? I really hope it can make v5.7.

Thanks,
Yang

>
>>
>> Hugh
>>
>>>        we have to rely on pagevec_release() to release the extra
>>>        pins and play with goto.  This version does it that way.  The
>>>        patch is bigger than Hugh's due to the extra comments that
>>>        make the flow clear.
>>>
>>>   mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>>   1 file changed, 83 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 220be9f..1ae0c7f 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>       long nr_swaps_freed = 0;
>>>       pgoff_t index;
>>>       int i;
>>> +    bool split = false;
>>> +    struct page *page = NULL;
>>>         if (lend == -1)
>>>           end = -1;    /* unsigned, so actually very big */
>>>         pagevec_init(&pvec);
>>>       index = start;
>>> +retry:
>>>       while (index < end) {
>>>           pvec.nr = find_get_entries(mapping, index,
>>>               min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>           if (!pvec.nr)
>>>               break;
>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>> -            struct page *page = pvec.pages[i];
>>> +            split = false;
>>> +            page = pvec.pages[i];
>>>                 index = indices[i];
>>>               if (index >= end)
>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>               if (!trylock_page(page))
>>>                   continue;
>>>   -            if (PageTransTail(page)) {
>>> -                /* Middle of THP: zero out the page */
>>> -                clear_highpage(page);
>>> -                unlock_page(page);
>>> -                continue;
>>> -            } else if (PageTransHuge(page)) {
>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>> +            if (PageTransCompound(page) && !unfalloc) {
>>> +                if (PageHead(page) &&
>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>                       /*
>>> -                     * Range ends in the middle of THP:
>>> -                     * zero out the page
>>> +                     * Fall through when punching whole
>>> +                     * THP.
>>>                        */
>>> -                    clear_highpage(page);
>>> -                    unlock_page(page);
>>> -                    continue;
>>> +                    index += HPAGE_PMD_NR - 1;
>>> +                    i += HPAGE_PMD_NR - 1;
>>> +                } else {
>>> +                    /*
>>> +                     * Split THP for any partial hole
>>> +                     * punch.
>>> +                     */
>>> +                    get_page(page);
>>> +                    split = true;
>>> +                    goto split;
>>>                   }
>>> -                index += HPAGE_PMD_NR - 1;
>>> -                i += HPAGE_PMD_NR - 1;
>>>               }
>>>                 if (!unfalloc || !PageUptodate(page)) {
>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>               }
>>>               unlock_page(page);
>>>           }
>>> +split:
>>>           pagevec_remove_exceptionals(&pvec);
>>>           pagevec_release(&pvec);
>>>           cond_resched();
>>> +
>>> +        if (split) {
>>> +            /*
>>> +             * The pagevec_release() released all extra pins
>>> +             * from pagevec lookup.  And we hold an extra pin
>>> +             * and still have the page locked under us.
>>> +             */
>>> +            if (!split_huge_page(page)) {
>>> +                unlock_page(page);
>>> +                put_page(page);
>>> +                /* Re-lookup page cache from current index */
>>> +                goto retry;
>>> +            }
>>> +
>>> +            /* Failed to split THP, move to next index */
>>> +            unlock_page(page);
>>> +            put_page(page);
>>> +        }
>>> +
>>>           index++;
>>>       }
>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>           return;
>>>         index = start;
>>> +again:
>>>       while (index < end) {
>>>           cond_resched();
>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>               continue;
>>>           }
>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>> -            struct page *page = pvec.pages[i];
>>> +            split = false;
>>> +            page = pvec.pages[i];
>>>                 index = indices[i];
>>>               if (index >= end)
>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>                 lock_page(page);
>>>   -            if (PageTransTail(page)) {
>>> -                /* Middle of THP: zero out the page */
>>> -                clear_highpage(page);
>>> -                unlock_page(page);
>>> -                /*
>>> -                 * Partial thp truncate due 'start' in middle
>>> -                 * of THP: don't need to look on these pages
>>> -                 * again on !pvec.nr restart.
>>> -                 */
>>> -                if (index != round_down(end, HPAGE_PMD_NR))
>>> -                    start++;
>>> -                continue;
>>> -            } else if (PageTransHuge(page)) {
>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>> +            if (PageTransCompound(page) && !unfalloc) {
>>> +                if (PageHead(page) &&
>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>                       /*
>>> -                     * Range ends in the middle of THP:
>>> -                     * zero out the page
>>> +                     * Fall through when punching whole
>>> +                     * THP.
>>>                        */
>>> -                    clear_highpage(page);
>>> -                    unlock_page(page);
>>> -                    continue;
>>> +                    index += HPAGE_PMD_NR - 1;
>>> +                    i += HPAGE_PMD_NR - 1;
>>> +                } else {
>>> +                    /*
>>> +                     * Split THP for any partial hole
>>> +                     * punch.
>>> +                     */
>>> +                    get_page(page);
>>> +                    split = true;
>>> +                    goto rescan_split;
>>>                   }
>>> -                index += HPAGE_PMD_NR - 1;
>>> -                i += HPAGE_PMD_NR - 1;
>>>               }
>>>                 if (!unfalloc || !PageUptodate(page)) {
>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>               }
>>>               unlock_page(page);
>>>           }
>>> +rescan_split:
>>>           pagevec_remove_exceptionals(&pvec);
>>>           pagevec_release(&pvec);
>>> +
>>> +        if (split) {
>>> +            /*
>>> +             * The pagevec_release() released all extra pins
>>> +             * from pagevec lookup.  And we hold an extra pin
>>> +             * and still have the page locked under us.
>>> +             */
>>> +            if (!split_huge_page(page)) {
>>> +                unlock_page(page);
>>> +                put_page(page);
>>> +                /* Re-lookup page cache from current index */
>>> +                goto again;
>>> +            }
>>> +
>>> +            /*
>>> +             * Split failed: clear the page, then move to the
>>> +             * next index.
>>> +             */
>>> +            clear_highpage(page);
>>> +
>>> +            unlock_page(page);
>>> +            put_page(page);
>>> +        }
>>> +
>>>           index++;
>>>       }
>>>   --
>>> 1.8.3.1
>>>
>>>
>



Thread overview: 30+ messages
2019-12-04  0:42 [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Yang Shi
2019-12-05  0:15 ` Hugh Dickins
2019-12-05  0:50   ` Yang Shi
2020-01-14 19:28   ` Yang Shi
2020-02-04 23:27     ` Yang Shi [this message]
2020-02-14  0:38       ` Yang Shi
2020-02-14 15:40         ` Kirill A. Shutemov
2020-02-14 17:17           ` Yang Shi
2020-02-25  3:46     ` Hugh Dickins
2020-02-25 18:02       ` David Hildenbrand
2020-02-25 20:31         ` Hugh Dickins
2020-02-26 17:43       ` Yang Shi
2020-02-27  1:16         ` Matthew Wilcox
2020-02-27  1:47           ` Hugh Dickins
2020-02-27  1:37         ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21  9:07   ` Michael S. Tsirkin
2020-02-21  9:36     ` David Hildenbrand
2020-02-22  0:39       ` Alexander Duyck
2020-02-24 10:22         ` David Hildenbrand
2020-02-25  0:13           ` Alexander Duyck
2020-02-25  8:09             ` David Hildenbrand
2020-02-25 16:42               ` Alexander Duyck
2020-02-21 18:24   ` Yang Shi
2020-02-22  0:24     ` Alexander Duyck
2020-02-26 17:31       ` Yang Shi
2020-02-26 17:45         ` David Hildenbrand
2020-02-26 18:00           ` Yang Shi
2020-02-27  0:56         ` Hugh Dickins
2020-02-27  1:14           ` Yang Shi
