linux-kernel.vger.kernel.org archive mirror
From: Yang Shi <yang.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>, kirill.shutemov@linux.intel.com
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Thu, 13 Feb 2020 16:38:01 -0800	[thread overview]
Message-ID: <cd21f6a6-32a5-7d31-3bcd-4fc3f6cc0a84@linux.alibaba.com> (raw)
In-Reply-To: <33768a7e-837d-3bcd-fb98-19727921d6fd@linux.alibaba.com>

Hi Kirill,


Would you please help review this patch? I don't know why Hugh didn't
respond even though I pinged him twice.


Thanks,

Yang



On 2/4/20 3:27 PM, Yang Shi wrote:
>
>
> On 1/14/20 11:28 AM, Yang Shi wrote:
>>
>>
>> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>>
>>>> Currently, when truncating a shmem file, if the range covers only
>>>> part of a THP (start or end falls in the middle of the THP), the
>>>> pages are just cleared rather than freed, unless the range covers
>>>> the whole THP.  Even once all the subpages have been truncated
>>>> (randomly or sequentially), the THP may still be kept in the page
>>>> cache.  This might be fine for some use cases which prefer
>>>> preserving THPs.
>>>>
>>>> But when doing balloon inflation, QEMU actually does hole punch or
>>>> MADV_DONTNEED at base page size granularity if hugetlbfs is not
>>>> used.  So, when using shmem THP as the memory backend, QEMU
>>>> inflation doesn't work as expected since it doesn't free memory.
>>>> But the inflation use case really needs the memory to get freed.
>>>> Anonymous THPs are not freed right away either, but they are freed
>>>> eventually once all subpages are unmapped; a shmem THP would stay
>>>> in the page cache.
>>>>
>>>> Split the THP right away when doing a partial hole punch, and if
>>>> the split fails just clear the pages, so that a read of the
>>>> hole-punched area returns zeroes.
>>>>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>>>> ---
>>>> v2: * Adopted the comment from Kirill.
>>>>      * Dropped the fallocate mode flag; THP split is the default
>>>>        behavior.
>>>>      * Blended Hugh's implementation with my v1 patch. TBH I'm not
>>>>        very keen on Hugh's find_get_entries() hack (basically
>>>>        neutral), but without that hack
>>> Thanks for giving it a try.  I'm not neutral about my find_get_entries()
>>> hack: it surely had to go (without it, I'd have just pushed my own patch).
>>> I've not noticed anything wrong with your patch, and it's in the right
>>> direction, but I'm still not thrilled with it.  I also remember that I
>>> got the looping wrong in my first internal attempt (fixed in what I sent),
>>> and need to be very sure of the try-again-versus-move-on-to-next
>>> conditions before agreeing to anything.  No rush, I'll come back to this
>>> in days or months ahead: I'll try to find a less gotoey blend of yours
>>> and mine.
>>
>> Hi Hugh,
>>
>> Any update on this one?
>>
>> Thanks,
>> Yang
>
> Hi Hugh,
>
> Ping. Any comment on this? I really hope it can make v5.7.
>
> Thanks,
> Yang
>
>>
>>>
>>> Hugh
>>>
>>>>        we have to rely on pagevec_release() to release extra pins
>>>>        and play with goto. This version does it that way. The patch
>>>>        is bigger than Hugh's due to extra comments to make the flow
>>>>        clear.
>>>>
>>>>   mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>>>   1 file changed, 83 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 220be9f..1ae0c7f 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>       long nr_swaps_freed = 0;
>>>>       pgoff_t index;
>>>>       int i;
>>>> +    bool split = false;
>>>> +    struct page *page = NULL;
>>>>         if (lend == -1)
>>>>           end = -1;    /* unsigned, so actually very big */
>>>>         pagevec_init(&pvec);
>>>>       index = start;
>>>> +retry:
>>>>       while (index < end) {
>>>>           pvec.nr = find_get_entries(mapping, index,
>>>>               min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>           if (!pvec.nr)
>>>>               break;
>>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> -            struct page *page = pvec.pages[i];
>>>> +            split = false;
>>>> +            page = pvec.pages[i];
>>>>                 index = indices[i];
>>>>               if (index >= end)
>>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               if (!trylock_page(page))
>>>>                   continue;
>>>>   -            if (PageTransTail(page)) {
>>>> -                /* Middle of THP: zero out the page */
>>>> -                clear_highpage(page);
>>>> -                unlock_page(page);
>>>> -                continue;
>>>> -            } else if (PageTransHuge(page)) {
>>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> +            if (PageTransCompound(page) && !unfalloc) {
>>>> +                if (PageHead(page) &&
>>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>>                       /*
>>>> -                     * Range ends in the middle of THP:
>>>> -                     * zero out the page
>>>> +                     * Fall through when punching whole
>>>> +                     * THP.
>>>>                        */
>>>> -                    clear_highpage(page);
>>>> -                    unlock_page(page);
>>>> -                    continue;
>>>> +                    index += HPAGE_PMD_NR - 1;
>>>> +                    i += HPAGE_PMD_NR - 1;
>>>> +                } else {
>>>> +                    /*
>>>> +                     * Split THP for any partial hole
>>>> +                     * punch.
>>>> +                     */
>>>> +                    get_page(page);
>>>> +                    split = true;
>>>> +                    goto split;
>>>>                   }
>>>> -                index += HPAGE_PMD_NR - 1;
>>>> -                i += HPAGE_PMD_NR - 1;
>>>>               }
>>>>                 if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               }
>>>>               unlock_page(page);
>>>>           }
>>>> +split:
>>>>           pagevec_remove_exceptionals(&pvec);
>>>>           pagevec_release(&pvec);
>>>>           cond_resched();
>>>> +
>>>> +        if (split) {
>>>> +            /*
>>>> +             * The pagevec_release() released all extra pins
>>>> +             * from pagevec lookup.  And we hold an extra pin
>>>> +             * and still have the page locked under us.
>>>> +             */
>>>> +            if (!split_huge_page(page)) {
>>>> +                unlock_page(page);
>>>> +                put_page(page);
>>>> +                /* Re-lookup page cache from current index */
>>>> +                goto retry;
>>>> +            }
>>>> +
>>>> +            /* Failed to split THP, move to the next index */
>>>> +            unlock_page(page);
>>>> +            put_page(page);
>>>> +        }
>>>> +
>>>>           index++;
>>>>       }
>>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>           return;
>>>>         index = start;
>>>> +again:
>>>>       while (index < end) {
>>>>           cond_resched();
>>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               continue;
>>>>           }
>>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> -            struct page *page = pvec.pages[i];
>>>> +            split = false;
>>>> +            page = pvec.pages[i];
>>>>                 index = indices[i];
>>>>               if (index >= end)
>>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>                 lock_page(page);
>>>>   -            if (PageTransTail(page)) {
>>>> -                /* Middle of THP: zero out the page */
>>>> -                clear_highpage(page);
>>>> -                unlock_page(page);
>>>> -                /*
>>>> -                 * Partial thp truncate due 'start' in middle
>>>> -                 * of THP: don't need to look on these pages
>>>> -                 * again on !pvec.nr restart.
>>>> -                 */
>>>> -                if (index != round_down(end, HPAGE_PMD_NR))
>>>> -                    start++;
>>>> -                continue;
>>>> -            } else if (PageTransHuge(page)) {
>>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> +            if (PageTransCompound(page) && !unfalloc) {
>>>> +                if (PageHead(page) &&
>>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>>                       /*
>>>> -                     * Range ends in the middle of THP:
>>>> -                     * zero out the page
>>>> +                     * Fall through when punching whole
>>>> +                     * THP.
>>>>                        */
>>>> -                    clear_highpage(page);
>>>> -                    unlock_page(page);
>>>> -                    continue;
>>>> +                    index += HPAGE_PMD_NR - 1;
>>>> +                    i += HPAGE_PMD_NR - 1;
>>>> +                } else {
>>>> +                    /*
>>>> +                     * Split THP for any partial hole
>>>> +                     * punch.
>>>> +                     */
>>>> +                    get_page(page);
>>>> +                    split = true;
>>>> +                    goto rescan_split;
>>>>                   }
>>>> -                index += HPAGE_PMD_NR - 1;
>>>> -                i += HPAGE_PMD_NR - 1;
>>>>               }
>>>>                 if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               }
>>>>               unlock_page(page);
>>>>           }
>>>> +rescan_split:
>>>>           pagevec_remove_exceptionals(&pvec);
>>>>           pagevec_release(&pvec);
>>>> +
>>>> +        if (split) {
>>>> +            /*
>>>> +             * The pagevec_release() released all extra pins
>>>> +             * from pagevec lookup.  And we hold an extra pin
>>>> +             * and still have the page locked under us.
>>>> +             */
>>>> +            if (!split_huge_page(page)) {
>>>> +                unlock_page(page);
>>>> +                put_page(page);
>>>> +                /* Re-lookup page cache from current index */
>>>> +                goto again;
>>>> +            }
>>>> +
>>>> +            /*
>>>> +             * Split failed: clear the page, then move to the
>>>> +             * next index.
>>>> +             */
>>>> +            clear_highpage(page);
>>>> +
>>>> +            unlock_page(page);
>>>> +            put_page(page);
>>>> +        }
>>>> +
>>>>           index++;
>>>>       }
>>>>   --
>>>> 1.8.3.1
>>>>
>>>>
>>
>
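
For context on the use case described in the commit message, here is a
minimal userspace sketch (not part of the patch; the file path, sizes,
and tmpfs mount assumptions are illustrative) of base-page-granularity
hole punching on a tmpfs-backed file, the pattern QEMU's balloon uses
when hugetlbfs is not in use:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t len = 512 * psize;	/* one 2MB THP on x86_64 (4K pages) */
	int fd = open("/dev/shm/thp-punch-test", O_RDWR | O_CREAT, 0600);
	char *p;
	size_t off;

	if (fd < 0 || ftruncate(fd, len))
		return 1;

	/*
	 * Populate the file.  Whether it is actually backed by a THP
	 * depends on the tmpfs "huge=" mount option (or MADV_HUGEPAGE).
	 */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	memset(p, 0xaa, len);

	/*
	 * Punch one base page at a time, as QEMU does without hugetlbfs.
	 * Before this patch, each punch into a shmem THP only cleared the
	 * subpage; the compound page stayed in the page cache, so no
	 * memory was freed until the punches covered the whole THP.
	 */
	for (off = 0; off < len / 2; off += psize)
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, psize))
			perror("fallocate");

	munmap(p, len);
	close(fd);
	unlink("/dev/shm/thp-punch-test");
	return 0;
}

With the patch applied, the first partial punch splits the THP (falling
back to clearing the subpages if the split fails), so the punched base
pages can actually be freed instead of lingering in the page cache until
the whole THP range has been punched.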


Thread overview: 30+ messages
2019-12-04  0:42 [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Yang Shi
2019-12-05  0:15 ` Hugh Dickins
2019-12-05  0:50   ` Yang Shi
2020-01-14 19:28   ` Yang Shi
2020-02-04 23:27     ` Yang Shi
2020-02-14  0:38       ` Yang Shi [this message]
2020-02-14 15:40         ` Kirill A. Shutemov
2020-02-14 17:17           ` Yang Shi
2020-02-25  3:46     ` Hugh Dickins
2020-02-25 18:02       ` David Hildenbrand
2020-02-25 20:31         ` Hugh Dickins
2020-02-26 17:43       ` Yang Shi
2020-02-27  1:16         ` Matthew Wilcox
2020-02-27  1:47           ` Hugh Dickins
2020-02-27  1:37         ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21  9:07   ` Michael S. Tsirkin
2020-02-21  9:36     ` David Hildenbrand
2020-02-22  0:39       ` Alexander Duyck
2020-02-24 10:22         ` David Hildenbrand
2020-02-25  0:13           ` Alexander Duyck
2020-02-25  8:09             ` David Hildenbrand
2020-02-25 16:42               ` Alexander Duyck
2020-02-21 18:24   ` Yang Shi
2020-02-22  0:24     ` Alexander Duyck
2020-02-26 17:31       ` Yang Shi
2020-02-26 17:45         ` David Hildenbrand
2020-02-26 18:00           ` Yang Shi
2020-02-27  0:56         ` Hugh Dickins
2020-02-27  1:14           ` Yang Shi
