From: Yang Shi <yang.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: kirill.shutemov@linux.intel.com, aarcange@redhat.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially
Date: Tue, 4 Feb 2020 15:27:25 -0800 [thread overview]
Message-ID: <33768a7e-837d-3bcd-fb98-19727921d6fd@linux.alibaba.com> (raw)
In-Reply-To: <00f0bb7d-3c25-a65f-ea94-3e2de8e9bcdd@linux.alibaba.com>
On 1/14/20 11:28 AM, Yang Shi wrote:
>
>
> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>
>>> Currently when truncating a shmem file, if the range covers only
>>> part of a THP (start or end is in the middle of the THP), the pages
>>> will just get cleared rather than freed, unless the range covers
>>> the whole THP. Even after all the subpages have been truncated
>>> (randomly or sequentially), the THP may still be kept in the page
>>> cache. This might be fine for some use cases which prefer
>>> preserving THP.
>>>
>>> But when doing balloon inflation in QEMU, QEMU actually does hole
>>> punch or MADV_DONTNEED in base page size granularity if hugetlbfs
>>> is not used. So when using shmem THP as the memory backend, QEMU
>>> inflation doesn't work as expected since it doesn't free memory.
>>> But the inflation use case really needs to get the memory freed.
>>> Anonymous THP will not get freed right away either, but it will be
>>> freed eventually once all of its subpages are unmapped; shmem THP
>>> would stay in the page cache.
>>>
>>> So split the THP right away when doing a partial hole punch, and if
>>> the split fails just clear the pages so that reads from the
>>> hole-punched area return zero.
>>>
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>>> ---
>>> v2: * Adopted the comment from Kirill.
>>> * Dropped the fallocate mode flag; THP split is the default
>>> behavior.
>>> * Blended Hugh's implementation with my v1 patch. TBH I'm not
>>> very keen on Hugh's find_get_entries() hack (basically
>>> neutral), but without that hack
>> Thanks for giving it a try. I'm not neutral about my
>> find_get_entries() hack: it surely had to go (without it, I'd have
>> just pushed my own patch). I've not noticed anything wrong with your
>> patch, and it's in the right direction, but I'm still not thrilled
>> with it. I also remember that I got the looping wrong in my first
>> internal attempt (fixed in what I sent), and need to be very sure of
>> the try-again-versus-move-on-to-next conditions before agreeing to
>> anything. No rush, I'll come back to this in the days or months
>> ahead: I'll try to find a less gotoey blend of yours and mine.
>
> Hi Hugh,
>
> Any update on this one?
>
> Thanks,
> Yang
Hi Hugh,
Ping. Any comment on this? I really hope it can make v5.7.
Thanks,
Yang
>
>>
>> Hugh
>>
>>> we have to rely on pagevec_release() to release the extra pins
>>> and play with goto. This version does it that way. The patch is
>>> bigger than Hugh's due to extra comments to make the flow clear.
>>>
>>> mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 83 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 220be9f..1ae0c7f 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> long nr_swaps_freed = 0;
>>> pgoff_t index;
>>> int i;
>>> + bool split = false;
>>> + struct page *page = NULL;
>>> if (lend == -1)
>>> end = -1; /* unsigned, so actually very big */
>>> pagevec_init(&pvec);
>>> index = start;
>>> +retry:
>>> while (index < end) {
>>> pvec.nr = find_get_entries(mapping, index,
>>> min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> if (!pvec.nr)
>>> break;
>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>> - struct page *page = pvec.pages[i];
>>> + split = false;
>>> + page = pvec.pages[i];
>>> index = indices[i];
>>> if (index >= end)
>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> if (!trylock_page(page))
>>> continue;
>>> - if (PageTransTail(page)) {
>>> - /* Middle of THP: zero out the page */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> - } else if (PageTransHuge(page)) {
>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>> + if (PageTransCompound(page) && !unfalloc) {
>>> + if (PageHead(page) &&
>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>> /*
>>> - * Range ends in the middle of THP:
>>> - * zero out the page
>>> + * Fall through when punching whole
>>> + * THP.
>>> */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> + index += HPAGE_PMD_NR - 1;
>>> + i += HPAGE_PMD_NR - 1;
>>> + } else {
>>> + /*
>>> + * Split THP for any partial hole
>>> + * punch.
>>> + */
>>> + get_page(page);
>>> + split = true;
>>> + goto split;
>>> }
>>> - index += HPAGE_PMD_NR - 1;
>>> - i += HPAGE_PMD_NR - 1;
>>> }
>>> if (!unfalloc || !PageUptodate(page)) {
>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> }
>>> unlock_page(page);
>>> }
>>> +split:
>>> pagevec_remove_exceptionals(&pvec);
>>> pagevec_release(&pvec);
>>> cond_resched();
>>> +
>>> + if (split) {
>>> + /*
>>> + * The pagevec_release() released all extra pins
>>> + * from pagevec lookup. And we hold an extra pin
>>> + * and still have the page locked under us.
>>> + */
>>> + if (!split_huge_page(page)) {
>>> + unlock_page(page);
>>> + put_page(page);
>>> + /* Re-lookup page cache from current index */
>>> + goto retry;
>>> + }
>>> +
>>> + /* Failed to split THP; move to next index */
>>> + unlock_page(page);
>>> + put_page(page);
>>> + }
>>> +
>>> index++;
>>> }
>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> return;
>>> index = start;
>>> +again:
>>> while (index < end) {
>>> cond_resched();
>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> continue;
>>> }
>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>> - struct page *page = pvec.pages[i];
>>> + split = false;
>>> + page = pvec.pages[i];
>>> index = indices[i];
>>> if (index >= end)
>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> lock_page(page);
>>> - if (PageTransTail(page)) {
>>> - /* Middle of THP: zero out the page */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - /*
>>> - * Partial thp truncate due 'start' in middle
>>> - * of THP: don't need to look on these pages
>>> - * again on !pvec.nr restart.
>>> - */
>>> - if (index != round_down(end, HPAGE_PMD_NR))
>>> - start++;
>>> - continue;
>>> - } else if (PageTransHuge(page)) {
>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>> + if (PageTransCompound(page) && !unfalloc) {
>>> + if (PageHead(page) &&
>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>> /*
>>> - * Range ends in the middle of THP:
>>> - * zero out the page
>>> + * Fall through when punching whole
>>> + * THP.
>>> */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> + index += HPAGE_PMD_NR - 1;
>>> + i += HPAGE_PMD_NR - 1;
>>> + } else {
>>> + /*
>>> + * Split THP for any partial hole
>>> + * punch.
>>> + */
>>> + get_page(page);
>>> + split = true;
>>> + goto rescan_split;
>>> }
>>> - index += HPAGE_PMD_NR - 1;
>>> - i += HPAGE_PMD_NR - 1;
>>> }
>>> if (!unfalloc || !PageUptodate(page)) {
>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> }
>>> unlock_page(page);
>>> }
>>> +rescan_split:
>>> pagevec_remove_exceptionals(&pvec);
>>> pagevec_release(&pvec);
>>> +
>>> + if (split) {
>>> + /*
>>> + * The pagevec_release() released all extra pins
>>> + * from pagevec lookup. And we hold an extra pin
>>> + * and still have the page locked under us.
>>> + */
>>> + if (!split_huge_page(page)) {
>>> + unlock_page(page);
>>> + put_page(page);
>>> + /* Re-lookup page cache from current index */
>>> + goto again;
>>> + }
>>> +
>>> + /*
>>> + * Split failed: clear the page, then move to
>>> + * the next index.
>>> + */
>>> + clear_highpage(page);
>>> +
>>> + unlock_page(page);
>>> + put_page(page);
>>> + }
>>> +
>>> index++;
>>> }
>>> --
>>> 1.8.3.1
>>>
>>>
>
Thread overview: 30+ messages
2019-12-04 0:42 [v2 PATCH] mm: shmem: allow split THP when truncating THP partially Yang Shi
2019-12-05 0:15 ` Hugh Dickins
2019-12-05 0:50 ` Yang Shi
2020-01-14 19:28 ` Yang Shi
2020-02-04 23:27 ` Yang Shi [this message]
2020-02-14 0:38 ` Yang Shi
2020-02-14 15:40 ` Kirill A. Shutemov
2020-02-14 17:17 ` Yang Shi
2020-02-25 3:46 ` Hugh Dickins
2020-02-25 18:02 ` David Hildenbrand
2020-02-25 20:31 ` Hugh Dickins
2020-02-26 17:43 ` Yang Shi
2020-02-27 1:16 ` Matthew Wilcox
2020-02-27 1:47 ` Hugh Dickins
2020-02-27 1:37 ` Hugh Dickins
2020-02-20 18:16 ` Alexander Duyck
2020-02-21 9:07 ` Michael S. Tsirkin
2020-02-21 9:36 ` David Hildenbrand
2020-02-22 0:39 ` Alexander Duyck
2020-02-24 10:22 ` David Hildenbrand
2020-02-25 0:13 ` Alexander Duyck
2020-02-25 8:09 ` David Hildenbrand
2020-02-25 16:42 ` Alexander Duyck
2020-02-21 18:24 ` Yang Shi
2020-02-22 0:24 ` Alexander Duyck
2020-02-26 17:31 ` Yang Shi
2020-02-26 17:45 ` David Hildenbrand
2020-02-26 18:00 ` Yang Shi
2020-02-27 0:56 ` Hugh Dickins
2020-02-27 1:14 ` Yang Shi