From: Joao Martins <joao.m.martins@oracle.com>
To: Mike Kravetz <mike.kravetz@oracle.com>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 2/2] mm/hugetlb: refactor subpage recording
Date: Tue, 26 Jan 2021 19:21:37 +0000	[thread overview]
Message-ID: <1ae0313d-de9b-4553-1f68-04c4f5a3f7eb@oracle.com> (raw)
In-Reply-To: <3d34111f-8365-ab95-af11-aaf4825204be@oracle.com>

On 1/26/21 6:08 PM, Mike Kravetz wrote:
> On 1/25/21 12:57 PM, Joao Martins wrote:
>> For a given hugepage backing a VA, there's a rather inefficient
>> loop which is solely responsible for storing subpages in the passed
>> pages/vmas array. For each subpage we check whether it's within
>> range or size of @pages and keep incrementing @pfn_offset and a couple
>> other variables per subpage iteration.
>>
>> Simplify this logic and minimize ops per iteration to just
>> store the output page/vma. Instead of incrementing number of @refs
>> iteratively, we do it through a precalculation of @refs and having
>> only a tight loop for storing pinned subpages/vmas.
>>
>> Pinning consequently improves considerably, bringing us close to
>> {pin,get}_user_pages_fast:
>>
>>  - 16G with 1G huge page size
>>  gup_test -f /mnt/huge/file -m 16384 -r 10 -L -S -n 512 -w
>>
>>  PIN_LONGTERM_BENCHMARK: ~11k us -> ~4400 us
>>  PIN_FAST_BENCHMARK: ~3700 us
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  mm/hugetlb.c | 49 ++++++++++++++++++++++++++++---------------------
>>  1 file changed, 28 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 016addc8e413..1f7a95bc7c87 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4789,6 +4789,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>>  	goto out;
>>  }
>>  
>> +static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
>> +				 int refs, struct page **pages,
>> +				 struct vm_area_struct **vmas)
>> +{
>> +	int nr;
>> +
>> +	for (nr = 0; nr < refs; nr++) {
>> +		if (likely(pages))
>> +			pages[nr] = page++;
>> +		if (vmas)
>> +			vmas[nr] = vma;
>> +	}
>> +}
>> +
>>  long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>>  			 struct page **pages, struct vm_area_struct **vmas,
>>  			 unsigned long *position, unsigned long *nr_pages,
>> @@ -4918,28 +4932,16 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>>  			continue;
>>  		}
>>  
>> -		refs = 0;
>> +		refs = min3(pages_per_huge_page(h) - pfn_offset,
>> +			    (vma->vm_end - vaddr) >> PAGE_SHIFT, remainder);
>>  
>> -same_page:
>> -		if (pages)
>> -			pages[i] = mem_map_offset(page, pfn_offset);
>> +		if (pages || vmas)
>> +			record_subpages_vmas(mem_map_offset(page, pfn_offset),
> 
> The assumption made here is that mem_map is contiguous for the range of
> pages in the hugetlb page.  I do not believe you can make this assumption
> for (gigantic) hugetlb pages which are > MAX_ORDER_NR_PAGES.  For example,
> 

That would mean get_user_pages_fast() and pin_user_pages_fast() are broken for anything
handling PUDs or above? See record_subpages() in gup_huge_pud() or even gup_huge_pgd().
It uses the same page++ arithmetic.

The adjustment below is probably what you're suggesting.

Also, nth_page() is slightly more expensive, so the numbers above change from ~4.4k
usecs to ~7.8k usecs.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f7a95bc7c87..cf66f8c2f92a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4789,15 +4789,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
        goto out;
 }

-static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
+static void record_subpages_vmas(struct page *page, unsigned long pfn_offset,
+                                struct vm_area_struct *vma,
                                 int refs, struct page **pages,
                                 struct vm_area_struct **vmas)
 {
-       int nr;
+       unsigned long nr;

        for (nr = 0; nr < refs; nr++) {
                if (likely(pages))
-                       pages[nr] = page++;
+                       pages[nr] = mem_map_offset(page, pfn_offset + nr);
                if (vmas)
                        vmas[nr] = vma;
        }
@@ -4936,8 +4937,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
                            (vma->vm_end - vaddr) >> PAGE_SHIFT, remainder);

                if (pages || vmas)
-                       record_subpages_vmas(mem_map_offset(page, pfn_offset),
-                                            vma, refs,
+                       record_subpages_vmas(page, pfn_offset, vma, refs,
                                             likely(pages) ? pages + i : NULL,
                                             vmas ? vmas + i : NULL);
