linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Muchun Song <songmuchun@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	viro@zeniv.linux.org.uk,
	Andrew Morton <akpm@linux-foundation.org>,
	paulmck@kernel.org, mchehab+huawei@kernel.org,
	pawan.kumar.gupta@linux.intel.com,
	Randy Dunlap <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	Mina Almasry <almasrymina@google.com>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>,
	Xiongchun duan <duanxiongchun@bytedance.com>,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v3 00/21] Free some vmemmap pages of hugetlb page
Date: Wed, 11 Nov 2020 11:21:09 +0800	[thread overview]
Message-ID: <CAMZfGtVvBk6eHRRBcyKxQGx5HG7K0xD8LL7hC2f=bK1cizC2VA@mail.gmail.com> (raw)
In-Reply-To: <78b4cb8b-6511-d50e-7018-ea52c50e4b07@oracle.com>

On Wed, Nov 11, 2020 at 3:23 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>
> Thanks for continuing to work this Muchun!
>
> On 11/8/20 6:10 AM, Muchun Song wrote:
> ...
> > For tail pages, the value of compound_head is the same. So we can reuse
> > first page of tail page structs. We map the virtual addresses of the
> > remaining 6 pages of tail page structs to the first tail page struct,
> > and then free these 6 pages. Therefore, we need to reserve at least 2
> > pages as vmemmap areas.
> >
> > When a hugetlbpage is freed to the buddy system, we should allocate six
> > pages for vmemmap pages and restore the previous mapping relationship.
> >
> > If we uses the 1G hugetlbpage, we can save 4095 pages. This is a very
> > substantial gain.
>
> Is that 4095 number accurate?  Are we not using two pages of struct pages
> as in the 2MB case?

Oh, yeah, here should be 4094 and subtract page tables. For a 1GB
HugeTLB page, it should be 4086 pages. Thanks for pointing out
this problem.

>
> Also, because we are splitting the huge page mappings in the vmemmap
> additional PTE pages will need to be allocated.  Therefore, some additional
> page table pages may need to be allocated so that we can free the pages
> of struct pages.  The net savings may be less than what is stated above.
>
> Perhaps this should mention that allocation of additional page table pages
> may be required?

Yeah, you are right. In the later patch, I will rework the analysis
here. Make it
more clear and accurate.

>
> ...
> > Because there are vmemmap page tables reconstruction on the freeing/allocating
> > path, it increases some overhead. Here are some overhead analysis.
> >
> > 1) Allocating 10240 2MB hugetlb pages.
> >
> >    a) With this patch series applied:
> >    # time echo 10240 > /proc/sys/vm/nr_hugepages
> >
> >    real     0m0.166s
> >    user     0m0.000s
> >    sys      0m0.166s
> >
> >    # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
> >    Attaching 2 probes...
> >
> >    @latency:
> >    [8K, 16K)           8360 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> >    [16K, 32K)          1868 |@@@@@@@@@@@                                         |
> >    [32K, 64K)            10 |                                                    |
> >    [64K, 128K)            2 |                                                    |
> >
> >    b) Without this patch series:
> >    # time echo 10240 > /proc/sys/vm/nr_hugepages
> >
> >    real     0m0.066s
> >    user     0m0.000s
> >    sys      0m0.066s
> >
> >    # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
> >    Attaching 2 probes...
> >
> >    @latency:
> >    [4K, 8K)           10176 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> >    [8K, 16K)             62 |                                                    |
> >    [16K, 32K)             2 |                                                    |
> >
> >    Summarize: this feature is about ~2x slower than before.
> >
> > 2) Freeing 10240 @MB hugetlb pages.
> >
> >    a) With this patch series applied:
> >    # time echo 0 > /proc/sys/vm/nr_hugepages
> >
> >    real     0m0.004s
> >    user     0m0.000s
> >    sys      0m0.002s
> >
> >    # bpftrace -e 'kprobe:__free_hugepage { @start[tid] = nsecs; } kretprobe:__free_hugepage /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
> >    Attaching 2 probes...
> >
> >    @latency:
> >    [16K, 32K)         10240 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> >
> >    b) Without this patch series:
> >    # time echo 0 > /proc/sys/vm/nr_hugepages
> >
> >    real     0m0.077s
> >    user     0m0.001s
> >    sys      0m0.075s
> >
> >    # bpftrace -e 'kprobe:__free_hugepage { @start[tid] = nsecs; } kretprobe:__free_hugepage /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
> >    Attaching 2 probes...
> >
> >    @latency:
> >    [4K, 8K)            9950 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> >    [8K, 16K)            287 |@                                                   |
> >    [16K, 32K)             3 |                                                    |
> >
> >    Summarize: The overhead of __free_hugepage is about ~2-4x slower than before.
> >               But according to the allocation test above, I think that here is
> >             also ~2x slower than before.
> >
> >               But why the 'real' time of patched is smaller than before? Because
> >             In this patch series, the freeing hugetlb is asynchronous(through
> >             kwoker).
> >
> > Although the overhead has increased. But the overhead is not on the
> > allocating/freeing of each hugetlb page, it is only once when we reserve
> > some hugetlb pages through /proc/sys/vm/nr_hugepages. Once the reservation
> > is successful, the subsequent allocating, freeing and using are the same
> > as before (not patched). So I think that the overhead is acceptable.
>
> Thank you for benchmarking.  There are still some instances where huge pages
> are allocated 'on the fly' instead of being pulled from the pool.  Michal
> pointed out the case of page migration.  It is also possible for someone to
> use hugetlbfs without pre-allocating huge pages to the pool.  I remember the
> use case pointed out in commit 099730d67417.  It says, "I have a hugetlbfs
> user which is never explicitly allocating huge pages with 'nr_hugepages'.
> They only set 'nr_overcommit_hugepages' and then let the pages be allocated
> from the buddy allocator at fault time."  In this case, I suspect they were
> using 'page fault' allocation for initialization much like someone using
> /proc/sys/vm/nr_hugepages.  So, the overhead may not be as noticeable.

Thanks for pointing out this using case.

>
> --
> Mike Kravetz



-- 
Yours,
Muchun

      reply	other threads:[~2020-11-11  3:21 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-08 14:10 [PATCH v3 00/21] Free some vmemmap pages of hugetlb page Muchun Song
2020-11-08 14:10 ` [PATCH v3 01/21] mm/memory_hotplug: Move bootmem info registration API to bootmem_info.c Muchun Song
2020-11-08 14:10 ` [PATCH v3 02/21] mm/memory_hotplug: Move {get,put}_page_bootmem() " Muchun Song
2020-11-08 14:10 ` [PATCH v3 03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2020-11-09 13:52   ` Oscar Salvador
2020-11-09 14:20     ` [External] " Muchun Song
2020-11-10 19:31     ` Mike Kravetz
2020-11-10 19:50       ` Matthew Wilcox
2020-11-10 20:30         ` Mike Kravetz
2020-11-17 15:35         ` [External] " Muchun Song
2020-11-11  3:28       ` Muchun Song
2020-11-08 14:10 ` [PATCH v3 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2020-11-09 16:48   ` Oscar Salvador
2020-11-10  2:42     ` [External] " Muchun Song
2020-11-10 19:38       ` Mike Kravetz
2020-11-11  3:22         ` Muchun Song
2020-11-08 14:10 ` [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers Muchun Song
2020-11-09 17:21   ` Oscar Salvador
2020-11-10  3:49     ` [External] " Muchun Song
2020-11-10  5:42       ` Oscar Salvador
2020-11-10  6:08         ` Muchun Song
2020-11-10  6:33           ` Oscar Salvador
2020-11-10  7:10             ` Muchun Song
2020-11-11  0:47   ` Mike Kravetz
2020-11-11  3:41     ` [External] " Muchun Song
2020-11-13  0:35       ` Mike Kravetz
2020-11-13  1:02         ` Mike Kravetz
2020-11-13  4:18         ` Muchun Song
2020-11-08 14:10 ` [PATCH v3 06/21] mm/bootmem_info: Introduce {free,prepare}_vmemmap_page() Muchun Song
2020-11-08 14:10 ` [PATCH v3 07/21] mm/bootmem_info: Combine bootmem info and type into page->freelist Muchun Song
2020-11-08 14:11 ` [PATCH v3 08/21] mm/vmemmap: Initialize page table lock for vmemmap Muchun Song
2020-11-09 18:11   ` Oscar Salvador
2020-11-10  5:17     ` [External] " Muchun Song
2020-11-08 14:11 ` [PATCH v3 09/21] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page Muchun Song
2020-11-09 18:51   ` Oscar Salvador
2020-11-10  6:40     ` [External] " Muchun Song
2020-11-10  9:48       ` Oscar Salvador
2020-11-10 10:47         ` Muchun Song
2020-11-10 13:52           ` Oscar Salvador
2020-11-10 14:00             ` Muchun Song
2020-11-08 14:11 ` [PATCH v3 10/21] mm/hugetlb: Defer freeing of hugetlb pages Muchun Song
2020-11-08 14:11 ` [PATCH v3 11/21] mm/hugetlb: Allocate the vmemmap pages associated with each hugetlb page Muchun Song
2020-11-08 14:11 ` [PATCH v3 12/21] mm/hugetlb: Introduce remap_huge_page_pmd_vmemmap helper Muchun Song
2020-11-08 14:11 ` [PATCH v3 13/21] mm/hugetlb: Use PG_slab to indicate split pmd Muchun Song
2020-11-08 14:11 ` [PATCH v3 14/21] mm/hugetlb: Support freeing vmemmap pages of gigantic page Muchun Song
2020-11-08 14:11 ` [PATCH v3 15/21] mm/hugetlb: Add a BUILD_BUG_ON to check if struct page size is a power of two Muchun Song
2020-11-08 14:11 ` [PATCH v3 16/21] mm/hugetlb: Set the PageHWPoison to the raw error page Muchun Song
2020-11-08 14:11 ` [PATCH v3 17/21] mm/hugetlb: Flush work when dissolving hugetlb page Muchun Song
2020-11-08 14:11 ` [PATCH v3 18/21] mm/hugetlb: Add a kernel parameter hugetlb_free_vmemmap Muchun Song
2020-11-08 14:11 ` [PATCH v3 19/21] mm/hugetlb: Merge pte to huge pmd only for gigantic page Muchun Song
2020-11-08 14:11 ` [PATCH v3 20/21] mm/hugetlb: Gather discrete indexes of tail page Muchun Song
2020-11-08 14:11 ` [PATCH v3 21/21] mm/hugetlb: Add BUILD_BUG_ON to catch invalid usage of tail struct page Muchun Song
2020-11-10 19:23 ` [PATCH v3 00/21] Free some vmemmap pages of hugetlb page Mike Kravetz
2020-11-11  3:21   ` Muchun Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMZfGtVvBk6eHRRBcyKxQGx5HG7K0xD8LL7hC2f=bK1cizC2VA@mail.gmail.com' \
    --to=songmuchun@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).