From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: "corbet@lwn.net" <corbet@lwn.net>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	"mchehab+huawei@kernel.org" <mchehab+huawei@kernel.org>,
	"pawan.kumar.gupta@linux.intel.com" 
	<pawan.kumar.gupta@linux.intel.com>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"anshuman.khandual@arm.com" <anshuman.khandual@arm.com>,
	"jroedel@suse.de" <jroedel@suse.de>,
	"almasrymina@google.com" <almasrymina@google.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"duanxiongchun@bytedance.com" <duanxiongchun@bytedance.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: RE: [External] RE: [PATCH v4 00/21] Free some vmemmap pages of hugetlb page
Date: Tue, 17 Nov 2020 11:07:57 +0000
Message-ID: <714ae7d701d446259ab269f14a030fe9@hisilicon.com>
In-Reply-To: <CAMZfGtUVDJ4QHYRCKnPTkgcKGJ38s2aOOktH+8Urz7oiVfimww@mail.gmail.com>



> -----Original Message-----
> From: Muchun Song [mailto:songmuchun@bytedance.com]
> Sent: Tuesday, November 17, 2020 11:50 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: corbet@lwn.net; mike.kravetz@oracle.com; tglx@linutronix.de;
> mingo@redhat.com; bp@alien8.de; x86@kernel.org; hpa@zytor.com;
> dave.hansen@linux.intel.com; luto@kernel.org; peterz@infradead.org;
> viro@zeniv.linux.org.uk; akpm@linux-foundation.org; paulmck@kernel.org;
> mchehab+huawei@kernel.org; pawan.kumar.gupta@linux.intel.com;
> rdunlap@infradead.org; oneukum@suse.com; anshuman.khandual@arm.com;
> jroedel@suse.de; almasrymina@google.com; rientjes@google.com;
> willy@infradead.org; osalvador@suse.de; mhocko@suse.com;
> duanxiongchun@bytedance.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> linux-fsdevel@vger.kernel.org
> Subject: Re: [External] RE: [PATCH v4 00/21] Free some vmemmap pages of
> hugetlb page
> 
> On Tue, Nov 17, 2020 at 6:16 PM Song Bao Hua (Barry Song)
> <song.bao.hua@hisilicon.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On
> > > Behalf Of Muchun Song
> > > Sent: Saturday, November 14, 2020 12:00 AM
> > > To: corbet@lwn.net; mike.kravetz@oracle.com; tglx@linutronix.de;
> > > mingo@redhat.com; bp@alien8.de; x86@kernel.org; hpa@zytor.com;
> > > dave.hansen@linux.intel.com; luto@kernel.org; peterz@infradead.org;
> > > viro@zeniv.linux.org.uk; akpm@linux-foundation.org; paulmck@kernel.org;
> > > mchehab+huawei@kernel.org; pawan.kumar.gupta@linux.intel.com;
> > > rdunlap@infradead.org; oneukum@suse.com;
> anshuman.khandual@arm.com;
> > > jroedel@suse.de; almasrymina@google.com; rientjes@google.com;
> > > willy@infradead.org; osalvador@suse.de; mhocko@suse.com
> > > Cc: duanxiongchun@bytedance.com; linux-doc@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > linux-fsdevel@vger.kernel.org; Muchun Song
> <songmuchun@bytedance.com>
> > > Subject: [PATCH v4 00/21] Free some vmemmap pages of hugetlb page
> > >
> > > Hi all,
> > >
> > > This patch series will free some vmemmap pages(struct page structures)
> > > associated with each hugetlbpage when preallocated to save memory.
> > >
> > > > Nowadays we track the status of physical page frames using struct page
> > > > structures arranged in one or more arrays. And here exists one-to-one
> > > > mapping between the physical page frame and the corresponding struct page
> > > > structure.
> > >
> > > The HugeTLB support is built on top of multiple page size support that
> > > is provided by most modern architectures. For example, x86 CPUs normally
> > > support 4K and 2M (1G if architecturally supported) page sizes. Every
> > > > HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> > > > struct page structure and 1G HugeTLB has 4096 struct page structures. But
> > > > in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
> > > > structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
> > > store metadata associated with each HugeTLB. The rest of the struct page
> > > structures are usually read the compound_head field which are all the same
> > > value. If we can free some struct page memory to buddy system so that we
> > > can save a lot of memory.
> > >
> > > > When the system boot up, every 2M HugeTLB has 512 struct page structures
> > > > which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
> > >
> > >    hugetlbpage                  struct pages(8 pages)          page
> > > frame(8 pages)
> > >   +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > >   |           |                     |     0     | -------------> |
> 0
> > > |
> > >   |           |                     |     1     | -------------> |
> 1
> > > |
> > >   |           |                     |     2     | -------------> |
> 2
> > > |
> > >   |           |                     |     3     | -------------> |
> 3
> > > |
> > >   |           |                     |     4     | -------------> |
> 4
> > > |
> > >   |     2M    |                     |     5     | -------------> |
> > > 5     |
> > >   |           |                     |     6     | -------------> |
> 6
> > > |
> > >   |           |                     |     7     | -------------> |
> 7
> > > |
> > >   |           |                     +-----------+
> > > +-----------+
> > >   |           |
> > >   |           |
> > >   +-----------+
> > >
> > >
> > > > When a hugetlbpage is preallocated, we can change the mapping from above
> > > > to below.
> > >
> > >    hugetlbpage                  struct pages(8 pages)          page
> > > frame(8 pages)
> > >   +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > >   |           |                     |     0     | -------------> |
> 0
> > > |
> > >   |           |                     |     1     | -------------> |
> 1
> > > |
> > >   |           |                     |     2     | ------------->
> > > +-----------+
> > >   |           |                     |     3     | -----------------^ ^
> ^ ^
> > > ^
> > >   |           |                     |     4     | -------------------+
> | |
> > > |
> > >   |     2M    |                     |     5     |
> ---------------------+ |
> > > |
> > >   |           |                     |     6     |
> -----------------------+ |
> > >   |           |                     |     7     |
> -------------------------+
> > >   |           |                     +-----------+
> > >   |           |
> > >   |           |
> > >   +-----------+
> > >
> > > For tail pages, the value of compound_head is the same. So we can reuse
> > > first page of tail page structs. We map the virtual addresses of the
> > > remaining 6 pages of tail page structs to the first tail page struct,
> > > and then free these 6 pages. Therefore, we need to reserve at least 2
> > > pages as vmemmap areas.
> > >
> > > When a hugetlbpage is freed to the buddy system, we should allocate six
> > > pages for vmemmap pages and restore the previous mapping relationship.
> > >
> > > > If we uses the 1G hugetlbpage, we can save 4088 pages(There are 4096 pages
> > > > for struct page structures, we reserve 2 pages for vmemmap and 8 pages for
> > > > page tables. So we can save 4088 pages). This is a very substantial gain. On
> > > > our server, run some SPDK/QEMU applications which will use 1024GB
> > > > hugetlbpage. With this feature enabled, we can save ~16GB(1G
> > > > hugepage)/~11GB(2MB hugepage)
> >
> > Hi Muchun,
> >
> > Do we really save 11GB for 2MB hugepage?
> > How much do we save if we only get one 2MB hugetlb from one 128MB
> > mem_section?
> > It seems we need to get at least one page for the PTEs since we are splitting
> > PMD of vmemmap into PTE?
> 
> There are 524288(1024GB/2MB) 2MB HugeTLB pages. We can save 6 pages for each
> 2MB HugeTLB page. So we can save 3145728 pages. But we need to split PMD page
> table for every one 128MB mem_section and every section need one page as PTE
> page table. So we need 8192(1024GB/128MB) pages as PTE page tables. Finally,
> we can save 3137536(3145728-8192) pages which is 11.97GB.

The worst case I can see is this: if we get 100 hugetlb pages of 2MB each, but
each of them comes from a different mem_section, we won't save at the 11.97GB
rate. We only save 5/8 * 16GB = 10GB.

Anyway, it seems 11GB falls between 10GB and 11.97GB,
so it sounds sensible :-)
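
To make the two bounds concrete, here is a rough Python sketch of the arithmetic. It assumes 4KB base pages, 6 vmemmap pages freed per 2MB hugetlb page, and one PTE page per split vmemmap PMD, as described in the thread. The function name and parameters are made up for illustration and are not part of the patch series. The worst-case figure is the per-page ratio (6 freed minus 1 PTE page, out of 8) scaled up to 1024GB, i.e. the 5/8 * 16GB estimate.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

PAGE_SIZE = 4 * KiB  # assumption: 4KB base pages


def vmemmap_savings(total_bytes, hugepage_bytes=2 * MiB,
                    section_bytes=128 * MiB, freed_per_hugepage=6):
    """Return (best_case_pages, worst_case_pages) saved (hypothetical helper).

    Best case: hugetlb pages are packed, so one PTE page is needed
    per 128MB mem_section.
    Worst case: every hugetlb page sits in a different mem_section,
    so each one pays for its own PTE page (a per-page ratio).
    """
    nr_hugepages = total_bytes // hugepage_bytes
    freed = nr_hugepages * freed_per_hugepage
    best = freed - total_bytes // section_bytes  # one PTE page per section
    worst = freed - nr_hugepages                 # one PTE page per hugepage
    return best, worst


def pages_to_gib(pages):
    return pages * PAGE_SIZE / GiB


best, worst = vmemmap_savings(1024 * GiB)
print(best, round(pages_to_gib(best), 2))    # 3137536 11.97
print(worst, round(pages_to_gib(worst), 2))  # 2621440 10.0
```
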

Ideally, we should be able to free PageTail if we change struct page in some way.
Then we would save much more for 2MB hugetlb, but it seems that is not easy.

Thanks
Barry
