From: Muchun Song <songmuchun@bytedance.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
Cc: "corbet@lwn.net" <corbet@lwn.net>,
"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"luto@kernel.org" <luto@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"paulmck@kernel.org" <paulmck@kernel.org>,
"mchehab+huawei@kernel.org" <mchehab+huawei@kernel.org>,
"pawan.kumar.gupta@linux.intel.com"
<pawan.kumar.gupta@linux.intel.com>,
"rdunlap@infradead.org" <rdunlap@infradead.org>,
"oneukum@suse.com" <oneukum@suse.com>,
"anshuman.khandual@arm.com" <anshuman.khandual@arm.com>,
"jroedel@suse.de" <jroedel@suse.de>,
"almasrymina@google.com" <almasrymina@google.com>,
"rientjes@google.com" <rientjes@google.com>,
"willy@infradead.org" <willy@infradead.org>,
"osalvador@suse.de" <osalvador@suse.de>,
"mhocko@suse.com" <mhocko@suse.com>,
"duanxiongchun@bytedance.com" <duanxiongchun@bytedance.com>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] RE: [PATCH v4 00/21] Free some vmemmap pages of hugetlb page
Date: Wed, 18 Nov 2020 00:29:07 +0800
Message-ID: <CAMZfGtWNa=abZdN6HmWE1VBFHfGCbsW9D0zrN-F5zrhn6s=ErA@mail.gmail.com>
In-Reply-To: <714ae7d701d446259ab269f14a030fe9@hisilicon.com>
On Tue, Nov 17, 2020 at 7:08 PM Song Bao Hua (Barry Song)
<song.bao.hua@hisilicon.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Muchun Song [mailto:songmuchun@bytedance.com]
> > Sent: Tuesday, November 17, 2020 11:50 PM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: corbet@lwn.net; mike.kravetz@oracle.com; tglx@linutronix.de;
> > mingo@redhat.com; bp@alien8.de; x86@kernel.org; hpa@zytor.com;
> > dave.hansen@linux.intel.com; luto@kernel.org; peterz@infradead.org;
> > viro@zeniv.linux.org.uk; akpm@linux-foundation.org; paulmck@kernel.org;
> > mchehab+huawei@kernel.org; pawan.kumar.gupta@linux.intel.com;
> > rdunlap@infradead.org; oneukum@suse.com; anshuman.khandual@arm.com;
> > jroedel@suse.de; almasrymina@google.com; rientjes@google.com;
> > willy@infradead.org; osalvador@suse.de; mhocko@suse.com;
> > duanxiongchun@bytedance.com; linux-doc@vger.kernel.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > linux-fsdevel@vger.kernel.org
> > Subject: Re: [External] RE: [PATCH v4 00/21] Free some vmemmap pages of
> > hugetlb page
> >
> > On Tue, Nov 17, 2020 at 6:16 PM Song Bao Hua (Barry Song)
> > <song.bao.hua@hisilicon.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On
> > > > Behalf Of Muchun Song
> > > > Sent: Saturday, November 14, 2020 12:00 AM
> > > > To: corbet@lwn.net; mike.kravetz@oracle.com; tglx@linutronix.de;
> > > > mingo@redhat.com; bp@alien8.de; x86@kernel.org; hpa@zytor.com;
> > > > dave.hansen@linux.intel.com; luto@kernel.org; peterz@infradead.org;
> > > > viro@zeniv.linux.org.uk; akpm@linux-foundation.org; paulmck@kernel.org;
> > > > mchehab+huawei@kernel.org; pawan.kumar.gupta@linux.intel.com;
> > > > rdunlap@infradead.org; oneukum@suse.com; anshuman.khandual@arm.com;
> > > > jroedel@suse.de; almasrymina@google.com; rientjes@google.com;
> > > > willy@infradead.org; osalvador@suse.de; mhocko@suse.com
> > > > Cc: duanxiongchun@bytedance.com; linux-doc@vger.kernel.org;
> > > > linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > > linux-fsdevel@vger.kernel.org; Muchun Song <songmuchun@bytedance.com>
> > > > Subject: [PATCH v4 00/21] Free some vmemmap pages of hugetlb page
> > > >
> > > > Hi all,
> > > >
> > > > This patch series will free some vmemmap pages (struct page structures)
> > > > associated with each hugetlbpage when preallocated to save memory.
> > > >
> > > > Nowadays we track the status of physical page frames using struct page
> > > > structures arranged in one or more arrays. There is a one-to-one
> > > > mapping between each physical page frame and its corresponding
> > > > struct page structure.
> > > >
> > > > The HugeTLB support is built on top of multiple page size support that
> > > > is provided by most modern architectures. For example, x86 CPUs normally
> > > > support 4K and 2M (1G if architecturally supported) page sizes. Every
> > > > HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> > > > struct page structures and the 1G HugeTLB has 262144 struct page
> > > > structures (which occupy 4096 pages). But the core of HugeTLB uses only
> > > > the first 4 struct page structures (this number comes from
> > > > HUGETLB_CGROUP_MIN_ORDER) to store metadata associated with each
> > > > HugeTLB. For the rest of the struct page structures, usually only the
> > > > compound_head field is read, and it holds the same value in all of
> > > > them. If we can free some of this struct page memory to the buddy
> > > > system, we can save a lot of memory.
> > > >
> > > > When the system boots up, every 2M HugeTLB has 512 struct page
> > > > structures, which occupy 8 pages (sizeof(struct page) * 512 / PAGE_SIZE).
> > > >
> > > >    hugetlbpage                  struct pages(8 pages)          page frame(8 pages)
> > > >   +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > > >   |           |                     |     0     | -------------> |     0     |
> > > >   |           |                     |     1     | -------------> |     1     |
> > > >   |           |                     |     2     | -------------> |     2     |
> > > >   |           |                     |     3     | -------------> |     3     |
> > > >   |           |                     |     4     | -------------> |     4     |
> > > >   |     2M    |                     |     5     | -------------> |     5     |
> > > >   |           |                     |     6     | -------------> |     6     |
> > > >   |           |                     |     7     | -------------> |     7     |
> > > >   |           |                     +-----------+                +-----------+
> > > >   |           |
> > > >   |           |
> > > >   +-----------+
> > > >
> > > >
> > > > When a hugetlbpage is preallocated, we can change the mapping from
> > > > above to below.
> > > >
> > > >    hugetlbpage                  struct pages(8 pages)          page frame(8 pages)
> > > >   +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > > >   |           |                     |     0     | -------------> |     0     |
> > > >   |           |                     |     1     | -------------> |     1     |
> > > >   |           |                     |     2     | -------------> +-----------+
> > > >   |           |                     |     3     | -----------------^ ^ ^ ^ ^
> > > >   |           |                     |     4     | -------------------+ | | |
> > > >   |     2M    |                     |     5     | ---------------------+ | |
> > > >   |           |                     |     6     | -----------------------+ |
> > > >   |           |                     |     7     | -------------------------+
> > > >   |           |                     +-----------+
> > > >   |           |
> > > >   |           |
> > > >   +-----------+
> > > >
> > > > For tail pages, the value of compound_head is the same. So we can reuse
> > > > first page of tail page structs. We map the virtual addresses of the
> > > > remaining 6 pages of tail page structs to the first tail page struct,
> > > > and then free these 6 pages. Therefore, we need to reserve at least 2
> > > > pages as vmemmap areas.
> > > >
> > > > When a hugetlbpage is freed to the buddy system, we should allocate six
> > > > pages for vmemmap pages and restore the previous mapping relationship.
> > > >
> > > > If we use the 1G hugetlbpage, we can save 4088 pages (there are 4096
> > > > pages for struct page structures, we reserve 2 pages for vmemmap and
> > > > 8 pages for page tables, so we can save 4088 pages). This is a very
> > > > substantial gain. On our server, we run some SPDK/QEMU applications
> > > > which use 1024GB of hugetlbpage. With this feature enabled, we can
> > > > save ~16GB (1G hugepage)/~11GB (2MB hugepage).
> > >
> > > Hi Muchun,
> > >
> > > Do we really save 11GB for 2MB hugepage?
> > > How much do we save if we only get one 2MB hugetlb from one 128MB
> > > mem_section?
> > > It seems we need to get at least one page for the PTEs since we are
> > > splitting PMD of vmemmap into PTE?
> >
> > There are 524288 (1024GB/2MB) 2MB HugeTLB pages. We can save 6 pages
> > for each 2MB HugeTLB page, so we can save 3145728 pages. But we need
> > to split the PMD page table for every 128MB mem_section, and every
> > section needs one page as a PTE page table. So we need 8192
> > (1024GB/128MB) pages as PTE page tables. Finally, we can save
> > 3137536 (3145728-8192) pages, which is 11.97GB.
>
> The worst case I can see is that:
> if we get 100 hugetlb with 2MB size, but the 100 hugetlb comes from different
> mem_section, we won't save 11.97GB. we only save 5/8 * 16GB=10GB.
>
> Anyway, it seems 11GB is in the middle of 10GB and 11.97GB,
> so sounds sensible :-)
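
As a rough cross-check of the two bounds above, here is a small sketch
in C. The function and macro names are made up for illustration and are
not part of the series. It assumes 4K base pages, 128MB sections, 6
vmemmap pages freed per 2MB HugeTLB page and one PTE page per section
whose vmemmap PMD gets split.

```c
#include <stdint.h>

/* Sizes taken from this thread; the names are illustrative only. */
#define BASE_PAGE_SIZE   4096ULL          /* 4K base page */
#define HUGE_PAGE_SIZE   (2ULL << 20)     /* 2MB HugeTLB page */
#define SECTION_SIZE     (128ULL << 20)   /* 128MB mem_section */
#define FREED_PER_HPAGE  6ULL             /* vmemmap pages freed per 2MB page */

/*
 * Best case: the huge pages are densely packed, so only one PTE page
 * is needed for each 128MB section that they cover.
 */
uint64_t saved_pages_packed(uint64_t total_bytes)
{
	uint64_t nr_hpages = total_bytes / HUGE_PAGE_SIZE;
	uint64_t nr_pte_pages = total_bytes / SECTION_SIZE;

	return nr_hpages * FREED_PER_HPAGE - nr_pte_pages;
}

/*
 * Worst case: every huge page comes from a different section, so each
 * one costs a PTE page of its own (6 freed - 1 spent = 5 net).
 */
uint64_t saved_pages_scattered(uint64_t nr_hpages)
{
	return nr_hpages * (FREED_PER_HPAGE - 1);
}

/* Convert a number of base pages into bytes. */
uint64_t pages_to_bytes(uint64_t nr_pages)
{
	return nr_pages * BASE_PAGE_SIZE;
}
```

For 1024GB of 2MB pages, saved_pages_packed(1024ULL << 30) gives 3137536
pages (about 11.97GB), and saved_pages_scattered(524288) gives 2621440
pages (exactly 10GB), which matches the 5/8 * 16GB estimate.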
>
> ideally, we should be able to free PageTail if we change struct page in some way.
> Then we will save much more for 2MB hugetlb. but it seems it is not easy.
Right now we free only 6 vmemmap pages for each 2MB HugeTLB page.
But your comment gave me an idea: maybe we really can free 7 vmemmap
pages. In that case, 8 of the 512 struct page structures would appear
to have the PG_head flag set, because all 8 vmemmap pages would map to
the same physical page. We could adjust compound_head() slightly so
that it returns the real head struct page when it is passed a tail
struct page that has the PG_head flag set. I will investigate and
test this.
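
To make the idea concrete, here is a user-space sketch in C of one
possible adjustment. It is only a model under my own assumptions, not
the kernel code and not a tested design: struct page_model, PG_HEAD_MODEL
and the function names are hypothetical, each "vmemmap page" holds only
8 structs to keep it small, and the remapping is emulated by copying
the first vmemmap page over the others. The assumed trick is to look at
the next struct page: if it is a tail, its compound_head field still
points to the real head.

```c
#include <stdint.h>
#include <string.h>

#define PG_HEAD_MODEL      (1UL << 0)  /* stand-in for the PG_head flag */
#define STRUCTS_PER_VPAGE  8           /* model value only */
#define NR_VPAGES          2           /* vmemmap pages in the model */

struct page_model {
	unsigned long flags;
	uintptr_t compound_head;  /* bit 0 set: tail page, rest is head pointer */
};

/* Unadjusted logic: only a tagged pointer identifies a tail page. */
struct page_model *compound_head_plain(struct page_model *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct page_model *)(head - 1);
	return page;
}

/*
 * Hypothetical adjustment: a struct that looks like a head may be an
 * alias of the real head, so consult the following struct as well.
 */
struct page_model *compound_head_adjusted(struct page_model *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct page_model *)(head - 1);
	if (page->flags & PG_HEAD_MODEL) {
		uintptr_t next = page[1].compound_head;

		if (next & 1)
			return (struct page_model *)(next - 1);
	}
	return page;
}

/* Build one compound page, then emulate mapping every vmemmap page
 * to the first one by copying its contents. */
void model_init(struct page_model *map)
{
	int i;

	memset(map, 0, sizeof(*map) * STRUCTS_PER_VPAGE * NR_VPAGES);
	map[0].flags = PG_HEAD_MODEL;
	for (i = 1; i < STRUCTS_PER_VPAGE; i++)
		map[i].compound_head = (uintptr_t)&map[0] | 1;
	for (i = 1; i < NR_VPAGES; i++)
		memcpy(&map[i * STRUCTS_PER_VPAGE], &map[0],
		       sizeof(*map) * STRUCTS_PER_VPAGE);
}
```

In this model, map[8] is a tail page that looks like a head.
compound_head_plain() returns map[8] itself, while
compound_head_adjusted() returns the real head, map[0].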
Thanks.
>
> Thanks
> Barry
--
Yours,
Muchun