From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand <david@redhat.com>,
Michal Hocko <mhocko@suse.com>,
Oscar Salvador <osalvador@suse.de>, Zi Yan <ziy@nvidia.com>,
Muchun Song <songmuchun@bytedance.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 0/8] hugetlb: add demote/split page functionality
Date: Wed, 21 Jul 2021 16:05:03 -0700
Message-ID: <20210721230511.201823-1-mike.kravetz@oracle.com>
The concurrent use of multiple hugetlb page sizes on a single system
is becoming more common. One of the reasons is better TLB support for
gigantic page sizes on x86 hardware. In addition, hugetlb pages are
being used to back VMs in hosting environments.
When using hugetlb pages to back VMs in such environments, it is
sometimes desirable to preallocate hugetlb pools. This avoids the delay
and uncertainty of allocating hugetlb pages at VM startup. In addition,
preallocating huge pages minimizes the issue of memory fragmentation that
increases the longer the system is up and running.
In such environments, a combination of larger and smaller hugetlb pages
is preallocated in anticipation of backing VMs of various sizes. Over
time, the preallocated pool of smaller hugetlb pages may become
depleted while larger hugetlb pages still remain. In such situations,
it may be desirable to convert larger hugetlb pages to smaller hugetlb
pages.
Converting larger to smaller hugetlb pages can be accomplished today by
first freeing the larger page to the buddy allocator and then allocating
the smaller pages. However, there are two issues with this approach:
1) This process can take quite some time, especially if allocation of
the smaller pages is not immediate and requires migration/compaction.
2) There is no guarantee that the total size of smaller pages allocated
will match the size of the larger page which was freed. This is
because the area freed by the larger page could quickly be
fragmented.
To address these issues, introduce the concept of hugetlb page demotion.
Demotion provides a means of splitting a hugetlb page 'in place' into
pages of a smaller size. For example, on x86 one 1G page can be
demoted to 512 2M pages. Page demotion is controlled via sysfs files.
- demote_size   Read only target page size for demotion
- demote        Writable number of hugetlb pages to be demoted
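As a rough illustration of how an administration tool might drive these two
files, the Python sketch below reads demote_size and writes a count to demote.
It assumes the new files sit in the existing per-size hugetlb sysfs directory
(/sys/kernel/mm/hugepages/hugepages-<size>kB/). The helper names and the
'root' parameter are hypothetical and exist only so the sketch can be pointed
at a scratch directory instead of a live system.

```python
import os

# Assumption: demote and demote_size live next to nr_hugepages in the
# existing per-size hugetlb sysfs directory. 'root' can be overridden.
SYSFS_ROOT = "/sys/kernel/mm/hugepages"


def hstate_dir(size_kb, root=SYSFS_ROOT):
    """Directory for huge pages of size_kb kilobytes (1048576 for 1G)."""
    return os.path.join(root, "hugepages-%dkB" % size_kb)


def read_demote_size(size_kb, root=SYSFS_ROOT):
    """Return the raw contents of the read-only demote_size file."""
    with open(os.path.join(hstate_dir(size_kb, root), "demote_size")) as f:
        return f.read().strip()


def request_demote(size_kb, count, root=SYSFS_ROOT):
    """Ask for up to 'count' free pages of size_kb to be demoted."""
    with open(os.path.join(hstate_dir(size_kb, root), "demote"), "w") as f:
        f.write("%d\n" % count)
```

On a real system the write would need root privileges, and the value written
is only an upper bound on the number of pages demoted.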
Only hugetlb pages which are free at the time of the request can be demoted.
Demotion does not add to the complexity of surplus pages. Demotion also honors
reserved huge pages. Therefore, when a value is written to the sysfs demote
file, that value is only the maximum number of pages which will be demoted.
It is possible fewer will actually be demoted.
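The "maximum, not a guarantee" semantics can be modeled in a few lines of
Python. This is only a sketch of the behavior described above, not code from
the patches: it assumes the number of demotable pages is the free count minus
the reserved count, and the function and parameter names are invented for
illustration.

```python
def pages_demoted(requested, free, reserved):
    """Model how many large pages a demote request can act on.

    Only free pages are candidates and reserved pages are left alone,
    so the request is capped at free - reserved (never below zero).
    """
    available = max(free - reserved, 0)
    return min(requested, available)


def small_pages_created(demoted, large_kb, small_kb):
    """Each demoted page is split in place into large_kb // small_kb pages."""
    return demoted * (large_kb // small_kb)


# Ask to demote 4 x 1G pages when 3 are free and 1 of those is reserved.
n = pages_demoted(requested=4, free=3, reserved=1)
print(n, small_pages_created(n, 1048576, 2048))  # prints "2 1024"
```

Because the split is in place, each page that is demoted yields exactly the
full complement of smaller pages (512 2M pages per 1G page on x86).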
If demote_size is PAGESIZE, demote will simply free pages to the buddy
allocator.
Real world use cases
--------------------
There are groups today using hugetlb pages to back VMs on x86. Their
use case is as described above. They have experienced the issues with
performance and with not necessarily getting the expected number of
smaller huge pages after a free/allocate cycle.
Note to reviewers
-----------------
Patches 1-5 provide the basic demote functionality. They are built on
next-20210721.
Patch 3 deals with the issue of speculative page references as
discussed in [1] and [2]. It builds on the ideas used in
patches currently in mmotm. There have been few comments on
those patches in mmotm, so I do not feel the approach has been
well vetted.
Patches 6-8 are an optimization to deal with vmemmap optimized pages.
This was discussed in the RFC. IMO, the benefit may not be worth
the added code. They could be dropped with no loss of
functionality. In addition, Muchun has recently sent patches to
further optimize hugetlb vmemmap reduction by only requiring one
vmemmap page per huge page [3]. These patches do not take Muchun's
new patches into account.
RFC -> v1
- Provides basic support for vmemmap optimized pages
- Takes speculative page references into account
- Updated Documentation file
- Added optimizations for vmemmap optimized pages
[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210710002441.167759-1-mike.kravetz@oracle.com/
[3] https://lore.kernel.org/linux-mm/20210714091800.42645-1-songmuchun@bytedance.com/
Mike Kravetz (8):
hugetlb: add demote hugetlb page sysfs interfaces
hugetlb: add HPageCma flag and code to free non-gigantic pages in CMA
hugetlb: add demote bool to gigantic page routines
hugetlb: add hugetlb demote page support
hugetlb: document the demote sysfs interfaces
hugetlb: vmemmap optimizations when demoting hugetlb pages
hugetlb: prepare destroy and prep routines for vmemmap optimized pages
hugetlb: Optimized demote vmemmap optimizatized pages
Documentation/admin-guide/mm/hugetlbpage.rst | 29 +-
include/linux/hugetlb.h | 8 +
include/linux/mm.h | 4 +
mm/hugetlb.c | 328 +++++++++++++++++--
mm/hugetlb_vmemmap.c | 72 +++-
mm/hugetlb_vmemmap.h | 16 +
mm/sparse-vmemmap.c | 123 ++++++-
7 files changed, 538 insertions(+), 42 deletions(-)
--
2.31.1