From: Zi Yan <ziy@nvidia.com>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Subject: [LSF/MM/BPF TOPIC] 1GB PUD THP support (gigantic page allocation, increasing MAX_ORDER, anti-fragmentation and more)
Date: Tue, 11 May 2021 17:18:12 -0400
Message-ID: <FBF3E7A8-AAD2-4CD2-B939-1574F761A99E@nvidia.com>

I have been working on 1GB THP support [1][2][3] and would like to have a discussion on the high-level design and some implementation details. The topics I would like to discuss related to 1GB PUD THP include:

1. Gigantic page allocation. Since MAX_ORDER prevents the buddy allocator from serving 1GB allocations directly, we need to enable them in one or more ways, for example via alloc_contig_range() or by increasing MAX_ORDER. A sketch of the alloc_contig route follows.
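
As a concrete illustration of the alloc_contig route (a sketch only, not code from the patchset; the helper name try_alloc_1gb_page() is made up for this mail), hugetlb already allocates its gigantic pages this way via alloc_contig_pages(), which wraps alloc_contig_range() and does not require raising MAX_ORDER:

        /*
         * Illustrative sketch: grab a 1GB physically contiguous range the
         * way hugetlb's gigantic pages do.  Requires CONFIG_CONTIG_ALLOC.
         * Whether the returned range is 1GB-aligned, as a PUD mapping
         * needs, still has to be checked or enforced separately.
         */
        static struct page *try_alloc_1gb_page(int nid)
        {
                unsigned long nr_pages = 1UL << (30 - PAGE_SHIFT); /* 262144 base pages */

                return alloc_contig_pages(nr_pages, GFP_KERNEL | __GFP_NOWARN,
                                          nid, NULL);
        }

        /* Free it later with free_contig_range(page_to_pfn(page), nr_pages). */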

2. The success rate of gigantic page allocation. The existing anti-fragmentation mechanism works at the pageblock level, which is 2MB on x86_64 (see below). What could be done to provide some guarantee for gigantic page allocation without being hurt by unmovable page fragmentation? Increasing the pageblock size, an additional memory zone/region for gigantic pages, or something else?
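
For reference, the 2MB figure comes from pageblock_order; paraphrasing include/linux/pageblock-flags.h (not a verbatim quote):

        #ifdef CONFIG_HUGETLB_PAGE
        #define pageblock_order         HUGETLB_PAGE_ORDER      /* 9 on x86_64: 2^9 * 4KB = 2MB */
        #else
        #define pageblock_order         (MAX_ORDER - 1)
        #endif

        /*
         * A 1GB PUD THP is order 18 (2^18 * 4KB), so a single unmovable
         * 2MB pageblock is enough to break a candidate 1GB range.
         */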

3. How to expose 1GB PUD THP to user space. Allocating a 1GB THP at every page fault is unrealistic: it can waste a lot of memory and make page fault handling slow. Would additional MADV_ flags that specify the THP page size be a good choice (sketched below)? Or do we want to introduce a separate API that asks the kernel to create gigantic pages on explicit user request[4]?
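
To make the madvise option concrete, here is a userspace sketch. The flag name and value below are placeholders invented for this mail; picking the real interface is exactly what needs discussing:

        #include <stddef.h>
        #include <sys/mman.h>

        #ifndef MADV_HUGEPAGE_1GB
        #define MADV_HUGEPAGE_1GB       67      /* placeholder value, not a real ABI */
        #endif

        /*
         * Opt a region in to 1GB THP.  A PUD mapping additionally needs a
         * 1GB-aligned, >= 1GB virtual range; the caller is assumed to have
         * arranged that when mmap()ing buf.
         */
        int advise_1gb_thp(void *buf, size_t len)
        {
                return madvise(buf, len, MADV_HUGEPAGE_1GB);
        }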

4. Code deduplication for THP handling and page table handling. When adding 1GB THP support, I needed to mechanically replicate the PMD THP code for PUD THP, so I am thinking about possible code deduplication. One thing I did was to introduce a common split_huge_page_to_list_to_order() behind both split_huge_page() and split_huge_pud_page()[5] (see the sketch below). On the other hand, I am also thinking about reviving Kirill’s idea[6] of consolidating the page table manipulation API around page table level numbers (level=1,2,3,…) instead of separate PTE, PMD, and PUD variants.
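
Paraphrasing the split part of [5] (the exact signatures there may differ), both existing entry points would become thin wrappers around one implementation that takes a target order:

        static inline int split_huge_page(struct page *page)
        {
                /* PMD THP down to base pages */
                return split_huge_page_to_list_to_order(page, NULL, 0);
        }

        static inline int split_huge_pud_page(struct page *page)
        {
                /* 1GB PUD THP down to 2MB, PMD-order pieces (HPAGE_PMD_ORDER == 9 on x86_64) */
                return split_huge_page_to_list_to_order(page, NULL, HPAGE_PMD_ORDER);
        }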

There might be other THP-specific topics, like how to handle PMD mappings of a 1GB PUD THP in addition to the existing PTE mappings of a 2MB PMD THP, but I think we have plenty to discuss already and we can continue if we have time.


[1] https://lore.kernel.org/linux-mm/20200902180628.4052244-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@sent.com/
[3] https://lore.kernel.org/linux-mm/20210224223536.803765-1-zi.yan@sent.com/
[4] https://lore.kernel.org/linux-mm/20200907072014.GD30144@dhcp22.suse.cz/
[5] https://lore.kernel.org/linux-mm/20201119160605.1272425-1-zi.yan@sent.com/
[6] https://lore.kernel.org/linux-mm/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/

--
Best Regards,
Yan Zi
