linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, Roman Gushchin <guro@fb.com>
Cc: Rik van Riel <riel@surriel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Nellans <dnellans@nvidia.com>,
	linux-kernel@vger.kernel.org, Zi Yan <ziy@nvidia.com>
Subject: [RFC PATCH 00/16] 1GB THP support on x86_64
Date: Wed,  2 Sep 2020 14:06:12 -0400	[thread overview]
Message-ID: <20200902180628.4052244-1-zi.yan@sent.com> (raw)

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset adds support for 1GB THP on x86_64. It is on top of
v5.9-rc2-mmots-2020-08-25-21-13.

1GB THP is more flexible for reducing translation overhead and increasing the
performance of applications with large memory footprint without application
changes compared to hugetlb.

Design
=======

1GB THP implementation looks similar to exiting THP code except some new designs
for the additional page table level.

1. Page table deposit and withdraw using a new pagechain data structure:
   instead of one PTE page table page, 1GB THP requires 513 page table pages
   (one PMD page table page and 512 PTE page table pages) to be deposited
   at the page allocaiton time, so that we can split the page later. Currently,
   the page table deposit is using ->lru, thus only one page can be deposited.
   A new pagechain data structure is added to enable multi-page deposit.

2. Triple mapped 1GB THP : 1GB THP can be mapped by a combination of PUD, PMD,
   and PTE entries. Mixing PUD an PTE mapping can be achieved with existing
   PageDoubleMap mechanism. To add PMD mapping, PMDPageInPUD and
   sub_compound_mapcount are introduced. PMDPageInPUD is the 512-aligned base
   page in a 1GB THP and sub_compound_mapcount counts the PMD mapping by using
   page[N*512 + 3].compound_mapcount.

3. Using CMA allocaiton for 1GB THP: instead of bump MAX_ORDER, it is more sane
   to use something less intrusive. So all 1GB THPs are allocated from reserved
   CMA areas shared with hugetlb. At page splitting time, the bitmap for the 1GB
   THP is cleared as the resulting pages can be freed via normal page free path.
   We can fall back to alloc_contig_pages for 1GB THP if necessary.


Patch Organization
=======

Patch 01 adds the new pagechain data structure.

Patch 02 to 13 adds 1GB THP support in variable places.

Patch 14 tries to use alloc_contig_pages for 1GB THP allocaiton.

Patch 15 moves hugetlb_cma reservation to cma.c and rename it to hugepage_cma.

Patch 16 use hugepage_cma reservation for 1GB THP allocation.


Any suggestions and comments are welcome.


Zi Yan (16):
  mm: add pagechain container for storing multiple pages.
  mm: thp: 1GB anonymous page implementation.
  mm: proc: add 1GB THP kpageflag.
  mm: thp: 1GB THP copy on write implementation.
  mm: thp: handling 1GB THP reference bit.
  mm: thp: add 1GB THP split_huge_pud_page() function.
  mm: stats: make smap stats understand PUD THPs.
  mm: page_vma_walk: teach it about PMD-mapped PUD THP.
  mm: thp: 1GB THP support in try_to_unmap().
  mm: thp: split 1GB THPs at page reclaim.
  mm: thp: 1GB THP follow_p*d_page() support.
  mm: support 1GB THP pagemap support.
  mm: thp: add a knob to enable/disable 1GB THPs.
  mm: page_alloc: >=MAX_ORDER pages allocation an deallocation.
  hugetlb: cma: move cma reserve function to cma.c.
  mm: thp: use cma reservation for pud thp allocation.

 .../admin-guide/kernel-parameters.txt         |   2 +-
 arch/arm64/mm/hugetlbpage.c                   |   2 +-
 arch/powerpc/mm/hugetlbpage.c                 |   2 +-
 arch/x86/include/asm/pgalloc.h                |  68 ++
 arch/x86/include/asm/pgtable.h                |  26 +
 arch/x86/kernel/setup.c                       |   8 +-
 arch/x86/mm/pgtable.c                         |  38 +
 drivers/base/node.c                           |   3 +
 fs/proc/meminfo.c                             |   2 +
 fs/proc/page.c                                |   2 +
 fs/proc/task_mmu.c                            | 122 ++-
 include/linux/cma.h                           |  18 +
 include/linux/huge_mm.h                       |  84 +-
 include/linux/hugetlb.h                       |  12 -
 include/linux/memcontrol.h                    |   5 +
 include/linux/mm.h                            |  29 +-
 include/linux/mm_types.h                      |   1 +
 include/linux/mmu_notifier.h                  |  13 +
 include/linux/mmzone.h                        |   1 +
 include/linux/page-flags.h                    |  47 +
 include/linux/pagechain.h                     |  73 ++
 include/linux/pgtable.h                       |  34 +
 include/linux/rmap.h                          |  10 +-
 include/linux/swap.h                          |   2 +
 include/linux/vm_event_item.h                 |   7 +
 include/uapi/linux/kernel-page-flags.h        |   2 +
 kernel/events/uprobes.c                       |   4 +-
 kernel/fork.c                                 |   5 +
 mm/cma.c                                      | 119 +++
 mm/gup.c                                      |  60 +-
 mm/huge_memory.c                              | 939 +++++++++++++++++-
 mm/hugetlb.c                                  | 114 +--
 mm/internal.h                                 |   2 +
 mm/khugepaged.c                               |   6 +-
 mm/ksm.c                                      |   4 +-
 mm/memcontrol.c                               |  13 +
 mm/memory.c                                   |  51 +-
 mm/mempolicy.c                                |  21 +-
 mm/migrate.c                                  |  12 +-
 mm/page_alloc.c                               |  57 +-
 mm/page_vma_mapped.c                          | 129 ++-
 mm/pgtable-generic.c                          |  56 ++
 mm/rmap.c                                     | 289 ++++--
 mm/swap.c                                     |  31 +
 mm/swap_slots.c                               |   2 +
 mm/swapfile.c                                 |   8 +-
 mm/userfaultfd.c                              |   2 +-
 mm/util.c                                     |  16 +-
 mm/vmscan.c                                   |  58 +-
 mm/vmstat.c                                   |   8 +
 50 files changed, 2270 insertions(+), 349 deletions(-)
 create mode 100644 include/linux/pagechain.h

--
2.28.0



             reply	other threads:[~2020-09-02 18:06 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-02 18:06 Zi Yan [this message]
2020-09-02 18:06 ` [RFC PATCH 01/16] mm: add pagechain container for storing multiple pages Zi Yan
2020-09-02 20:29   ` Randy Dunlap
2020-09-02 20:48     ` Zi Yan
2020-09-03  3:15   ` Matthew Wilcox
2020-09-07 12:22   ` Kirill A. Shutemov
2020-09-07 15:11     ` Zi Yan
2020-09-09 13:46       ` Kirill A. Shutemov
2020-09-09 14:15         ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 02/16] mm: thp: 1GB anonymous page implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 03/16] mm: proc: add 1GB THP kpageflag Zi Yan
2020-09-09 13:46   ` Kirill A. Shutemov
2020-09-02 18:06 ` [RFC PATCH 04/16] mm: thp: 1GB THP copy on write implementation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 05/16] mm: thp: handling 1GB THP reference bit Zi Yan
2020-09-09 14:09   ` Kirill A. Shutemov
2020-09-09 14:36     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 06/16] mm: thp: add 1GB THP split_huge_pud_page() function Zi Yan
2020-09-09 14:18   ` Kirill A. Shutemov
2020-09-09 14:19     ` Zi Yan
2020-09-02 18:06 ` [RFC PATCH 07/16] mm: stats: make smap stats understand PUD THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 08/16] mm: page_vma_walk: teach it about PMD-mapped PUD THP Zi Yan
2020-09-02 18:06 ` [RFC PATCH 09/16] mm: thp: 1GB THP support in try_to_unmap() Zi Yan
2020-09-02 18:06 ` [RFC PATCH 10/16] mm: thp: split 1GB THPs at page reclaim Zi Yan
2020-09-02 18:06 ` [RFC PATCH 11/16] mm: thp: 1GB THP follow_p*d_page() support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 12/16] mm: support 1GB THP pagemap support Zi Yan
2020-09-02 18:06 ` [RFC PATCH 13/16] mm: thp: add a knob to enable/disable 1GB THPs Zi Yan
2020-09-02 18:06 ` [RFC PATCH 14/16] mm: page_alloc: >=MAX_ORDER pages allocation an deallocation Zi Yan
2020-09-02 18:06 ` [RFC PATCH 15/16] hugetlb: cma: move cma reserve function to cma.c Zi Yan
2020-09-02 18:06 ` [RFC PATCH 16/16] mm: thp: use cma reservation for pud thp allocation Zi Yan
2020-09-02 18:40 ` [RFC PATCH 00/16] 1GB THP support on x86_64 Jason Gunthorpe
2020-09-02 18:45   ` Zi Yan
2020-09-02 18:48     ` Jason Gunthorpe
2020-09-02 19:05       ` Zi Yan
2020-09-02 19:57         ` Jason Gunthorpe
2020-09-02 20:29           ` Zi Yan
2020-09-03 16:40             ` Jason Gunthorpe
2020-09-03 16:55               ` Matthew Wilcox
2020-09-03 17:08                 ` Jason Gunthorpe
2020-09-03  7:32 ` Michal Hocko
2020-09-03 16:25   ` Roman Gushchin
2020-09-03 16:50     ` Jason Gunthorpe
2020-09-03 17:01       ` Matthew Wilcox
2020-09-03 17:18         ` Jason Gunthorpe
2020-09-03 20:57     ` Mike Kravetz
2020-09-03 21:06       ` Roman Gushchin
2020-09-04  7:42     ` Michal Hocko
2020-09-04 21:10       ` Roman Gushchin
2020-09-07  7:20         ` Michal Hocko
2020-09-08 15:09           ` Zi Yan
2020-09-08 19:58             ` Roman Gushchin
2020-09-09  4:01               ` John Hubbard
2020-09-09  7:15               ` Michal Hocko
2020-09-03 14:23 ` Kirill A. Shutemov
2020-09-03 16:30   ` Roman Gushchin
2020-09-08 11:57     ` David Hildenbrand
2020-09-08 14:05       ` Zi Yan
2020-09-08 14:22         ` David Hildenbrand
2020-09-08 15:36           ` Zi Yan
2020-09-08 14:27         ` Matthew Wilcox
2020-09-08 15:50           ` Zi Yan
2020-09-09 12:11           ` Jason Gunthorpe
2020-09-09 12:32             ` Matthew Wilcox
2020-09-09 13:14               ` Jason Gunthorpe
2020-09-09 13:27                 ` David Hildenbrand
2020-09-10 10:02                   ` William Kucharski
2020-09-08 14:35         ` Michal Hocko
2020-09-08 14:41           ` Rik van Riel
2020-09-08 15:02             ` David Hildenbrand
2020-09-09  7:04             ` Michal Hocko
2020-09-09 13:19               ` Rik van Riel
2020-09-09 13:43                 ` David Hildenbrand
2020-09-09 13:49                   ` Rik van Riel
2020-09-09 13:54                     ` David Hildenbrand
2020-09-10  7:32                   ` Michal Hocko
2020-09-10  8:27                     ` David Hildenbrand
2020-09-10 14:21                       ` Zi Yan
2020-09-10 14:34                         ` David Hildenbrand
2020-09-10 14:41                           ` Zi Yan
2020-09-10 15:15                             ` David Hildenbrand
2020-09-10 13:32                     ` Rik van Riel
2020-09-10 14:30                       ` Zi Yan
2020-09-09 13:59                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200902180628.4052244-1-zi.yan@sent.com \
    --to=zi.yan@sent.com \
    --cc=dnellans@nvidia.com \
    --cc=guro@fb.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    --subject='Re: [RFC PATCH 00/16] 1GB THP support on x86_64' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).