* [RFC PATCH v2 00/47] hugetlb: introduce HugeTLB high-granularity mapping
@ 2022-10-21 16:36 James Houghton
From: James Houghton @ 2022-10-21 16:36 UTC
  To: Mike Kravetz, Muchun Song, Peter Xu
  Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
	Zach O'Keefe, Manish Mishra, Naoya Horiguchi,
	Dr . David Alan Gilbert, Matthew Wilcox (Oracle),
	Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi,
	Andrew Morton, linux-mm, linux-kernel, James Houghton

This RFC v2 is a more complete and correct implementation of
the original high-granularity mapping (HGM) RFC[1]. For HGM background
and motivation, please see the original RFC.

This series has changed quite significantly since its first version, so
I've dropped all the Reviewed-bys that it picked up, and I am not
including a full changelog here. Some notable changes:
  1. mapcount rules have been simplified: a hugepage's mapcount is now
     the number of times it is referenced in page tables, and it is
     still tracked on the head page.
  2. Synchronizing page table collapsing is now done using the VMA lock
     that Mike introduced recently.
  3. PTE splitting is only supported for blank PTEs, and it is done
     without needing to hold the VMA lock for writing. In many places,
     we explicitly check if a PTE has been split from under us.
  4. The userspace API has changed slightly.
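To make item 1 concrete: as I read the simplified rule, each page table entry that maps any part of a hugepage adds one to that hugepage's mapcount, regardless of the entry's size. The sketch below is illustrative only; `struct hgm_run` and `hgm_mapcount()` are hypothetical names that do not appear in the series, and the code models one mapping of one hugepage as runs of equally sized entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one run of identically sized entries covering part of a hugepage. */
struct hgm_run {
	uint64_t entry_size; /* bytes mapped by each entry, e.g. 4 KiB or 2 MiB */
	uint64_t length;     /* bytes covered by this run; a multiple of entry_size */
};

/*
 * Hypothetical helper: count the page table entries used by one mapping of a
 * hugepage. Under the rule described above, this count is also the mapping's
 * contribution to the hugepage's mapcount.
 */
static uint64_t hgm_mapcount(const struct hgm_run *runs, size_t nr_runs)
{
	uint64_t count = 0;

	for (size_t i = 0; i < nr_runs; i++) {
		assert(runs[i].entry_size != 0);
		assert(runs[i].length % runs[i].entry_size == 0);
		count += runs[i].length / runs[i].entry_size;
	}
	return count;
}
```

For a 1 GiB hugepage mapped by a single PUD this gives 1; if instead its first 2 MiB is mapped with 4 KiB PTEs and the rest with 2 MiB PMDs, it gives 512 + 511 = 1023.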

This series implements high-granularity mapping basics, enough to
support PAGE_SIZE-aligned UFFDIO_CONTINUE operations and MADV_COLLAPSE
for shared HugeTLB VMAs on x86. The main use case for this is post-copy
live migration of virtual machines, one of the important HGM use cases
described in [1]. MADV_COLLAPSE was originally introduced for THPs[2];
since collapsing is now also meaningful for HGM, I am co-opting the same
API.

- Userspace API

There are two main ways userspace interacts with high-granularity
mappings:
  1. Create them with UFFDIO_CONTINUE in an appropriately configured
     userfaultfd VMA.
  2. Collapse high-granularity mappings with MADV_COLLAPSE.
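A minimal userspace sketch of both operations follows. It assumes that `uffd` has already been configured for HGM as described in this section, that the hugepage range has been registered with `UFFDIO_REGISTER_MODE_MINOR`, and that the fault address is the one reported in the `uffd_msg` read from the userfaultfd. The helper names are hypothetical, and the `MADV_COLLAPSE` fallback value (25) is only there for C library headers that predate it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* mainline value; older libc headers lack it */
#endif

/* Round @addr down to the start of the @page_size page containing it. */
static uint64_t page_align_down(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: resolve a minor fault by mapping only the PAGE_SIZE
 * piece that contains @fault_addr instead of the whole hugepage.
 * Returns 0 on success, or -1 with errno set.
 */
static int continue_one_page(int uffd, uint64_t fault_addr)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	struct uffdio_continue cont = {
		.range = {
			.start = page_align_down(fault_addr, page_size),
			.len = page_size,
		},
		.mode = 0, /* no UFFDIO_CONTINUE_MODE_DONTWAKE: wake the faulting thread */
	};

	return ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: ask the kernel to replace the high-granularity
 * mappings of one hugepage with a single huge mapping.
 * Returns 0 on success, or -1 with errno set.
 */
static int collapse_hugepage(void *hugepage_start, size_t hugepage_size)
{
	return madvise(hugepage_start, hugepage_size, MADV_COLLAPSE);
}
```

Resolving each minor fault with a PAGE_SIZE-sized UFFDIO_CONTINUE, then calling MADV_COLLAPSE on the hugepage-aligned range once all of its pieces are mapped, is one plausible post-copy flow under these assumptions.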

The userfaultfd bits of the userspace API have changed slightly since
RFC v1. To configure a userfaultfd VMA to enable HGM, userspace must
provide UFFD_FEATURE_MINOR_HUGETLBFS_HGM and UFFD_FEATURE_EXACT_ADDRESS
in its call to UFFDIO_API.
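The sketch below shows that UFFDIO_API handshake for a `uffd` obtained from userfaultfd(2), requesting the two required features plus, as an assumption, `UFFD_FEATURE_MINOR_HUGETLBFS` for minor-fault handling on HugeTLB. `UFFD_FEATURE_MINOR_HUGETLBFS_HGM` only exists in headers from a kernel carrying this series, so the code does not assume its bit value and refuses to proceed without it. Requiring `UFFD_FEATURE_EXACT_ADDRESS` is likely needed because, without it, HugeTLB fault addresses are reported aligned down to the hugepage size, which would hide the PAGE_SIZE piece that actually faulted. `HGM_FEATURE`, `hgm_uffd_features()` and `uffd_enable_hgm()` are hypothetical names.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#ifndef UFFD_FEATURE_EXACT_ADDRESS
#define UFFD_FEATURE_EXACT_ADDRESS (1ULL << 11) /* mainline value since Linux 5.18 */
#endif

/* The HGM feature bit is only available from headers that carry this series. */
#ifdef UFFD_FEATURE_MINOR_HUGETLBFS_HGM
#define HGM_FEATURE ((uint64_t)UFFD_FEATURE_MINOR_HUGETLBFS_HGM)
#else
#define HGM_FEATURE ((uint64_t)0) /* unpatched headers: HGM cannot be requested */
#endif

/* Hypothetical helper: the feature set to request from UFFDIO_API for HGM. */
static uint64_t hgm_uffd_features(void)
{
	return (uint64_t)UFFD_FEATURE_MINOR_HUGETLBFS |
	       (uint64_t)UFFD_FEATURE_EXACT_ADDRESS | HGM_FEATURE;
}

/*
 * Hypothetical helper: perform the UFFDIO_API handshake on a fresh
 * userfaultfd. Returns 0 only if every requested feature is available.
 */
static int uffd_enable_hgm(int uffd)
{
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = hgm_uffd_features(),
	};

	if (HGM_FEATURE == 0)
		return -1; /* the HGM feature cannot be requested with these headers */
	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		return -1; /* e.g. EINVAL if the kernel lacks a requested feature */
	/* On success, the kernel reports the features it supports in api.features. */
	return (api.features & hgm_uffd_features()) == hgm_uffd_features() ? 0 : -1;
}
```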

- A Note About KVM

Normally KVM (as well as any other non-HugeTLB code that assumes that
HugeTLB pages will always be mapped with huge PTEs) would need to be
enlightened to do the correct thing with high-granularity-mapped HugeTLB
pages. It turns out that the x86 TDP MMU already handles HGM mappings
correctly, but other architectures' KVM MMUs, like arm64's, will need to
be updated before HGM can be enabled for those architectures.

- How complete is this series?

I have tested this series with the self-tests that I have modified and
added, and I have run real, large end-to-end migration tests. This
series should be mostly stable, though I haven't tested DAMON and other
pieces that were slightly changed by this series.

There is a bug in the current x86 TDP MMU that prevents MADV_COLLAPSE
from having an effect on guests: the second-stage mappings remain small
even after the collapse. This will be fixed by [3]; unless you have [3]
merged in your tree, MADV_COLLAPSE will not improve virtual machine
performance.

- Future Work

The main areas of future work are:
  1) Support more architectures. arm64 support is mostly complete, but
     it is not trivial, so to keep this RFC as short as possible, I will
     send the arm64 support series separately.
  2) Improve performance. Right now, the hot path for userfaultfd-based
     post-copy live migration takes two per-hugepage locks: the page
     lock and the fault mutex. To improve post-copy performance as much
     as possible, we likely need to improve this locking strategy.
  3) Support PAGE_SIZE poisoning of HugeTLB pages. To provide userspace
     with consistent poison behavior whether using MAP_PRIVATE or
     MAP_SHARED, more work is needed to implement basic HGM support for
     MAP_PRIVATE mappings.

- Patches

Patches 1-4:	Cleanup.
Patches 5-6:	Extend the HugeTLB shared VMA lock struct.
Patches 7-14:	Create hugetlb_pte and implement HGM basics (PT walking,
		enabling HGM).
Patches 15-30:	Make existing routines compatible with HGM.
Patches 31-35:	Extend userfaultfd to support high-granularity CONTINUEs.
Patch   36:	Add HugeTLB HGM support to MADV_COLLAPSE.
Patches 37-40:	Cleanup, add HGM stats, and enable HGM for x86.
Patches 41-47:	Documentation and selftests.

This series is based on mm-everything-2022-10-20-00-43.

Finally, I will be on vacation next week (until Nov 2, unfortunate
timing). I will try to respond before Nov 2; I wanted to get this series
up ASAP.

[1] https://lore.kernel.org/linux-mm/20220624173656.2033256-1-jthoughton@google.com/
[2] commit 7d8faaf155454 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
[3] https://lore.kernel.org/kvm/20220830235537.4004585-1-seanjc@google.com/

James Houghton (47):
  hugetlb: don't set PageUptodate for UFFDIO_CONTINUE
  hugetlb: remove mk_huge_pte; it is unused
  hugetlb: remove redundant pte_mkhuge in migration path
  hugetlb: only adjust address ranges when VMAs want PMD sharing
  hugetlb: make hugetlb_vma_lock_alloc return its failure reason
  hugetlb: extend vma lock for shared vmas
  hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
  hugetlb: add HGM enablement functions
  hugetlb: make huge_pte_lockptr take an explicit shift argument.
  hugetlb: add hugetlb_pte to track HugeTLB page table entries
  hugetlb: add hugetlb_pmd_alloc and hugetlb_pte_alloc
  hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step
  hugetlb: add make_huge_pte_with_shift
  hugetlb: make default arch_make_huge_pte understand small mappings
  hugetlbfs: for unmapping, treat HGM-mapped pages as potentially mapped
  hugetlb: make unmapping compatible with high-granularity mappings
  hugetlb: make hugetlb_change_protection compatible with HGM
  hugetlb: enlighten follow_hugetlb_page to support HGM
  hugetlb: make hugetlb_follow_page_mask HGM-enabled
  hugetlb: use struct hugetlb_pte for walk_hugetlb_range
  mm: rmap: provide pte_order in page_vma_mapped_walk
  mm: rmap: make page_vma_mapped_walk callers use pte_order
  rmap: update hugetlb lock comment for HGM
  hugetlb: update page_vma_mapped to do high-granularity walks
  hugetlb: add HGM support for copy_hugetlb_page_range
  hugetlb: make move_hugetlb_page_tables compatible with HGM
  hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
  rmap: in try_to_{migrate,unmap}_one, check head page for page flags
  hugetlb: add high-granularity migration support
  hugetlb: add high-granularity check for hwpoison in fault path
  hugetlb: sort hstates in hugetlb_init_hstates
  hugetlb: add for_each_hgm_shift
  userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
  hugetlb: userfaultfd: add support for high-granularity UFFDIO_CONTINUE
  userfaultfd: require UFFD_FEATURE_EXACT_ADDRESS when using HugeTLB HGM
  hugetlb: add MADV_COLLAPSE for hugetlb
  hugetlb: remove huge_pte_lock and huge_pte_lockptr
  hugetlb: replace make_huge_pte with make_huge_pte_with_shift
  mm: smaps: add stats for HugeTLB mapping size
  hugetlb: x86: enable high-granularity mapping
  docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM
    info
  docs: proc: include information about HugeTLB HGM
  selftests/vm: add HugeTLB HGM to userfaultfd selftest
  selftests/kvm: add HugeTLB HGM to KVM demand paging selftest
  selftests/vm: add anon and shared hugetlb to migration test
  selftests/vm: add hugetlb HGM test to migration selftest
  selftests/vm: add HGM UFFDIO_CONTINUE and hwpoison tests

 Documentation/admin-guide/mm/hugetlbpage.rst  |    4 +
 Documentation/admin-guide/mm/userfaultfd.rst  |   16 +-
 Documentation/filesystems/proc.rst            |   56 +-
 arch/powerpc/mm/pgtable.c                     |    3 +-
 arch/s390/include/asm/hugetlb.h               |    5 -
 arch/s390/mm/gmap.c                           |   20 +-
 arch/x86/Kconfig                              |    1 +
 fs/Kconfig                                    |    7 +
 fs/hugetlbfs/inode.c                          |   27 +-
 fs/proc/task_mmu.c                            |  184 ++-
 fs/userfaultfd.c                              |   56 +-
 include/asm-generic/hugetlb.h                 |    5 -
 include/asm-generic/tlb.h                     |    6 +-
 include/linux/huge_mm.h                       |   12 +-
 include/linux/hugetlb.h                       |  173 ++-
 include/linux/pagewalk.h                      |   11 +-
 include/linux/rmap.h                          |    5 +
 include/linux/swapops.h                       |    8 +-
 include/linux/userfaultfd_k.h                 |    7 +
 include/uapi/linux/userfaultfd.h              |    2 +
 mm/damon/vaddr.c                              |   57 +-
 mm/debug_vm_pgtable.c                         |    2 +-
 mm/hmm.c                                      |   21 +-
 mm/hugetlb.c                                  | 1209 ++++++++++++++---
 mm/khugepaged.c                               |    4 +-
 mm/madvise.c                                  |   24 +-
 mm/memory-failure.c                           |   17 +-
 mm/mempolicy.c                                |   28 +-
 mm/migrate.c                                  |   20 +-
 mm/mincore.c                                  |   17 +-
 mm/mprotect.c                                 |   18 +-
 mm/page_vma_mapped.c                          |   60 +-
 mm/pagewalk.c                                 |   32 +-
 mm/rmap.c                                     |  102 +-
 mm/userfaultfd.c                              |   46 +-
 .../selftests/kvm/demand_paging_test.c        |   20 +-
 .../testing/selftests/kvm/include/test_util.h |    2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |    2 +-
 tools/testing/selftests/kvm/lib/test_util.c   |   14 +
 tools/testing/selftests/vm/Makefile           |    1 +
 tools/testing/selftests/vm/hugetlb-hgm.c      |  326 +++++
 tools/testing/selftests/vm/migration.c        |  222 ++-
 tools/testing/selftests/vm/userfaultfd.c      |   90 +-
 43 files changed, 2449 insertions(+), 493 deletions(-)
 create mode 100644 tools/testing/selftests/vm/hugetlb-hgm.c

-- 
2.38.0.135.g90850a2211-goog

