linux-mm.kvack.org archive mirror
* [RFC PATCH 00/14] mm: userspace hugepage collapse
@ 2022-03-08 21:34 Zach O'Keefe
  2022-03-08 21:34 ` [RFC PATCH 01/14] mm/rmap: add mm_find_pmd_raw helper Zach O'Keefe
                   ` (16 more replies)
  0 siblings, 17 replies; 57+ messages in thread
From: Zach O'Keefe @ 2022-03-08 21:34 UTC
  To: Alex Shi, David Hildenbrand, David Rientjes, Michal Hocko,
	Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
	linux-mm
  Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
	Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
	Ivan Kokshaysky, James E.J. Bottomley, Jens Axboe,
	Kirill A. Shutemov, Matthew Wilcox, Matt Turner, Max Filippov,
	Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
	Richard Henderson, Thomas Bogendoerfer, Yang Shi,
	Zach O'Keefe

Introduction
--------------------------------

This series provides a mechanism for userspace to induce a collapse of
eligible ranges of memory into transparent hugepages in process context,
thus permitting users to more tightly control their own hugepage
utilization policy at their own expense.

This idea was previously introduced by David Rientjes, and these patches
result from that discussion[1]. Thanks to everyone for your patience
while I prepared them.

[1] https://lore.kernel.org/all/C8C89F13-3F04-456B-BA76-DE2C378D30BF@nvidia.com/

Interface
--------------------------------

The proposed interface adds a new madvise(2) mode, MADV_COLLAPSE, and
leverages the new process_madvise(2) call.

(*) process_madvise(2)

        Performs a synchronous collapse of the native pages mapped by
        the list of iovecs into transparent hugepages. By default, the
        gfp flags used are the same as those used at fault time for the
        VMA region(s) covered. When the request spans multiple VMAs and
        faulting in memory from any one of them would permit synchronous
        compaction and reclaim, all hugepage allocations required to
        satisfy the request may enter compaction and reclaim.
        Diverging from the at-fault semantics, VM_NOHUGEPAGE is ignored
        by default, since the user is explicitly requesting this action.
        Two flags, passed through the optional flags parameter of
        process_madvise(2), control collapse semantics:

        MADV_F_COLLAPSE_LIMITS

        If supplied, collapse respects the pte collapse limits set via
        sysfs:
        /transparent_hugepage/khugepaged/max_ptes_[none|swap|shared].
        Required when acting on behalf of another process, unless the
        caller has CAP_SYS_ADMIN.

        MADV_F_COLLAPSE_DEFRAG

        If supplied, permit synchronous compaction and reclaim,
        regardless of VMA flags.
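The flag semantics above can be modelled as a few pure functions. This is a minimal sketch, not code from the series: the numeric flag values, the struct names, and the helpers collapse_may_defrag(), collapse_flags_permitted() and pmd_within_limits() are hypothetical, and the max_ptes_* limits are passed in rather than read from sysfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the RFC text does not fix them. */
#define MADV_F_COLLAPSE_LIMITS 0x1u
#define MADV_F_COLLAPSE_DEFRAG 0x2u

/* Hypothetical per-VMA summary: would a fault here allow sync compaction/reclaim? */
struct vma_desc {
	bool fault_allows_defrag;
};

/* Hypothetical snapshot of the khugepaged max_ptes_* sysfs limits. */
struct pte_limits {
	unsigned int max_ptes_none;
	unsigned int max_ptes_swap;
	unsigned int max_ptes_shared;
};

/* PTE counts found while scanning one PMD-sized range. */
struct pmd_scan {
	unsigned int none, swap, shared;
};

/*
 * DEFRAG forces synchronous compaction/reclaim. Otherwise it is allowed for
 * every allocation in the request if faulting in any spanned VMA would allow it.
 */
static bool collapse_may_defrag(const struct vma_desc *vmas, size_t nr_vmas,
				unsigned int flags)
{
	assert(nr_vmas > 0);
	if (flags & MADV_F_COLLAPSE_DEFRAG)
		return true;
	for (size_t i = 0; i < nr_vmas; i++)
		if (vmas[i].fault_allows_defrag)
			return true;
	return false;
}

/* LIMITS is mandatory when targeting another process without CAP_SYS_ADMIN. */
static bool collapse_flags_permitted(bool other_process, bool cap_sys_admin,
				     unsigned int flags)
{
	if (other_process && !cap_sys_admin)
		return (flags & MADV_F_COLLAPSE_LIMITS) != 0;
	return true;
}

/* With LIMITS set, a PMD range is eligible only if each count is within its limit. */
static bool pmd_within_limits(const struct pmd_scan *s,
			      const struct pte_limits *lim, unsigned int flags)
{
	if (!(flags & MADV_F_COLLAPSE_LIMITS))
		return true; /* limits are ignored by default */
	return s->none <= lim->max_ptes_none &&
	       s->swap <= lim->max_ptes_swap &&
	       s->shared <= lim->max_ptes_shared;
}
```

Treating the defrag decision as an OR over every spanned VMA mirrors the rule described above: one permissive VMA is enough to let all allocations for the request enter compaction and reclaim.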

(*) madvise(2)

        Equivalent to process_madvise(2) on self, with no flags
        passed: pte collapse limits are ignored, and the gfp flags will
        be the same as those used at fault time for the VMA region(s)
        covered. Note that users wanting different collapse semantics
        can always use process_madvise(2) on themselves.
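On self, a caller would simply pass a range to madvise(2). A minimal sketch follows, assuming a 2 MiB PMD size (x86-64 with 4 KiB base pages) and that only fully covered, hugepage-aligned subranges are candidates for collapse. The fallback value 25 for MADV_COLLAPSE is the one later adopted by the asm-generic UAPI header, not necessarily the value proposed in this RFC, and hp_align_inward() and collapse_self() are hypothetical helpers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* assumption: asm-generic value merged later upstream */
#endif

#define HPAGE_SIZE ((uintptr_t)2 << 20) /* assumed PMD size: 2 MiB */

/*
 * Shrink [addr, addr + len) to the hugepage-aligned subrange it fully covers.
 * Returns the aligned length (0 if no full hugepage fits) and stores the
 * aligned start address in *start.
 */
static size_t hp_align_inward(uintptr_t addr, size_t len, uintptr_t *start)
{
	uintptr_t lo = (addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
	uintptr_t hi = (addr + len) & ~(HPAGE_SIZE - 1);

	assert(start != NULL);
	*start = lo;
	return hi > lo ? (size_t)(hi - lo) : 0;
}

/* Ask the kernel to synchronously collapse the aligned part of a range. */
static int collapse_self(void *addr, size_t len)
{
	uintptr_t start;
	size_t alen = hp_align_inward((uintptr_t)addr, len, &start);

	if (alen == 0)
		return 0; /* nothing hugepage-sized to collapse */
	/* Returns -1 with errno set on failure (e.g. EINVAL without support). */
	return madvise((void *)start, alen, MADV_COLLAPSE);
}
```

On a kernel that does not recognise the advice value, madvise(2) fails with EINVAL, so a caller should treat the collapse as best effort.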

Discussion
--------------------------------

The mechanism is fully compatible with khugepaged, allowing userspace to
separately define synchronous and asynchronous hugepage policies, as
priority dictates. It also naturally permits a DAMON scheme,
DAMOS_COLLAPSE, to make efficient use of the available hugepages on the
system by backing the most frequently accessed memory with hugepages[2].
Though not required to justify this series, hugepage management could be
offloaded entirely to a sufficiently informed userspace agent,
supplanting the need for khugepaged in the kernel.

Along with the interface, this series proposes a batched implementation
for collapsing a range of memory. The motivation is to limit contention
on mmap_lock by performing multiple page table modifications during
each exclusive hold of the lock.
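The effect of batching on mmap_lock can be seen with a toy model that only counts exclusive acquisitions. This is an illustration under stated assumptions, not the series' implementation: struct toy_mmap_lock stands in for mmap_lock, collapse_range_batched() is a hypothetical name, and each page table update is reduced to a counter increment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for mmap_lock: records exclusive acquisitions only. */
struct toy_mmap_lock {
	unsigned int write_acquisitions;
	int held;
};

static void toy_write_lock(struct toy_mmap_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->write_acquisitions++;
}

static void toy_write_unlock(struct toy_mmap_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Collapse nr_hpages hugepage-sized regions, performing up to batch_size
 * page table updates per exclusive hold. Returns the number collapsed.
 */
static size_t collapse_range_batched(struct toy_mmap_lock *lock,
				     size_t nr_hpages, size_t batch_size)
{
	size_t done = 0;

	if (batch_size == 0)
		batch_size = 1;
	while (done < nr_hpages) {
		size_t left = nr_hpages - done;
		size_t n = left < batch_size ? left : batch_size;

		/* Scanning and hugepage allocation for this batch would happen
		 * here, outside the exclusive hold (omitted in this model). */
		toy_write_lock(lock);
		for (size_t i = 0; i < n; i++)
			done++; /* stand-in for installing one PMD mapping */
		toy_write_unlock(lock);
	}
	return done;
}
```

Collapsing 10 hugepage-sized regions with a batch size of 4 takes the exclusive lock 3 times instead of 10; a batch size of 1 degenerates to the unbatched case.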

Only private anonymous memory is supported by this series. File-backed
memory support will be added later.

Support for multiple hugepage sizes (such as 1 GiB gigantic hugepages)
was not considered at this time, but could be added through the flags
parameter in the future.

kselftests were omitted from this series for brevity, but would be
included in an eventual patch submission.

[2] https://lore.kernel.org/lkml/bcc8d9a0-81d-5f34-5e4-fcc28eb7ce@google.com/T/

Sequence of Patches
--------------------------------

Patches 1-10 refactor the collapse logic within khugepaged.c,
introducing the notion of a collapse context and isolating logic that
can be reused later in the series by the madvise collapse context.

Patches 11-14 introduce logic for the proposed madvise collapse
mechanism. Patch 11 adds madvise and header file plumbing. Patches 12
and 13 together add the core collapse logic: the former introduces the
overall batched approach and locking strategy, and the latter fills in
the details of the batch actions. They were split purely to keep patch
size down. Patch 14 adds process_madvise support.

Applies against next-20220308.

Zach O'Keefe (14):
  mm/rmap: add mm_find_pmd_raw helper
  mm/khugepaged: add struct collapse_control
  mm/khugepaged: add __do_collapse_huge_page() helper
  mm/khugepaged: separate khugepaged_scan_pmd() scan and collapse
  mm/khugepaged: add mmap_assert_locked() checks to scan_pmd()
  mm/khugepaged: add hugepage_vma_revalidate_pmd_count()
  mm/khugepaged: add vm_flags_ignore to
    hugepage_vma_revalidate_pmd_count()
  mm/thp: add madv_thp_vm_flags to __transparent_hugepage_enabled()
  mm/khugepaged: record SCAN_PAGE_COMPOUND when scan_pmd() finds THP
  mm/khugepaged: rename khugepaged-specific/not functions
  mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
  mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse
  mm/madvise: add __madvise_collapse_*_batch() actions.
  mm/madvise: add process_madvise(MADV_COLLAPSE)

 fs/io_uring.c                          |   3 +-
 include/linux/huge_mm.h                |  27 +-
 include/linux/mm.h                     |   3 +-
 include/uapi/asm-generic/mman-common.h |  10 +
 mm/huge_memory.c                       |   2 +-
 mm/internal.h                          |   1 +
 mm/khugepaged.c                        | 937 ++++++++++++++++++++-----
 mm/madvise.c                           |  45 +-
 mm/memory.c                            |   6 +-
 mm/rmap.c                              |  15 +-
 10 files changed, 842 insertions(+), 207 deletions(-)

-- 
2.35.1.616.g0bdcbb4464-goog





Thread overview: 57+ messages
2022-03-08 21:34 [RFC PATCH 00/14] mm: userspace hugepage collapse Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 01/14] mm/rmap: add mm_find_pmd_raw helper Zach O'Keefe
2022-03-09 22:48   ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 02/14] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-03-09 22:53   ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 03/14] mm/khugepaged: add __do_collapse_huge_page() helper Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 04/14] mm/khugepaged: separate khugepaged_scan_pmd() scan and collapse Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 05/14] mm/khugepaged: add mmap_assert_locked() checks to scan_pmd() Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 06/14] mm/khugepaged: add hugepage_vma_revalidate_pmd_count() Zach O'Keefe
2022-03-09 23:15   ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 07/14] mm/khugepaged: add vm_flags_ignore to hugepage_vma_revalidate_pmd_count() Zach O'Keefe
2022-03-09 23:17   ` Yang Shi
2022-03-10  0:00     ` Zach O'Keefe
2022-03-10  0:41       ` Yang Shi
2022-03-10  1:09         ` Zach O'Keefe
2022-03-10  2:16           ` Yang Shi
2022-03-10 15:50             ` Zach O'Keefe
2022-03-10 18:17               ` Yang Shi
2022-03-10 18:46                 ` David Rientjes
2022-03-10 18:58                   ` Zach O'Keefe
2022-03-10 19:54                   ` Yang Shi
2022-03-10 20:24                     ` Zach O'Keefe
2022-03-10 18:53                 ` Zach O'Keefe
2022-03-10 15:56   ` David Hildenbrand
2022-03-10 18:39     ` Zach O'Keefe
2022-03-10 18:54     ` David Rientjes
2022-03-21 14:27       ` Michal Hocko
2022-03-08 21:34 ` [RFC PATCH 08/14] mm/thp: add madv_thp_vm_flags to __transparent_hugepage_enabled() Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 09/14] mm/khugepaged: record SCAN_PAGE_COMPOUND when scan_pmd() finds THP Zach O'Keefe
2022-03-09 23:40   ` Yang Shi
2022-03-10  0:46     ` Zach O'Keefe
2022-03-10  2:05       ` Yang Shi
2022-03-10  8:37         ` Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 10/14] mm/khugepaged: rename khugepaged-specific/not functions Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 11/14] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-03-09 23:43   ` Yang Shi
2022-03-10  1:11     ` Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLPASE) collapse Zach O'Keefe
2022-03-10  0:06   ` Yang Shi
2022-03-10 19:26     ` David Rientjes
2022-03-10 20:16       ` Matthew Wilcox
2022-03-11  0:06         ` Zach O'Keefe
2022-03-25 16:51           ` Zach O'Keefe
2022-03-25 19:54             ` Yang Shi
2022-03-08 21:34 ` [RFC PATCH 13/14] mm/madvise: add __madvise_collapse_*_batch() actions Zach O'Keefe
2022-03-08 21:34 ` [RFC PATCH 14/14] mm/madvise: add process_madvise(MADV_COLLAPSE) Zach O'Keefe
2022-03-21 14:32 ` [RFC PATCH 00/14] mm: userspace hugepage collapse Zi Yan
2022-03-21 14:51   ` Zach O'Keefe
2022-03-21 14:37 ` Michal Hocko
2022-03-21 15:46   ` Zach O'Keefe
2022-03-22 12:11     ` Michal Hocko
2022-03-22 15:53       ` Zach O'Keefe
2022-03-29 12:24         ` Michal Hocko
2022-03-30  0:36           ` Zach O'Keefe
2022-03-22  6:40 ` Zach O'Keefe
2022-03-22 12:05   ` Michal Hocko
2022-03-23 13:30     ` Zach O'Keefe
