From: Roman Gushchin <guro@fb.com>
To: Zi Yan <ziy@nvidia.com>
Cc: <linux-mm@kvack.org>, Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <shy828301@gmail.com>, Michal Hocko <mhocko@suse.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Song Liu <songliubraving@fb.com>
Subject: Re: [RFC PATCH v3 00/49] 1GB PUD THP support on x86_64
Date: Mon, 1 Mar 2021 17:59:17 -0800	[thread overview]
Message-ID: <YD2b9Zt3ETKCpSFd@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210224223536.803765-1-zi.yan@sent.com>

On Wed, Feb 24, 2021 at 05:35:36PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi all,
> 
> I have rebased my 1GB PUD THP support patches on v5.11-mmotm-2021-02-18-18-29
> and the code is available at
> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.11-mmotm-2021-02-18-18-29
> if you want to give it a try. The actual 49 patches are not sent out with this
> cover letter. :)
> 
> Instead of asking for code review, I would like to discuss the concerns raised
> in previous RFCs. I think there are two major ones:
> 
> 1. 1GB page allocation. The current implementation allocates 1GB pages from
>    CMA regions that are reserved at boot time, like hugetlbfs does. The
>    concern with using CMA is that an educated guess is needed when sizing the
>    regions, to avoid depleting kernel memory if they are set too large.
>    Recently David Rientjes proposed using process_madvise() for hugepage
>    collapse, which is an alternative [1] but might not work for 1GB pages,
>    since there is no way of _allocating_ a 1GB page into which the existing
>    pages could be collapsed. I proposed a similar approach at LSF/MM 2019,
>    generating physically contiguous memory after pages are allocated [2],
>    which is usable for 1GB THPs. That approach does in-place huge page
>    promotion and thus does not require page allocation.

Well, I don't think there is an alternative to CMA as of now. Once memory has
been almost completely filled at least once, any subsequent activity that leads
to substantial slab allocations (e.g. running git gc) will fragment it, so that
there is virtually no chance of finding a contiguous gigabyte.

It's possible in theory to reduce fragmentation at the 1GB scale by grouping
non-movable pageblocks, but that seems like a separate project.

Thanks!

> 
> 2. Large amount of new code to review. I find that most of the added code is
>    simply a copy-paste of existing PMD THP code. I have tried to reduce the
>    new code size by reusing some existing code [3], but did not find a good
>    way of reusing the PMD handling code for PUD, which is the major part of
>    this patchset. I am all ears if you have any idea on how to reduce the new
>    code size or make code review easier.
> 
> 
> Any comment or suggestion is welcome. Thanks.
> 
> [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
> [2] https://lwn.net/Articles/779979/ 
> [3] https://lwn.net/Articles/837928/ 
> 
> 
> Roman Gushchin (2):
>   mm: cma: introduce cma_release_nowait()
>   mm: hugetlb: don't drop hugetlb_lock around cma_release() call
> 
> Zi Yan (47):
>   mm: memcg: make memcg huge page split support any order split.
>   mm: page_owner: add support for splitting to any order in split
>     page_owner.
>   mm: thp: add support for split huge page to any lower order pages.
>   mm: thp: use single linked list for THP page table page deposit.
>   mm: add new helper functions to allocate one PMD page with
>     HPAGE_PMD_NR PTE pages.
>   mm: thp: add page table deposit/withdraw functions for PUD THP.
>   mm: change thp_order and thp_nr as we will have not just PMD THPs.
>   mm: thp: add anonymous PUD THP page fault support without enabling it.
>   mm: thp: add PUD THP support for copy_huge_pud.
>   mm: thp: add PUD THP support to zap_huge_pud.
>   fs: proc: add PUD THP kpageflag.
>   mm: thp: handling PUD THP reference bit.
>   mm: rmap: add mappped/unmapped page order to anonymous page rmap
>     functions.
>   mm: rmap: add map_order to page_remove_anon_compound_rmap.
>   mm: add pud manipulation functions.
>   mm: thp: add PUDDoubleMap page flag for PUD- and PMD-mapped pages.
>   mm: thp: add pmd_compound_mapcount for PMD mappings in PUD THPs.
>   mm: thp: add split_huge_pud() function to split PUD entries.
>   mm: thp: handle PMD-mapped PUD THP in split_huge_pmd functions.
>   mm: thp: adjust page map counting functions for PMD- and PTE-mapped
>     PUD THPs.
>   mm: thp: new ttu_flags to split huge pud during try_to_unmap.
>   mm: thp: add new checks for zap_huge_pmd.
>   mm: thp: add pud split events.
>   mm: thp: split pud when adjusting vma ranges.
>   mm: thp: handle PUD THP properly at page allocation and deallocation.
>   mm: rmap: handle PUD-, PMD- and PTE-mapped PUD THP properly in rmap.
>   mm: page_walk: handle PUD after pud entry split.
>   mm: thp: use split_huge_page_to_order_to_list for split huge pud page.
>   mm: thp: add PUD THP to deferred split list when PUD mapping is gone.
>   mm: debug: adapt dump_page to PUD THP.
>   mm: thp: PUD THP COW splits PUD page and falls back to PMD page.
>   mm: thp: PUD THP follow_p*d_page() support.
>   mm: stats: make smap stats understand PUD THPs.
>   mm: page_vma_walk: teach it about PMD-mapped PUD THP.
>   mm: thp: PUD THP support in try_to_unmap().
>   mm: thp: split PUD THPs at page reclaim.
>   mm: support PUD THP pagemap support.
>   mm: madvise: add page size options to MADV_HUGEPAGE and
>     MADV_NOHUGEPAGE.
>   mm: vma: add VM_HUGEPAGE_PUD to vm_flags at bit 37.
>   mm: thp: add a global knob to enable/disable PUD THPs.
>   mm: thp: make PUD THP size public.
>   hugetlb: cma: move cma reserve function to cma.c.
>   mm: thp: use cma reservation for pud thp allocation.
>   mm: thp: enable anonymous PUD THP at page fault path.
>   mm: cma: only clear bitmap no freeing pages.
>   mm: thp: clear cma bitmap during PUD THP split.
>   mm: migrate: split PUD THP if it is going to be migrated.
> 
>  .../admin-guide/kernel-parameters.txt         |   2 +-
>  Documentation/admin-guide/mm/transhuge.rst    |   1 +
>  arch/arm64/mm/hugetlbpage.c                   |   2 +-
>  arch/powerpc/mm/hugetlbpage.c                 |   2 +-
>  arch/x86/include/asm/pgalloc.h                |  69 ++
>  arch/x86/include/asm/pgtable.h                |  26 +
>  arch/x86/kernel/setup.c                       |   8 +-
>  arch/x86/mm/pgtable.c                         |  38 +
>  drivers/base/node.c                           |   2 +
>  fs/proc/meminfo.c                             |   2 +
>  fs/proc/page.c                                |   2 +
>  fs/proc/task_mmu.c                            | 126 ++-
>  include/linux/cma.h                           |  20 +
>  include/linux/huge_mm.h                       |  92 ++-
>  include/linux/hugetlb.h                       |  12 -
>  include/linux/llist.h                         |  11 +
>  include/linux/memcontrol.h                    |   5 +-
>  include/linux/mm.h                            |  53 +-
>  include/linux/mm_types.h                      |  13 +-
>  include/linux/mmu_notifier.h                  |  13 +
>  include/linux/mmzone.h                        |   1 +
>  include/linux/page-flags.h                    |  25 +
>  include/linux/page_owner.h                    |  10 +-
>  include/linux/pgtable.h                       |  34 +
>  include/linux/rmap.h                          |  10 +-
>  include/linux/vm_event_item.h                 |   7 +
>  include/uapi/asm-generic/mman-common.h        |  23 +
>  include/uapi/linux/kernel-page-flags.h        |   1 +
>  kernel/events/uprobes.c                       |   4 +-
>  kernel/fork.c                                 |  10 +-
>  mm/cma.c                                      | 226 ++++++
>  mm/cma.h                                      |   5 +
>  mm/debug.c                                    |   6 +-
>  mm/gup.c                                      |  60 +-
>  mm/huge_memory.c                              | 748 ++++++++++++++++--
>  mm/hugetlb.c                                  | 126 +--
>  mm/khugepaged.c                               |  16 +-
>  mm/ksm.c                                      |   4 +-
>  mm/madvise.c                                  |  17 +-
>  mm/memcontrol.c                               |   6 +-
>  mm/memory.c                                   |  28 +-
>  mm/mempolicy.c                                |  14 +-
>  mm/migrate.c                                  |  16 +-
>  mm/page_alloc.c                               |  55 +-
>  mm/page_owner.c                               |  13 +-
>  mm/page_vma_mapped.c                          | 171 +++-
>  mm/pagewalk.c                                 |   6 +-
>  mm/pgtable-generic.c                          |  49 +-
>  mm/rmap.c                                     | 297 +++++--
>  mm/swap_slots.c                               |   2 +
>  mm/swapfile.c                                 |  11 +-
>  mm/userfaultfd.c                              |   2 +-
>  mm/util.c                                     |  18 +-
>  mm/vmscan.c                                   |  33 +-
>  mm/vmstat.c                                   |   8 +
>  55 files changed, 2160 insertions(+), 401 deletions(-)
> 
> -- 
> 2.30.0
> 



Thread overview: 15+ messages
2021-02-24 22:35 [RFC PATCH v3 00/49] 1GB PUD THP support on x86_64 Zi Yan
2021-02-25 11:02 ` David Hildenbrand
2021-02-25 22:13   ` Zi Yan
2021-03-02  8:55     ` David Hildenbrand
2021-03-03 23:42       ` Zi Yan
2021-03-04  9:26         ` David Hildenbrand
2021-03-02  1:59 ` Roman Gushchin [this message]
2021-03-04 16:26   ` Zi Yan
2021-03-04 16:45     ` Roman Gushchin
2021-03-30 17:24       ` Zi Yan
2021-03-30 18:02         ` Roman Gushchin
2021-03-31  2:04           ` Zi Yan
2021-03-31  3:09           ` Matthew Wilcox
2021-03-31  3:32             ` Roman Gushchin
2021-03-31 14:48               ` Zi Yan
