linux-kernel.vger.kernel.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Hugh Dickins <hughd@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, Roman Gushchin <guro@fb.com>
Subject: Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Sun, 8 Aug 2021 21:04:36 -0700 (PDT)
Message-ID: <6aaa9e7c-5ac4-a490-46f1-31d71edf72cf@google.com>
In-Reply-To: <13DF8783-289F-4ED7-AC13-E60DF7CD0710@nvidia.com>

On Fri, 6 Aug 2021, Zi Yan wrote:
> On 6 Aug 2021, at 16:27, Hugh Dickins wrote:
> > On Fri, 6 Aug 2021, Zi Yan wrote:
> >>
> >> In addition, I would like to share more detail on my plan on supporting 1GB PUD THP.
> >> This patchset is the first step, enabling kernel to allocate 1GB pages, so that
> >> user can get 1GB THPs from ZONE_NORMAL and ZONE_MOVABLE without using
> >> alloc_contig_pages() or CMA allocator. The next step is to improve kernel memory
> >> fragmentation handling for pages up to MAX_ORDER, since currently pageblock size
> >> is still limited by memory section size. As a result, I will explore solutions
> >> like having additional larger pageblocks (up to MAX_ORDER) to counter memory
> >> fragmentation. I will discover what else needs to be solved as I gradually improve
> >> 1GB PUD THP support.
> >
> > Sorry to be blunt, but let me state my opinion: 2MB THPs have given and
> > continue to give us more than enough trouble.  Complicating the kernel's
> > mm further, just to allow 1GB THPs, seems a very bad tradeoff to me.  I
> > understand that it's an appealing personal project; but for the sake
> > of all the rest of us, please leave 1GB huge pages to hugetlbfs (until
> > the day when we are all using 2MB base pages).
> 
> I do not agree with you. 2MB THP provides good performance, while letting us
> keep using 4KB base pages. The 2MB THP implementation is the price we pay
> to get the performance. This patchset removes the tie between MAX_ORDER
> and section size to allow >2MB page allocation, which is useful in many
> places. 1GB THP is one of the users. Gigantic pages also improve
> device performance, like GPUs (e.g., AMD GPUs can use any power of two up to
> 1GB pages[1], which I just learnt). Also could you point out which part
> of my patchset complicates kernel’s mm? I could try to simplify it if
> possible.
> 
> In addition, I am not sure hugetlbfs is the way to go. THP is managed by
> core mm, whereas hugetlbfs has its own code for memory management.
> As hugetlbfs gets popular, more core mm functionalities have been
> replicated and added to hugetlbfs codebase. It is not a good tradeoff
> either. One of the reasons I work on 1GB THP is that Roman from Facebook
> explicitly mentioned they want to use THP in place of hugetlbfs[2].
> 
> I think it might be more constructive to point out the existing issues
> in THP so that we can improve the code together. BTW, I am also working
> on simplifying THP code like generalizing THP split[3] and planning to
> simplify page table manipulation code by reviving Kirill’s idea[4].

You may have good reasons for working on huge PUD entry support;
and perhaps we have different understandings of "THP".

Fragmentation: that's what horrifies me about 1GB THP.

The dark side of THP is compaction.  People have put in a lot of effort
to get compaction working as well as it currently does, but getting 512
adjacent 4k pages is not easy.  Getting 512*512 adjacent 4k pages is
very much harder.  Please put in the work on compaction before you
attempt to support 1GB THP.
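
For scale, the arithmetic behind those two figures can be sketched as
follows. This is only an illustration, assuming x86-64 with 4KB base pages
and 512 entries per page table level; the names are hypothetical and are
not kernel identifiers.

```python
# Illustrative sketch only: assumes 4 KiB base pages and 512 entries
# per page table level, as on x86-64. Names are made up for this example.
BASE_PAGE_BYTES = 4 * 1024
ENTRIES_PER_TABLE = 512


def contiguous_base_pages(levels: int) -> int:
    """Adjacent base pages needed for one huge mapping `levels` above the PTE."""
    return ENTRIES_PER_TABLE ** levels


def allocation_order(levels: int) -> int:
    """log2 of the base-page count, i.e. the order a buddy allocator would need."""
    return contiguous_base_pages(levels).bit_length() - 1


pmd_pages = contiguous_base_pages(1)   # 2MB huge page (PMD level)
pud_pages = contiguous_base_pages(2)   # 1GB huge page (PUD level)
pmd_bytes = pmd_pages * BASE_PAGE_BYTES
pud_bytes = pud_pages * BASE_PAGE_BYTES

print(pmd_pages, pmd_bytes, allocation_order(1))
print(pud_pages, pud_bytes, allocation_order(2))
```

Under these assumptions a 1GB page needs 262144 adjacent, aligned 4k pages
to be free (or movable) at once, against 512 for a 2MB page.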

Related fears: unexpected latencies; unacceptable variance between runs;
frequent rebooting of machines to get back to an unfragmented state;
page table code that most of us will never be in a position to test.

Sorry, no, I'm not reading your patches: that's not personal, it's
just that I've more than enough to do already, and must make choices.

Hugh

> 
> [1] https://lore.kernel.org/linux-mm/bdec12bd-9188-9f3e-c442-aa33e25303a6@amd.com/
> [2] https://lore.kernel.org/linux-mm/20200903162527.GF60440@carbon.dhcp.thefacebook.com/
> [3] https://lwn.net/Articles/837928/
> [4] https://lore.kernel.org/linux-mm/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/
> 
> —
> Best Regards,
> Yan, Zi

Thread overview: 50+ messages
2021-08-05 19:02 [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
2021-08-05 19:02 ` [RFC PATCH 01/15] arch: x86: remove MAX_ORDER exceeding SECTION_SIZE check for 32bit vdso Zi Yan
2021-08-05 19:02 ` [RFC PATCH 02/15] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
2021-08-05 19:02 ` [RFC PATCH 03/15] mm: check pfn validity when buddy allocator can merge pages across mem sections Zi Yan
2021-08-05 19:02 ` [RFC PATCH 04/15] mm: prevent pageblock size being larger than section size Zi Yan
2021-08-05 19:02 ` [RFC PATCH 05/15] mm/memory_hotplug: online pages at " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 06/15] mm: use PAGES_PER_SECTION instead for mem_map_offset/next() Zi Yan
2021-08-05 19:02 ` [RFC PATCH 07/15] mm: hugetlb: use PAGES_PER_SECTION to check mem_map discontiguity Zi Yan
2021-08-05 19:02 ` [RFC PATCH 08/15] fs: proc: use PAGES_PER_SECTION for page offline checking period Zi Yan
2021-08-07 10:32   ` Mike Rapoport
2021-08-09 15:45     ` [RFC PATCH 08/15] " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 09/15] virtio: virtio_mem: use PAGES_PER_SECTION instead of MAX_ORDER_NR_PAGES Zi Yan
2021-08-09  7:35   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 10/15] virtio: virtio_balloon: " Zi Yan
2021-08-09  7:42   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 11/15] mm/page_reporting: report pages at section size instead of MAX_ORDER Zi Yan
2021-08-09  7:25   ` David Hildenbrand
2021-08-09 14:12     ` Alexander Duyck
2021-08-09 15:08       ` Zi Yan
2021-08-09 16:51         ` Alexander Duyck
2021-08-09 14:08   ` Alexander Duyck
2021-08-05 19:02 ` [RFC PATCH 12/15] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
2021-08-06 15:16   ` Vlastimil Babka
2021-08-06 15:23     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 13/15] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
2021-08-05 19:16   ` Christian König
2021-08-05 19:58     ` Zi Yan
2021-08-06  9:37       ` Christian König
2021-08-06 14:00         ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 14/15] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
2021-08-08  8:23   ` Mike Rapoport
2021-08-09 15:35     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 15/15] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
2021-08-06 15:36 ` [RFC PATCH 00/15] Make MAX_ORDER adjustable as " Vlastimil Babka
2021-08-06 16:16   ` David Hildenbrand
2021-08-06 16:54     ` Vlastimil Babka
2021-08-06 17:08       ` David Hildenbrand
2021-08-06 18:24         ` Zi Yan
2021-08-09  7:20           ` David Hildenbrand
2021-08-08  7:41       ` Mike Rapoport
2021-08-06 16:32 ` Vlastimil Babka
2021-08-06 17:19   ` Zi Yan
2021-08-06 20:27     ` Hugh Dickins
2021-08-06 21:26       ` Zi Yan
2021-08-09  4:04         ` Hugh Dickins [this message]
2021-08-07  1:10       ` Matthew Wilcox
2021-08-07 21:23         ` Matthew Wilcox
2021-08-09  4:29         ` Hugh Dickins
2021-08-09 11:22           ` Matthew Wilcox
2021-08-09  7:41 ` David Hildenbrand
