linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>, linux-mm@kvack.org
Cc: Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Mon, 9 Aug 2021 09:41:35 +0200	[thread overview]
Message-ID: <28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com> (raw)
In-Reply-To: <20210805190253.2795604-1-zi.yan@sent.com>

On 05.08.21 21:02, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> Hi all,
> 
> This patchset add support for kernel boot time adjustable MAX_ORDER, so that
> user can change the largest size of pages obtained from buddy allocator. It also
> removes the restriction on MAX_ORDER based on SECTION_SIZE_BITS, so that
> buddy allocator can merge PFNs across memory sections when SPARSEMEM_VMEMMAP is
> set. It is on top of v5.14-rc4-mmotm-2021-08-02-18-51.
> 
> Motivation
> ===
> 
> This enables kernel to allocate 1GB pages and is necessary for my ongoing work
> on adding support for 1GB PUD THP[1]. This is also the conclusion I came up with
> after some discussion with David Hildenbrand on what methods should be used for
> allocating gigantic pages[2], since other approaches like using CMA allocator or
> alloc_contig_pages() are regarded as suboptimal.
> 
> This also prevents increasing SECTION_SIZE_BITS when increasing MAX_ORDER, since
> increasing SECTION_SIZE_BITS is not desirable as memory hotadd/hotremove chunk
> size will be increased as well, causing memory management difficulty for VMs.
> 
> In addition, make MAX_ORDER a kernel boot time parameter can enable user to
> adjust buddy allocator without recompiling the kernel for their own needs, so
> that one can still have a small MAX_ORDER if he/she does not need to allocate
> gigantic pages like 1GB PUD THPs.
> 
> Background
> ===
> 
> At the moment, kernel imposes MAX_ORDER - 1 + PAGE_SHFIT < SECTION_SIZE_BITS
> restriction. This prevents buddy allocator merging pages across memory sections,
> as PFNs might not be contiguous and code like page++ would fail. But this would
> not be an issue when SPARSEMEM_VMEMMAP is set, since all struct page are
> virtually contiguous. In addition, as long as buddy allocator checks the PFN
> validity during buddy page merging (done in Patch 3), pages allocated from
> buddy allocator can be manipulated by code like page++.
> 
> 
> Description
> ===
> 
> I tested the patchset on both x86_64 and ARM64 at 4KB, 16KB, and 64KB base
> pages. The systems boot and ltp mm test suite finished without issue. Also
> memory hotplug worked on x86_64 when I tested. It definitely needs more tests
> and reviews for other architectures.
> 
> In terms of the concerns on performance degradation if MAX_ORDER is increased,
> I did some initial performance tests comparing MAX_ORDER=11 and MAX_ORDER=20 on
> x86_64 machines and saw no performance difference[3].
> 
> Patch 1 excludes MAX_ORDER check from 32bit vdso compilation. The check uses
> irrelevant 32bit SECTION_SIZE_BITS during 64bit kernel compilation. The
> exclusion does not break the check in 32bit kernel, since the check will still
> be performed during other kernel component compilation.
> 
> Patch 2 gives FORCE_MAX_ZONEORDER a better name.
> 
> Patch 3 restores the pfn_valid_within() check when buddy allocator can merge
> pages across memory sections. The check was removed when ARM64 gets rid of holes
> in zones, but holes can appear in zones again after this patchset.
> 
> Patch 4-11 convert the use of MAX_ORDER to SECTION_SIZE_BITS or its derivative
> constants, since these code places use MAX_ORDER as boundary check for
> physically contiguous pages, where SECTION_SIZE_BITS should be used. After this
> patchset, MAX_ORDER can go beyond SECTION_SIZE_BITS, the code can break.
> I separate changes to different patches for easy review and can merge them into
> a single one if that works better.
> 
> Patch 12 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
> when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.
> 
> Patch 13 converts statically allocated arrays with MAX_ORDER length to dynamic
> ones if possible and prepares for making MAX_ORDER a boot time parameter.
> 
> Patch 14 adds a new MIN_MAX_ORDER constant to replace soon-to-be-dynamic
> MAX_ORDER for places where converting static array to dynamic one is causing
> hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.
> 
> Patch 15 finally changes MAX_ORDER to be a kernel boot time parameter.
> 
> 
> Any suggestion and/or comment is welcome. Thanks.
> 
> 
> TODO
> ===
> 
> 1. Redo the performance comparison tests using this patchset to understand the
>     performance implication of changing MAX_ORDER.


2. Make alloc_contig_range() cleanly deal with pageblock_order instead 
of MAX_ORDER - 1 to not force the minimal CMA area size/alignment to be 
e.g., 1 GiB instead of 4 MiB and to keep virtio-mem working as expected.

virtio-mem short term would mean disallowing initialization when an 
incompatible setup (MAX_ORDER_NR_PAGE > SECTION_NR_PAGES) is detected 
and bailing out loud that the admin has to fix that on the command line. 
I have optimizing alloc_contig_range() on my todo list, to get rid of 
the MAX_ORDER -1 dependency in virtio-mem; but I have no idea when I'll 
have time to work on that.


-- 
Thanks,

David / dhildenb



      parent reply	other threads:[~2021-08-09  7:41 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 19:02 [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
2021-08-05 19:02 ` [RFC PATCH 01/15] arch: x86: remove MAX_ORDER exceeding SECTION_SIZE check for 32bit vdso Zi Yan
2021-08-05 19:02 ` [RFC PATCH 02/15] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
2021-08-05 19:02 ` [RFC PATCH 03/15] mm: check pfn validity when buddy allocator can merge pages across mem sections Zi Yan
2021-08-05 19:02 ` [RFC PATCH 04/15] mm: prevent pageblock size being larger than section size Zi Yan
2021-08-05 19:02 ` [RFC PATCH 05/15] mm/memory_hotplug: online pages at " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 06/15] mm: use PAGES_PER_SECTION instead for mem_map_offset/next() Zi Yan
2021-08-05 19:02 ` [RFC PATCH 07/15] mm: hugetlb: use PAGES_PER_SECTION to check mem_map discontiguity Zi Yan
2021-08-05 19:02 ` [RFC PATCH 08/15] fs: proc: use PAGES_PER_SECTION for page offline checking period Zi Yan
2021-08-07 10:32   ` Mike Rapoport
2021-08-09 15:45     ` [RFC PATCH 08/15] " Zi Yan
2021-08-05 19:02 ` [RFC PATCH 09/15] virtio: virtio_mem: use PAGES_PER_SECTION instead of MAX_ORDER_NR_PAGES Zi Yan
2021-08-09  7:35   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 10/15] virtio: virtio_balloon: " Zi Yan
2021-08-09  7:42   ` David Hildenbrand
2021-08-05 19:02 ` [RFC PATCH 11/15] mm/page_reporting: report pages at section size instead of MAX_ORDER Zi Yan
2021-08-09  7:25   ` David Hildenbrand
2021-08-09 14:12     ` Alexander Duyck
2021-08-09 15:08       ` Zi Yan
2021-08-09 16:51         ` Alexander Duyck
2021-08-09 14:08   ` Alexander Duyck
2021-08-05 19:02 ` [RFC PATCH 12/15] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
2021-08-06 15:16   ` Vlastimil Babka
2021-08-06 15:23     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 13/15] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
2021-08-05 19:16   ` Christian König
2021-08-05 19:58     ` Zi Yan
2021-08-06  9:37       ` Christian König
2021-08-06 14:00         ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 14/15] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
2021-08-08  8:23   ` Mike Rapoport
2021-08-09 15:35     ` Zi Yan
2021-08-05 19:02 ` [RFC PATCH 15/15] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
2021-08-06 15:36 ` [RFC PATCH 00/15] Make MAX_ORDER adjustable as " Vlastimil Babka
2021-08-06 16:16   ` David Hildenbrand
2021-08-06 16:54     ` Vlastimil Babka
2021-08-06 17:08       ` David Hildenbrand
2021-08-06 18:24         ` Zi Yan
2021-08-09  7:20           ` David Hildenbrand
2021-08-08  7:41       ` Mike Rapoport
2021-08-06 16:32 ` Vlastimil Babka
2021-08-06 17:19   ` Zi Yan
2021-08-06 20:27     ` Hugh Dickins
2021-08-06 21:26       ` Zi Yan
2021-08-09  4:04         ` Hugh Dickins
2021-08-07  1:10       ` Matthew Wilcox
2021-08-07 21:23         ` Matthew Wilcox
2021-08-09  4:29         ` Hugh Dickins
2021-08-09 11:22           ` Matthew Wilcox
2021-08-09  7:41 ` David Hildenbrand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com \
    --to=david@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).