LinuxPPC-Dev Archive on
From: David Hildenbrand <>
To: Christoph Lameter <>,
	Anshuman Khandual <>
Cc: "" <>,
	Mel Gorman <>,, Michal Hocko <>,, Paul Mackerras <>,,
	"linuxppc-dev @ lists . ozlabs . org"
	<>, Vlastimil Babka <>,
	Mike Kravetz <>
Subject: Re: [PATCH V2] mm/page_alloc: Ensure that HUGETLB_PAGE_ORDER is less than MAX_ORDER
Date: Tue, 20 Apr 2021 11:03:43 +0200
Message-ID: <>
In-Reply-To: <>

Hi Christoph,

thanks for your insight.

> You can have larger blocks but you would need to allocate multiple
> contiguous max order blocks or do it at boot time before the buddy
> allocator is active.
> What IA64 did was to do this at boot time thereby avoiding the buddy
> lists. And it had a separate virtual address range and page table for the
> huge pages.
> Looks like the current code does these allocations via CMA which should
> also bypass the buddy allocator.

When using CMA, the pageblock size doesn't really matter for
fragmentation avoidance, i.e., for somewhat reliable allocation of
memory chunks with an order > MAX_ORDER - 1.

IOW, when using CMA for hugetlb, we don't need pageblock_order >
MAX_ORDER - 1.

>>> But it's kind of weird, isn't it? Let's assume we have MAX_ORDER - 1 correspond to 4 MiB and pageblock_order correspond to 8 MiB.
>>> Sure, we'd be grouping pages in 8 MiB chunks, however, we cannot even
>>> allocate 8 MiB chunks via the buddy. So only alloc_contig_range()
>>> could really grab them (IOW: gigantic pages).
>> Right.
> But then you can avoid the buddy allocator.
>>> Further, we have code like deferred_free_range(), where we end up
>>> calling __free_pages_core()->...->__free_one_page() with
>>> pageblock_order. Wouldn't we end up setting the buddy order to
>>> something > MAX_ORDER -1 on that path?
>> Agreed.
> We would need to return the supersized block to the huge page pool and not
> to the buddy allocator. There is a special callback in the compound page
> so that you can call an alternate free function that is not the buddy
> allocator.

Sorry, but that doesn't make any sense. We are talking about bring-up
code, where we transition from memblock to the buddy allocator and fill
the free page lists. Looking at the code, deferred initialization of the
memmap is broken on these setups, so I assume deferred memmap init is
never enabled there.

>>> Having pageblock_order > MAX_ORDER feels wrong and looks shaky.
>> Agreed, definitely does not look right. Lets see what other folks
>> might have to say on this.
>> + Christoph Lameter <>
> It was done for a long time successfully and is running in numerous
> configurations.

Enforcing pageblock_order < MAX_ORDER would mean that runtime allocation
of gigantic (here: huge) pages (HUGETLB_PAGE_ORDER >= MAX_ORDER) via
alloc_contig_pages() becomes less reliable. To compensate, the relevant
archs could switch to "hugetlb_cma=" to improve the reliability of
runtime allocation.

I wonder which configurations we are talking about:

a) ia64

At least I couldn't care less; it's a dead architecture, and I'm not
sure how many people care about "more reliable runtime allocation of
gigantic (here: huge) pages". I'm also not sure which exact
configurations are affected.

b) ppc64

We only have a variable huge page size with CONFIG_PPC_BOOK3S_64. The
huge page size is initialized to either 1M, 2M or 16M; 16M seems to be
the primary choice.

The MAX_ORDER defaults on ppc64 are:

  default "9" if PPC64 && PPC_64K_PAGES
    -> 16M effective buddy maximum size
  default "13" if PPC64 && !PPC_64K_PAGES
    -> 16M effective buddy maximum size

So I fail to see in which scenario we could even end up with
pageblock_order > MAX_ORDER - 1. I did not check ppc32.


David / dhildenb


Thread overview: 5+ messages
2021-04-12  8:06 ` Anshuman Khandual
2021-04-12  8:47   ` David Hildenbrand
2021-04-19  3:45     ` Anshuman Khandual
2021-04-19 10:48       ` Christoph Lameter
2021-04-20  9:03         ` David Hildenbrand [this message]
