* [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter.
@ 2022-09-22  1:12 Zi Yan
  2022-09-22  1:12 ` [PATCH v1 01/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset adds support for a kernel boot time adjustable MAX_ORDER, so
that users can change the largest size of pages the buddy allocator
allocates.
It is on top of mm-everything-2022-09-19-00-45.

Changelog
===
From RFCv2

1. Dropped RFC, collected reviewed-by.
2. Added back page validation check in find_buddy_page_pfn() since it is
needed when zone is not contiguous.
3. Converted MAX_ORDER sized static array used in recently added kmsan code to
a dynamic one.

Motivation
===

This enables the kernel to allocate 1GB pages and is necessary for my ongoing
work on adding support for 1GB PUD THP[1]. This is also the conclusion I came
to after some discussion with David Hildenbrand on what methods should be used
for allocating gigantic pages[2], since other approaches like using the CMA
allocator or alloc_contig_pages() are regarded as suboptimal.

In addition, making MAX_ORDER a kernel boot time parameter lets users adjust
the buddy allocator for their own needs without recompiling the kernel, so
that one can still keep a small MAX_ORDER when there is no need to allocate
gigantic pages like 1GB PUD THPs.

Background
===

At the moment, the kernel imposes the restriction
MAX_ORDER - 1 + PAGE_SHIFT < SECTION_SIZE_BITS. This prevents the buddy
allocator from merging pages across memory sections, as PFNs might not be
contiguous and code like page++ would fail. But this is not an issue when
SPARSEMEM_VMEMMAP is set, since all struct pages are virtually contiguous. So
the boot time adjustable MAX_ORDER depends on SPARSEMEM_VMEMMAP.
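
For reference, a rough sketch of why this holds: the restriction shows up as
the existing compile-time check (old semantics) in include/linux/mmzone.h,
and with SPARSEMEM_VMEMMAP pfn_to_page() is pure pointer arithmetic into one
virtually contiguous struct page array (see
include/asm-generic/memory_model.h), so "page++" stays valid across section
boundaries:

	#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
	#error Allocator MAX_ORDER exceeds SECTION_SIZE
	#endif

	/* SPARSEMEM_VMEMMAP variant: struct pages form one virtually
	 * contiguous array, so buddy pointer arithmetic can cross
	 * section boundaries safely. */
	#define __pfn_to_page(pfn)	(vmemmap + (pfn))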

Description
===

I tested the patchset on both x86_64 and ARM64 with 4KB base pages. The
systems boot and run.

Regarding concerns about performance degradation when MAX_ORDER is increased,
I ran vm-scalability from lkp, comparing the current kernel, my patchset with
MAX_ORDER=11, and my patchset with MAX_ORDER=20 on an x86_64 VM, and saw
almost no performance difference. Please see the vm-scalability reports in the
RFCv2: https://lore.kernel.org/linux-mm/20220811231643.1012912-1-zi.yan@sent.com/

Patch 1 changes MAX_ORDER to represent the maximum order of pages allocated
by the buddy allocator. Right now MAX_ORDER - 1 represents that, which is
confusing. Suggested by Vlastimil Babka. checkpatch.pl is updated to warn
about future uses of MAX_ORDER, since its semantics have changed.
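
In practice, this shifts loop bounds and array sizes, following the pattern
in the diff below:

	/* old semantics: MAX_ORDER was one past the largest order */
	struct free_area	free_area[MAX_ORDER];
	for (order = 0; order < MAX_ORDER; order++) { ... }

	/* new semantics: MAX_ORDER is the largest order itself */
	struct free_area	free_area[MAX_ORDER + 1];
	for (order = 0; order <= MAX_ORDER; order++) { ... }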

Patch 2 adds page validation in find_buddy_page_pfn() when the zone is not
contiguous, since some pages in the middle of a zone can be invalid.

Patch 3 makes deferred struct page initialization work when MAX_ORDER is
bigger than a memory section size.

Patches 4-7 convert uses of MAX_ORDER to pageblock_order, since
pageblock_order stays a constant once MAX_ORDER can be changed at boot time
and is close to the current MAX_ORDER value. I separated the changes into
different patches for easy review and can merge them into a single one if that
works better.

Patch 8 replaces MAX_ORDER with MAX_PHYS_CONTIG_ORDER when it is used to
indicate the maximum number of physically contiguous pages.

Patch 9 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.

Patch 10 converts statically allocated arrays with MAX_ORDER length to dynamic
ones if possible and prepares for making MAX_ORDER a boot time parameter.
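
As a sketch of that conversion (the identifiers below are illustrative, not
taken from the patch), a compile-time sized array becomes a pointer that is
allocated during init, once the final MAX_ORDER value is known:

	/* before: size fixed at compile time */
	static unsigned long order_stats[MAX_ORDER + 1];

	/* after: allocated at init time, when MAX_ORDER is final */
	static unsigned long *order_stats;

	static int __init order_stats_init(void)
	{
		order_stats = kcalloc(MAX_ORDER + 1, sizeof(*order_stats),
				      GFP_KERNEL);
		return order_stats ? 0 : -ENOMEM;
	}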

Patch 11 adds a new MIN_MAX_ORDER constant to replace the soon-to-be-dynamic
MAX_ORDER in places where converting a static array to a dynamic one would
cause hassle and is not necessary, i.e., ARM64 hypervisor page allocation and
SLAB.

Patch 12 changes MAX_ORDER to a kernel boot time parameter; it is opt-in via
an mm/Kconfig option.
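
A minimal sketch of how such a parameter could be wired up via early_param();
the names below are illustrative and may not match the patch:

	static unsigned int buddy_max_order = MIN_MAX_ORDER;

	static int __init set_buddy_max_order(char *p)
	{
		/* real code must also clamp the value against
		 * MIN_MAX_ORDER and the SECTION_SIZE_BITS-derived limit */
		return kstrtouint(p, 10, &buddy_max_order);
	}
	early_param("buddy_alloc_max_order", set_buddy_max_order);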


Any suggestions and/or comments are welcome. Thanks.


[1] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/e132fdd9-65af-1cad-8a6e-71844ebfe6a2@redhat.com/

Zi Yan (12):
  mm: rectify MAX_ORDER semantics to be the largest page order from
    buddy allocator
  mm: check page validity when find a buddy page in a non-contiguous
    zone
  mm: adapt deferred struct page init to new MAX_ORDER.
  mm: prevent pageblock size being larger than section size.
  fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  mm/page_reporting: set page_reporting_order to -1 to prevent it
    running
  mm: replace MAX_ORDER when it is used to indicate max physical
    contiguity.
  mm: Make MAX_ORDER of buddy allocator configurable via Kconfig
    SET_MAX_ORDER.
  mm: convert MAX_ORDER sized static arrays to dynamic ones.
  mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time
    constant.
  mm: make MAX_ORDER a kernel boot time parameter.

 .../admin-guide/kdump/vmcoreinfo.rst          |   4 +-
 .../admin-guide/kernel-parameters.txt         |   9 +-
 arch/Kconfig                                  |   4 +
 arch/arc/Kconfig                              |   4 +-
 arch/arm/Kconfig                              |  12 +-
 arch/arm/configs/imx_v6_v7_defconfig          |   2 +-
 arch/arm/configs/milbeaut_m10v_defconfig      |   2 +-
 arch/arm/configs/oxnas_v6_defconfig           |   2 +-
 arch/arm/configs/pxa_defconfig                |   2 +-
 arch/arm/configs/sama7_defconfig              |   2 +-
 arch/arm/configs/sp7021_defconfig             |   2 +-
 arch/arm64/Kconfig                            |  16 +--
 arch/arm64/include/asm/sparsemem.h            |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |   2 +-
 arch/csky/Kconfig                             |   2 +-
 arch/ia64/Kconfig                             |   8 +-
 arch/ia64/include/asm/sparsemem.h             |   4 +-
 arch/ia64/mm/hugetlbpage.c                    |   2 +-
 arch/loongarch/Kconfig                        |  16 +--
 arch/m68k/Kconfig.cpu                         |   8 +-
 arch/mips/Kconfig                             |  22 ++-
 arch/nios2/Kconfig                            |  10 +-
 arch/powerpc/Kconfig                          |  30 ++---
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |   2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config     |   2 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |   2 +-
 arch/powerpc/mm/hugetlbpage.c                 |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     |   2 +-
 arch/sh/configs/ecovec24_defconfig            |   2 +-
 arch/sh/mm/Kconfig                            |  20 ++-
 arch/sparc/Kconfig                            |   8 +-
 arch/sparc/kernel/pci_sun4v.c                 |   2 +-
 arch/sparc/kernel/traps_64.c                  |   2 +-
 arch/sparc/mm/tsb.c                           |   4 +-
 arch/um/kernel/um_arch.c                      |   4 +-
 arch/xtensa/Kconfig                           |   8 +-
 drivers/base/regmap/regmap-debugfs.c          |   8 +-
 drivers/crypto/hisilicon/sgl.c                |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |   7 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |  72 ++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +-
 drivers/irqchip/irq-gic-v3-its.c              |   4 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/misc/genwqe/card_utils.c              |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   2 +-
 drivers/net/ethernet/ibm/ibmvnic.h            |   2 +-
 drivers/video/fbdev/hyperv_fb.c               |   6 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/virtio/virtio_mem.c                   |   8 +-
 fs/proc/kcore.c                               |   2 +-
 fs/ramfs/file-nommu.c                         |   2 +-
 include/drm/ttm/ttm_pool.h                    |   4 +-
 include/linux/hugetlb.h                       |   2 +-
 include/linux/mmzone.h                        |  36 ++++-
 include/linux/pageblock-flags.h               |  21 ++-
 include/linux/slab.h                          |   8 +-
 kernel/crash_core.c                           |   2 +-
 kernel/dma/pool.c                             |   8 +-
 kernel/events/ring_buffer.c                   |   2 +-
 mm/Kconfig                                    |  33 ++++-
 mm/compaction.c                               |   8 +-
 mm/debug_vm_pgtable.c                         |   4 +-
 mm/huge_memory.c                              |   2 +-
 mm/hugetlb.c                                  |   4 +-
 mm/internal.h                                 |   8 +-
 mm/kmsan/init.c                               |  18 ++-
 mm/memblock.c                                 |   8 +-
 mm/memory.c                                   |   4 +-
 mm/memory_hotplug.c                           |   6 +-
 mm/page_alloc.c                               | 127 +++++++++++++-----
 mm/page_isolation.c                           |  14 +-
 mm/page_owner.c                               |   6 +-
 mm/page_reporting.c                           |   8 +-
 mm/shuffle.h                                  |   2 +-
 mm/slab.c                                     |   2 +-
 mm/slub.c                                     |   6 +-
 mm/vmscan.c                                   |   1 -
 mm/vmstat.c                                   |  14 +-
 net/smc/smc_ib.c                              |   2 +-
 scripts/checkpatch.pl                         |   8 ++
 security/integrity/ima/ima_crypto.c           |   2 +-
 tools/testing/memblock/linux/mmzone.h         |   6 +-
 84 files changed, 462 insertions(+), 272 deletions(-)

-- 
2.35.1




* [PATCH v1 01/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 02/12] mm: check page validity when find a buddy page in a non-contiguous zone Zi Yan
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER currently denotes the largest page order + 1, which is confusing
and has caused several off-by-one errors in the code. Fix it by setting
MAX_ORDER to the largest page order allocated by the buddy allocator, as its
name says.

Add a warning in checkpatch.pl about the semantics change.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 .../admin-guide/kdump/vmcoreinfo.rst          |  4 +-
 .../admin-guide/kernel-parameters.txt         |  4 +-
 arch/arc/Kconfig                              |  4 +-
 arch/arm/Kconfig                              | 12 +++---
 arch/arm/configs/imx_v6_v7_defconfig          |  2 +-
 arch/arm/configs/milbeaut_m10v_defconfig      |  2 +-
 arch/arm/configs/oxnas_v6_defconfig           |  2 +-
 arch/arm/configs/pxa_defconfig                |  2 +-
 arch/arm/configs/sama7_defconfig              |  2 +-
 arch/arm/configs/sp7021_defconfig             |  2 +-
 arch/arm64/Kconfig                            | 16 ++++----
 arch/arm64/include/asm/sparsemem.h            |  2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |  2 +-
 arch/csky/Kconfig                             |  2 +-
 arch/ia64/Kconfig                             |  8 ++--
 arch/ia64/include/asm/sparsemem.h             |  4 +-
 arch/ia64/mm/hugetlbpage.c                    |  2 +-
 arch/loongarch/Kconfig                        | 16 ++++----
 arch/m68k/Kconfig.cpu                         |  8 ++--
 arch/mips/Kconfig                             | 22 +++++-----
 arch/nios2/Kconfig                            | 10 ++---
 arch/powerpc/Kconfig                          | 30 +++++++-------
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |  2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config     |  2 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |  2 +-
 arch/powerpc/mm/hugetlbpage.c                 |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     |  2 +-
 arch/sh/configs/ecovec24_defconfig            |  2 +-
 arch/sh/mm/Kconfig                            | 20 +++++-----
 arch/sparc/Kconfig                            |  8 ++--
 arch/sparc/kernel/pci_sun4v.c                 |  2 +-
 arch/sparc/kernel/traps_64.c                  |  2 +-
 arch/xtensa/Kconfig                           |  8 ++--
 drivers/base/regmap/regmap-debugfs.c          |  8 ++--
 drivers/crypto/hisilicon/sgl.c                |  6 +--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                | 22 +++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  2 +-
 drivers/irqchip/irq-gic-v3-its.c              |  4 +-
 drivers/md/dm-bufio.c                         |  2 +-
 drivers/misc/genwqe/card_utils.c              |  2 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  2 +-
 drivers/net/ethernet/ibm/ibmvnic.h            |  2 +-
 drivers/video/fbdev/hyperv_fb.c               |  6 +--
 drivers/virtio/virtio_balloon.c               |  2 +-
 drivers/virtio/virtio_mem.c                   |  8 ++--
 fs/ramfs/file-nommu.c                         |  2 +-
 include/drm/ttm/ttm_pool.h                    |  2 +-
 include/linux/hugetlb.h                       |  2 +-
 include/linux/mmzone.h                        | 10 ++---
 include/linux/pageblock-flags.h               |  4 +-
 include/linux/slab.h                          |  8 ++--
 kernel/crash_core.c                           |  2 +-
 kernel/dma/pool.c                             |  6 +--
 kernel/events/ring_buffer.c                   |  2 +-
 mm/Kconfig                                    |  6 +--
 mm/compaction.c                               |  8 ++--
 mm/debug_vm_pgtable.c                         |  4 +-
 mm/huge_memory.c                              |  2 +-
 mm/hugetlb.c                                  |  4 +-
 mm/kmsan/init.c                               |  8 ++--
 mm/memblock.c                                 |  2 +-
 mm/memory_hotplug.c                           |  4 +-
 mm/page_alloc.c                               | 40 +++++++++----------
 mm/page_isolation.c                           | 14 +++----
 mm/page_owner.c                               |  6 +--
 mm/page_reporting.c                           |  4 +-
 mm/shuffle.h                                  |  2 +-
 mm/slab.c                                     |  2 +-
 mm/slub.c                                     |  4 +-
 mm/vmstat.c                                   | 14 +++----
 net/smc/smc_ib.c                              |  2 +-
 scripts/checkpatch.pl                         |  8 ++++
 security/integrity/ima/ima_crypto.c           |  2 +-
 tools/testing/memblock/linux/mmzone.h         |  6 +--
 75 files changed, 223 insertions(+), 235 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 8419019b6a88..c572b5230fe0 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER].
+Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
@@ -189,7 +189,7 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
 information. Makedumpfile gets the start address of the vmalloc region
 from this.
 
-(zone.free_area, MAX_ORDER)
+(zone.free_area, MAX_ORDER + 1)
 ---------------------------
 
 Free areas descriptor. User-space tools use this value to iterate the
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3b95f65bafe2..e1094851c328 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -928,7 +928,7 @@
 			buddy allocator. Bigger value increase the probability
 			of catching random memory corruption, but reduce the
 			amount of memory for normal system use. The maximum
-			possible value is MAX_ORDER/2.  Setting this parameter
+			possible value is (MAX_ORDER + 1)/2.  Setting this parameter
 			to 1 or 2 should be enough to identify most random
 			memory corruption problems caused by bugs in kernel or
 			driver code when a CPU writes to (or reads from) a
@@ -3907,7 +3907,7 @@
 			[KNL] Minimal page reporting order
 			Format: <integer>
 			Adjust the minimal page reporting order. The page
-			reporting is disabled when it exceeds (MAX_ORDER-1).
+			reporting is disabled when it exceeds MAX_ORDER.
 
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index d9a13ccf89a3..ab6d701365bb 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -556,7 +556,7 @@ endmenu	 # "ARC Architecture Configuration"
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "12" if ARC_HUGEPAGE_16M
-	default "11"
+	default "11" if ARC_HUGEPAGE_16M
+	default "10"
 
 source "kernel/power/Kconfig"
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e6c8ee56ac52..c8f2e46cc8c4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1436,19 +1436,17 @@ config ARM_MODULE_PLTS
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "12" if SOC_AM33XX
-	default "9" if SA1111
-	default "11"
+	default "11" if SOC_AM33XX
+	default "8" if SA1111
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 config ALIGNMENT_TRAP
 	def_bool CPU_CP15_MMU
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index fb283059daa0..eeb14499479d 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=13
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
index 8620061e19a8..22732f19e79b 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig
index 5c163a9d1429..7e43aa355467 100644
--- a/arch/arm/configs/oxnas_v6_defconfig
+++ b/arch/arm/configs/oxnas_v6_defconfig
@@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
 CONFIG_MACH_OX820=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=16
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_ARM_APPENDED_DTB=y
 CONFIG_ARM_ATAG_DTB_COMPAT=y
diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
index ce3f4ed50498..9b54d5f4edb7 100644
--- a/arch/arm/configs/pxa_defconfig
+++ b/arch/arm/configs/pxa_defconfig
@@ -21,7 +21,7 @@ CONFIG_MACH_AKITA=y
 CONFIG_MACH_BORZOI=y
 CONFIG_PXA_SYSTEMS_CPLDS=y
 CONFIG_AEABI=y
-CONFIG_ARCH_FORCE_MAX_ORDER=9
+CONFIG_ARCH_FORCE_MAX_ORDER=8
 CONFIG_CMDLINE="root=/dev/ram0 ro"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index 8b2cf6ddd568..c200de3947e3 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
 # CONFIG_CACHE_L2X0 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 # CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_ARCH_FORCE_MAX_ORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=14
 CONFIG_UACCESS_WITH_MEMCPY=y
 # CONFIG_ATAGS is not set
 CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
diff --git a/arch/arm/configs/sp7021_defconfig b/arch/arm/configs/sp7021_defconfig
index 151ca8c47373..2863df49da1c 100644
--- a/arch/arm/configs/sp7021_defconfig
+++ b/arch/arm/configs/sp7021_defconfig
@@ -18,7 +18,7 @@ CONFIG_ARCH_SUNPLUS=y
 # CONFIG_VDSO is not set
 CONFIG_SMP=y
 CONFIG_THUMB2_KERNEL=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_VFP=y
 CONFIG_NEON=y
 CONFIG_MODULES=y
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5c7d812704c..1567582e95f9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1420,25 +1420,23 @@ config XEN
 
 config ARCH_FORCE_MAX_ORDER
 	int
-	default "14" if ARM64_64K_PAGES
-	default "12" if ARM64_16K_PAGES
-	default "11"
+	default "13" if ARM64_64K_PAGES
+	default "11" if ARM64_16K_PAGES
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  We make sure that we can allocate upto a HugePage size for each configuration.
 	  Hence we have :
-		MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
+		MAX_ORDER = PMD_SHIFT - PAGE_SHIFT = PAGE_SHIFT - 3
 
-	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
+	  However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
 	  4M allocations matching the default size used by generic code.
 
 config UNMAP_KERNEL_AT_EL0
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 4b73463423c3..5f5437621029 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -10,7 +10,7 @@
 /*
  * Section size must be at least 512MB for 64K base
  * page size config. Otherwise it will be less than
- * (MAX_ORDER - 1) and the build process will fail.
+ * MAX_ORDER and the build process will fail.
  */
 #ifdef CONFIG_ARM64_64K_PAGES
 #define SECTION_SIZE_BITS 29
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc06a7d..fe5472a184a3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -16,7 +16,7 @@ struct hyp_pool {
 	 * API at EL2.
 	 */
 	hyp_spinlock_t lock;
-	struct list_head free_area[MAX_ORDER];
+	struct list_head free_area[MAX_ORDER + 1];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
 	unsigned short max_order;
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index adee6ab36862..a35fc882e97e 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -334,7 +334,7 @@ config HIGHMEM
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "11"
+	default "10"
 
 config DRAM_BASE
 	hex "DRAM start addr (the same with memory-section in dts)"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index c6e06cdc738f..d85f6fbd0746 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -201,10 +201,10 @@ config IA64_CYCLONE
 	  If you're unsure, answer N.
 
 config ARCH_FORCE_MAX_ORDER
-	int "MAX_ORDER (11 - 17)"  if !HUGETLB_PAGE
-	range 11 17  if !HUGETLB_PAGE
-	default "17" if HUGETLB_PAGE
-	default "11"
+	int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
+	range 10 16  if !HUGETLB_PAGE
+	default "16" if HUGETLB_PAGE
+	default "10"
 
 config SMP
 	bool "Symmetric multi-processing support"
diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h
index 84e8ce387b69..04f03a56c166 100644
--- a/arch/ia64/include/asm/sparsemem.h
+++ b/arch/ia64/include/asm/sparsemem.h
@@ -12,9 +12,9 @@
 #define SECTION_SIZE_BITS	(30)
 #define MAX_PHYSMEM_BITS	(50)
 #ifdef CONFIG_ARCH_FORCE_MAX_ORDER
-#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
+#if ((CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS)
 #undef SECTION_SIZE_BITS
-#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
+#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
 #endif
 #endif
 
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..87cc2e8908b4 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -185,7 +185,7 @@ static int __init hugetlb_setup_sz(char *str)
 	size = memparse(str, &str);
 	if (*str || !is_power_of_2(size) || !(tr_pages & size) ||
 		size <= PAGE_SIZE ||
-		size >= (1UL << PAGE_SHIFT << MAX_ORDER)) {
+		size > (1UL << PAGE_SHIFT << MAX_ORDER)) {
 		printk(KERN_WARNING "Invalid huge page size specified\n");
 		return 1;
 	}
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 3c7a5a54b808..e3c7a6293db7 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -372,22 +372,20 @@ config NODES_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 14 64 if PAGE_SIZE_64KB
+	range 13 63 if PAGE_SIZE_64KB
 	default "14" if PAGE_SIZE_64KB
-	range 12 64 if PAGE_SIZE_16KB
-	default "12" if PAGE_SIZE_16KB
-	range 11 64
-	default "11"
+	range 11 63 if PAGE_SIZE_16KB
+	default "11" if PAGE_SIZE_16KB
+	range 10 63
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB.  Keep this in mind
 	  when choosing a value for this option.
diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 3b2f39508524..d3832e1ca7df 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -402,22 +402,20 @@ config SINGLE_MEMORY_CHUNK
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order" if ADVANCED
 	depends on !SINGLE_MEMORY_CHUNK
-	default "11"
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  For systems that have holes in their physical address space this
 	  value also defines the minimal size of the hole that allows
 	  freeing unused memory map.
 
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 config 060_WRITETHROUGH
 	bool "Use write-through caching for 68060 supervisor accesses"
 	depends on ADVANCED && M68060
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 70d28976a40d..37116c811e60 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2142,24 +2142,22 @@ endchoice
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	range 13 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	range 12 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	range 0 64
-	default "11"
+	range 13 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
+	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
+	range 12 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
+	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
+	range 11 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
+	default "11" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
+	range 0 63
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB.  Keep this in mind
 	  when choosing a value for this option.
diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index a582f72104f3..0cccaf8b7fdf 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -46,18 +46,16 @@ source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 9 20
-	default "11"
+	range 8 19
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 endmenu
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 39d71d7701bd..d052cf27883e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -847,28 +847,26 @@ config DATA_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 8 9 if PPC64 && PPC_64K_PAGES
-	default "9" if PPC64 && PPC_64K_PAGES
-	range 13 13 if PPC64 && !PPC_64K_PAGES
-	default "13" if PPC64 && !PPC_64K_PAGES
-	range 9 64 if PPC32 && PPC_16K_PAGES
-	default "9" if PPC32 && PPC_16K_PAGES
-	range 7 64 if PPC32 && PPC_64K_PAGES
-	default "7" if PPC32 && PPC_64K_PAGES
-	range 5 64 if PPC32 && PPC_256K_PAGES
-	default "5" if PPC32 && PPC_256K_PAGES
-	range 11 64
-	default "11"
+	range 7 8 if PPC64 && PPC_64K_PAGES
+	default "8" if PPC64 && PPC_64K_PAGES
+	range 12 12 if PPC64 && !PPC_64K_PAGES
+	default "12" if PPC64 && !PPC_64K_PAGES
+	range 8 63 if PPC32 && PPC_16K_PAGES
+	default "8" if PPC32 && PPC_16K_PAGES
+	range 6 63 if PPC32 && PPC_64K_PAGES
+	default "6" if PPC32 && PPC_64K_PAGES
+	range 4 63 if PPC32 && PPC_256K_PAGES
+	default "4" if PPC32 && PPC_256K_PAGES
+	range 10 63
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB.  For example, on 64-bit
 	  systems, 64KB pages can be enabled via CONFIG_PPC_64K_PAGES.  Keep
diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
index e7672c186325..b8be8280a200 100644
--- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
+++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
@@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
-CONFIG_ARCH_FORCE_MAX_ORDER=17
+CONFIG_ARCH_FORCE_MAX_ORDER=16
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_MSI=y
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index ab8a8c4530d9..3009b0efaf34 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
 CONFIG_FONT_8x16=y
 CONFIG_FONT_8x8=y
 CONFIG_FONTS=y
-CONFIG_ARCH_FORCE_MAX_ORDER=13
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_FRAME_WARN=1024
 CONFIG_FTL=y
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 7fcfba162e0d..81d7185e2ae8 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -97,7 +97,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 	}
 
 	mmap_read_lock(mm);
-	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
+	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER)) /
 			sizeof(struct vm_area_struct *);
 	chunk = min(chunk, entries);
 	for (entry = 0; entry < entries; entry += chunk) {
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc84a594ca62..8d63934783dc 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -652,7 +652,7 @@ void __init gigantic_hugetlb_cma_reserve(void)
 		order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
 
 	if (order) {
-		VM_WARN_ON(order < MAX_ORDER);
+		VM_WARN_ON(order <= MAX_ORDER);
 		hugetlb_cma_reserve(order);
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9de9b2fb163d..8e29a57924ef 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1740,7 +1740,7 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER);
 
 	/*
 	 * We create the default window as big as we can. The constraint is
diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
index b52e14ccb450..4d655e8d4d74 100644
--- a/arch/sh/configs/ecovec24_defconfig
+++ b/arch/sh/configs/ecovec24_defconfig
@@ -8,7 +8,7 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH7724=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_MEMORY_SIZE=0x10000000
 CONFIG_FLATMEM_MANUAL=y
 CONFIG_SH_ECOVEC=y
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 411fdc0901f7..e60e77c6edca 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -20,23 +20,21 @@ config PAGE_OFFSET
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 9 64 if PAGE_SIZE_16KB
-	default "9" if PAGE_SIZE_16KB
-	range 7 64 if PAGE_SIZE_64KB
-	default "7" if PAGE_SIZE_64KB
-	range 11 64
-	default "14" if !MMU
-	default "11"
+	range 8 63 if PAGE_SIZE_16KB
+	default "8" if PAGE_SIZE_16KB
+	range 6 63 if PAGE_SIZE_64KB
+	default "6" if PAGE_SIZE_64KB
+	range 10 63
+	default "13" if !MMU
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB. Keep this in mind when
 	  choosing a value for this option.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 4d3d1af90d52..099d0b31ea69 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -271,17 +271,15 @@ config ARCH_SPARSEMEM_DEFAULT
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "13"
+	default "12"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 13 means that the largest free memory block is 2^12 pages.
+	  increase this value. A value of 12 means that the largest free memory
+	  block is 2^12 pages.
 
 if SPARC64
 source "kernel/power/Kconfig"
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 384480971805..7d91ca6aa675 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -193,7 +193,7 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size,
 
 	size = IO_PAGE_ALIGN(size);
 	order = get_order(size);
-	if (unlikely(order >= MAX_ORDER))
+	if (unlikely(order > MAX_ORDER))
 		return NULL;
 
 	npages = size >> IO_PAGE_SHIFT;
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 5b4de4a89dec..08ffd17d5ec3 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -897,7 +897,7 @@ void __init cheetah_ecache_flush_init(void)
 
 	/* Now allocate error trap reporting scoreboard. */
 	sz = NR_CPUS * (2 * sizeof(struct cheetah_err_info));
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		if ((PAGE_SIZE << order) >= sz)
 			break;
 	}
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index bcb0c5d2abc2..2d1d91718263 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -773,17 +773,15 @@ config HIGHMEM
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "11"
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 endmenu
 
diff --git a/drivers/base/regmap/regmap-debugfs.c b/drivers/base/regmap/regmap-debugfs.c
index 817eda2075aa..c491fabe3617 100644
--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -226,8 +226,8 @@ static ssize_t regmap_read_debugfs(struct regmap *map, unsigned int from,
 	if (*ppos < 0 || !count)
 		return -EINVAL;
 
-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;
 
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
@@ -373,8 +373,8 @@ static ssize_t regmap_reg_ranges_read_file(struct file *file,
 	if (*ppos < 0 || !count)
 		return -EINVAL;
 
-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;
 
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
index 2b6f2281cfd6..f30cf96b0a41 100644
--- a/drivers/crypto/hisilicon/sgl.c
+++ b/drivers/crypto/hisilicon/sgl.c
@@ -70,11 +70,11 @@ struct hisi_acc_sgl_pool *hisi_acc_create_sgl_pool(struct device *dev,
 			 HISI_ACC_SGL_ALIGN_SIZE);
 
 	/*
-	 * the pool may allocate a block of memory of size PAGE_SIZE * 2^(MAX_ORDER - 1),
+	 * the pool may allocate a block of memory of size PAGE_SIZE * 2^MAX_ORDER,
 	 * block size may exceed 2^31 on ia64, so the max of block size is 2^31
 	 */
-	block_size = 1 << (PAGE_SHIFT + MAX_ORDER <= 32 ?
-			   PAGE_SHIFT + MAX_ORDER - 1 : 31);
+	block_size = 1 << (PAGE_SHIFT + MAX_ORDER <= 31 ?
+			   PAGE_SHIFT + MAX_ORDER : 31);
 	sgl_num_per_block = block_size / sgl_size;
 	block_num = count / sgl_num_per_block;
 	remain_sgl = count % sgl_num_per_block;
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 72ce2c9f42fd..84498c7f845d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -111,7 +111,7 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
 		do {
 			struct page *page;
 
-			GEM_BUG_ON(order >= MAX_ORDER);
+			GEM_BUG_ON(order > MAX_ORDER);
 			page = alloc_pages(GFP | __GFP_ZERO, order);
 			if (!page)
 				goto err;
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..85d19f425af6 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -64,11 +64,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_uncached[MAX_ORDER + 1];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -382,7 +382,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
+	for (order = min_t(unsigned int, MAX_ORDER, __fls(num_pages));
 	     num_pages;
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		bool apply_caching = false;
@@ -507,7 +507,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
 	if (use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
 	}
@@ -527,7 +527,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
 	if (pool->use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
 
@@ -581,7 +581,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
 	unsigned int i;
 
 	seq_puts(m, "\t ");
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i <= MAX_ORDER; ++i)
 		seq_printf(m, " ---%2u---", i);
 	seq_puts(m, "\n");
 }
@@ -592,7 +592,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i <= MAX_ORDER; ++i)
 		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
 	seq_puts(m, "\n");
 }
@@ -701,7 +701,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
 		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -734,7 +734,7 @@ void ttm_pool_mgr_fini(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_fini(&global_write_combined[i]);
 		ttm_pool_type_fini(&global_uncached[i]);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..c5ea361bf757 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -182,7 +182,7 @@
 #ifdef CONFIG_CMA_ALIGNMENT
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
 #else
-#define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER - 1)
+#define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER)
 #endif
 
 /*
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..c867432919d8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2438,8 +2438,8 @@ static bool its_parse_indirect_baser(struct its_node *its,
 	 * feature is not supported by hardware.
 	 */
 	new_order = max_t(u32, get_order(esz << ids), new_order);
-	if (new_order >= MAX_ORDER) {
-		new_order = MAX_ORDER - 1;
+	if (new_order > MAX_ORDER) {
+		new_order = MAX_ORDER;
 		ids = ilog2(PAGE_ORDER_TO_SIZE(new_order) / (int)esz);
 		pr_warn("ITS@%pa: %s Table too large, reduce ids %llu->%u\n",
 			&its->phys_base, its_base_type_string[type],
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 09c7ed2650ca..d80e8d579274 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -405,7 +405,7 @@ static void __cache_size_refresh(void)
  * If the allocation may fail we use __get_free_pages. Memory fragmentation
  * won't have a fatal effect here, but it just causes flushes of some other
  * buffers and more I/O will be performed. Don't use __get_free_pages if it
- * always fails (i.e. order >= MAX_ORDER).
+ * always fails (i.e. order > MAX_ORDER).
  *
  * If the allocation shouldn't fail we use __vmalloc. This is only for the
  * initial reserve allocation, so there's no risk of wasting all vmalloc
diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index 1167463f26fb..361514cd575c 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -210,7 +210,7 @@ u32 genwqe_crc32(u8 *buff, size_t len, u32 init)
 void *__genwqe_alloc_consistent(struct genwqe_dev *cd, size_t size,
 			       dma_addr_t *dma_handle)
 {
-	if (get_order(size) >= MAX_ORDER)
+	if (get_order(size) > MAX_ORDER)
 		return NULL;
 
 	return dma_alloc_coherent(&cd->pci_dev->dev, size, dma_handle,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 35d70041b9e8..6213879b926a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1038,7 +1038,7 @@ static void hns3_init_tx_spare_buffer(struct hns3_enet_ring *ring)
 		return;
 
 	order = get_order(alloc_size);
-	if (order >= MAX_ORDER) {
+	if (order > MAX_ORDER) {
 		if (net_ratelimit())
 			dev_warn(ring_to_dev(ring), "failed to allocate tx spare buffer, exceed to max order\n");
 		return;
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index e5c6ff3d0c47..608f9df67eb8 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -75,7 +75,7 @@
  * pool for the 4MB. Thus the 16 Rx and Tx queues require 32 * 5 = 160
  * plus 16 for the TSO pools for a total of 176 LTB mappings per VNIC.
  */
-#define IBMVNIC_ONE_LTB_MAX	((u32)((1 << (MAX_ORDER - 1)) * PAGE_SIZE))
+#define IBMVNIC_ONE_LTB_MAX	((u32)((1 << MAX_ORDER) * PAGE_SIZE))
 #define IBMVNIC_ONE_LTB_SIZE	min((u32)(8 << 20), IBMVNIC_ONE_LTB_MAX)
 #define IBMVNIC_LTB_SET_SIZE	(38 << 20)
 
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 886c564787f1..a852ab6c1f52 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -944,8 +944,8 @@ static phys_addr_t hvfb_get_phymem(struct hv_device *hdev,
 	if (request_size == 0)
 		return -1;
 
-	if (order < MAX_ORDER) {
-		/* Call alloc_pages if the size is less than 2^MAX_ORDER */
+	if (order <= MAX_ORDER) {
+		/* Call alloc_pages if the size is no greater than 2^MAX_ORDER */
 		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
 		if (!page)
 			return -1;
@@ -975,7 +975,7 @@ static void hvfb_release_phymem(struct hv_device *hdev,
 {
 	unsigned int order = get_order(size);
 
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		__free_pages(pfn_to_page(paddr >> PAGE_SHIFT), order);
 	else
 		dma_free_coherent(&hdev->device,
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3f78a3a1eb75..5b15936a5214 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -33,7 +33,7 @@
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
 /* The order of free page blocks to report to host */
-#define VIRTIO_BALLOON_HINT_BLOCK_ORDER (MAX_ORDER - 1)
+#define VIRTIO_BALLOON_HINT_BLOCK_ORDER MAX_ORDER
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_HINT_BLOCK_BYTES \
 	(1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 0c2892ec6817..0e1253e3423a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1120,13 +1120,13 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  */
 static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 {
-	unsigned long order = MAX_ORDER - 1;
+	unsigned long order = MAX_ORDER;
 	unsigned long i;
 
 	/*
 	 * We might get called for ranges that don't cover properly aligned
-	 * MAX_ORDER - 1 pages; however, we can only online properly aligned
-	 * pages with an order of MAX_ORDER - 1 at maximum.
+	 * MAX_ORDER pages; however, we can only online properly aligned
+	 * pages with an order of MAX_ORDER at maximum.
 	 */
 	while (!IS_ALIGNED(pfn | nr_pages, 1 << order))
 		order--;
@@ -1237,7 +1237,7 @@ static void virtio_mem_online_page(struct virtio_mem *vm,
 	bool do_online;
 
 	/*
-	 * We can get called with any order up to MAX_ORDER - 1. If our
+	 * We can get called with any order up to MAX_ORDER. If our
 	 * subblock size is smaller than that and we have a mixture of plugged
 	 * and unplugged subblocks within such a page, we have to process in
 	 * smaller granularity. In that case we'll adjust the order exactly once
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index cb240eac5036..c614ffdb28be 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -70,7 +70,7 @@ int ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize)
 
 	/* make various checks */
 	order = get_order(newsize);
-	if (unlikely(order >= MAX_ORDER))
+	if (unlikely(order > MAX_ORDER))
 		return -EFBIG;
 
 	ret = inode_newsize_ok(inode, newsize);
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index ef09b23d29e3..8ce14f9d202a 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -72,7 +72,7 @@ struct ttm_pool {
 	bool use_dma32;
 
 	struct {
-		struct ttm_pool_type orders[MAX_ORDER];
+		struct ttm_pool_type orders[MAX_ORDER + 1];
 	} caching[TTM_NUM_CACHING_TYPES];
 };
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..179a13490bd6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -786,7 +786,7 @@ static inline unsigned huge_page_shift(struct hstate *h)
 
 static inline bool hstate_is_gigantic(struct hstate *h)
 {
-	return huge_page_order(h) >= MAX_ORDER;
+	return huge_page_order(h) > MAX_ORDER;
 }
 
 static inline unsigned int pages_per_huge_page(const struct hstate *h)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4c8510f26b02..de1548f4fc07 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,11 +25,11 @@
 
 /* Free memory management - zoned buddy allocator.  */
 #ifndef CONFIG_ARCH_FORCE_MAX_ORDER
-#define MAX_ORDER 11
+#define MAX_ORDER 10
 #else
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
 #endif
-#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
  * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
@@ -92,7 +92,7 @@ static inline bool migratetype_is_mergeable(int mt)
 }
 
 #define for_each_migratetype_order(order, type) \
-	for (order = 0; order < MAX_ORDER; order++) \
+	for (order = 0; order <= MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
 
 extern int page_group_by_mobility_disabled;
@@ -819,7 +819,7 @@ struct zone {
 	CACHELINE_PADDING(_pad1_);
 
 	/* free areas of different sizes */
-	struct free_area	free_area[MAX_ORDER];
+	struct free_area	free_area[MAX_ORDER + 1];
 
 	/* zone flags, see below */
 	unsigned long		flags;
@@ -1588,7 +1588,7 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
 #define SECTION_BLOCKFLAGS_BITS \
 	((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
 
-#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
+#if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
 
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 5f1ae07d724b..e83c4c095041 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -41,14 +41,14 @@ extern unsigned int pageblock_order;
  * Huge pages are a constant size, but don't exceed the maximum allocation
  * granularity.
  */
-#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1)
+#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER)
 
 #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
 #else /* CONFIG_HUGETLB_PAGE */
 
 /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order		(MAX_ORDER-1)
+#define pageblock_order		MAX_ORDER
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 617a39f7db46..1553bb127d8c 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -257,8 +257,8 @@ static inline unsigned int arch_slab_minalign(void)
  * to do various tricks to work around compiler limitations in order to
  * ensure proper constant folding.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
+#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT) <= 25 ? \
+				(MAX_ORDER + PAGE_SHIFT) : 25)
 #define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
@@ -271,7 +271,7 @@ static inline unsigned int arch_slab_minalign(void)
  * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
@@ -284,7 +284,7 @@ static inline unsigned int arch_slab_minalign(void)
  * be allocated from the same page.
  */
 #define KMALLOC_SHIFT_HIGH	PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a0eb4d5cf557..245e2ee20718 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -471,7 +471,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(list_head, prev);
 	VMCOREINFO_OFFSET(vmap_area, va_start);
 	VMCOREINFO_OFFSET(vmap_area, list);
-	VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER);
+	VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER + 1);
 	log_buf_vmcoreinfo_setup();
 	VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES);
 	VMCOREINFO_NUMBER(NR_FREE_PAGES);
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 4d40dcce7604..1acec2e22827 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -84,8 +84,8 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 	void *addr;
 	int ret = -ENOMEM;
 
-	/* Cannot allocate larger than MAX_ORDER-1 */
-	order = min(get_order(pool_size), MAX_ORDER-1);
+	/* Cannot allocate larger than MAX_ORDER */
+	order = min(get_order(pool_size), MAX_ORDER);
 
 	do {
 		pool_size = 1 << (PAGE_SHIFT + order);
@@ -190,7 +190,7 @@ static int __init dma_atomic_pool_init(void)
 
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
-	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
+	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER.
 	 */
 	if (!atomic_pool_size) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 726132039c38..981d7f64a671 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -814,7 +814,7 @@ struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 	size = sizeof(struct perf_buffer);
 	size += nr_pages * sizeof(void *);
 
-	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
+	if (order_base_2(size) > PAGE_SHIFT+MAX_ORDER)
 		goto fail;
 
 	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
diff --git a/mm/Kconfig b/mm/Kconfig
index ceec438c0741..ae6711d24e4a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -301,7 +301,7 @@ config SHUFFLE_PAGE_ALLOCATOR
 	  the presence of a memory-side-cache. There are also incidental
 	  security benefits as it reduces the predictability of page
 	  allocations to compliment SLAB_FREELIST_RANDOM, but the
-	  default granularity of shuffling on the "MAX_ORDER - 1" i.e,
+	  default granularity of shuffling on the "MAX_ORDER" i.e,
 	  10th order of pages is selected based on cache utilization
 	  benefits on x86.
 
@@ -615,8 +615,8 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 	  HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
 	  on a platform.
 
-	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
-	  clamped down to MAX_ORDER - 1.
+	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
+	  clamped down to MAX_ORDER.
 
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
diff --git a/mm/compaction.c b/mm/compaction.c
index c4e4453187a2..a7927bb9480c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -584,7 +584,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		if (PageCompound(page)) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order < MAX_ORDER)) {
+			if (likely(order <= MAX_ORDER)) {
 				blockpfn += (1UL << order) - 1;
 				cursor += (1UL << order) - 1;
 			}
@@ -939,7 +939,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			 * a valid page order. Consider only values in the
 			 * valid order range to prevent low_pfn overflow.
 			 */
-			if (freepage_order > 0 && freepage_order < MAX_ORDER)
+			if (freepage_order > 0 && freepage_order <= MAX_ORDER)
 				low_pfn += (1UL << freepage_order) - 1;
 			continue;
 		}
@@ -955,7 +955,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (PageCompound(page) && !cc->alloc_contig) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order < MAX_ORDER))
+			if (likely(order <= MAX_ORDER))
 				low_pfn += (1UL << order) - 1;
 			goto isolate_fail;
 		}
@@ -2127,7 +2127,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 
 	/* Direct compactor: Is a suitable page free? */
 	ret = COMPACT_NO_SUITABLE_PAGE;
-	for (order = cc->order; order < MAX_ORDER; order++) {
+	for (order = cc->order; order <= MAX_ORDER; order++) {
 		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;
 
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index dc7df1254f0a..7e53c4a42047 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1094,7 +1094,7 @@ debug_vm_pgtable_alloc_huge_page(struct pgtable_debug_args *args, int order)
 	struct page *page = NULL;
 
 #ifdef CONFIG_CONTIG_ALLOC
-	if (order >= MAX_ORDER) {
+	if (order > MAX_ORDER) {
 		page = alloc_contig_pages((1 << order), GFP_KERNEL,
 					  first_online_node, NULL);
 		if (page) {
@@ -1104,7 +1104,7 @@ debug_vm_pgtable_alloc_huge_page(struct pgtable_debug_args *args, int order)
 	}
 #endif
 
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		page = alloc_pages(GFP_KERNEL, order);
 
 	return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cc4a5f4791e..844e0e7e6f06 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -470,7 +470,7 @@ static int __init hugepage_init(void)
 	/*
 	 * hugepages can't be allocated by the buddy allocator
 	 */
-	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);
+	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_ORDER);
 	/*
 	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9b8526d27c29..84a4ea87f394 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1922,7 +1922,7 @@ pgoff_t hugetlb_basepage_index(struct page *page)
 	pgoff_t index = page_index(page_head);
 	unsigned long compound_idx;
 
-	if (compound_order(page_head) >= MAX_ORDER)
+	if (compound_order(page_head) > MAX_ORDER)
 		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
 	else
 		compound_idx = page - page_head;
@@ -4323,7 +4323,7 @@ static int __init default_hugepagesz_setup(char *s)
 	 * The number of default huge pages (for this size) could have been
 	 * specified as the first hugetlb parameter: hugepages=X.  If so,
 	 * then default_hstate_max_huge_pages is set.  If the default huge
-	 * page size is gigantic (>= MAX_ORDER), then the pages must be
+	 * page size is gigantic (> MAX_ORDER), then the pages must be
 	 * allocated here from bootmem allocator.
 	 */
 	if (default_hstate_max_huge_pages) {
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
index 7fb794242fad..6c0834274316 100644
--- a/mm/kmsan/init.c
+++ b/mm/kmsan/init.c
@@ -96,7 +96,7 @@ void __init kmsan_init_shadow(void)
 struct metadata_page_pair {
 	struct page *shadow, *origin;
 };
-static struct metadata_page_pair held_back[MAX_ORDER] __initdata;
+static struct metadata_page_pair held_back[MAX_ORDER + 1] __initdata;
 
 /*
  * Eager metadata allocation. When the memblock allocator is freeing pages to
@@ -141,7 +141,7 @@ struct smallstack {
 
 static struct smallstack collect = {
 	.index = 0,
-	.order = MAX_ORDER,
+	.order = MAX_ORDER + 1,
 };
 
 static void smallstack_push(struct smallstack *stack, struct page *pages)
@@ -211,8 +211,8 @@ static void kmsan_memblock_discard(void)
 	 *    order=N-1,
 	 *  - repeat.
 	 */
-	collect.order = MAX_ORDER - 1;
-	for (int i = MAX_ORDER - 1; i >= 0; i--) {
+	collect.order = MAX_ORDER;
+	for (int i = MAX_ORDER; i >= 0; i--) {
 		if (held_back[i].shadow)
 			smallstack_push(&collect, held_back[i].shadow);
 		if (held_back[i].origin)
diff --git a/mm/memblock.c b/mm/memblock.c
index 511d4783dcf1..acbc77367faf 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2030,7 +2030,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	int order;
 
 	while (start < end) {
-		order = min(MAX_ORDER - 1UL, __ffs(start));
+		order = min_t(unsigned long, MAX_ORDER, __ffs(start));
 
 		while (start + (1UL << order) > end)
 			order--;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fd40f7e9f176..2985d5302c57 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -596,7 +596,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	unsigned long pfn;
 
 	/*
-	 * Online the pages in MAX_ORDER - 1 aligned chunks. The callback might
+	 * Online the pages in MAX_ORDER aligned chunks. The callback might
 	 * decide to not expose all pages to the buddy (e.g., expose them
 	 * later). We account all pages as being online and belonging to this
 	 * zone ("present").
@@ -605,7 +605,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	 * this and the first chunk to online will be pageblock_nr_pages.
 	 */
 	for (pfn = start_pfn; pfn < end_pfn;) {
-		int order = min(MAX_ORDER - 1UL, __ffs(pfn));
+		int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
 
 		(*online_page_callback)(pfn_to_page(pfn), order);
 		pfn += (1UL << order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7b20bf09c1c..b3dd5248e63d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -850,7 +850,7 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 {
 	unsigned long res;
 
-	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
+	if (kstrtoul(buf, 10, &res) < 0 ||  res > (MAX_ORDER + 1) / 2) {
 		pr_err("Bad debug_guardpage_minorder value\n");
 		return 0;
 	}
@@ -1073,7 +1073,7 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 	unsigned long higher_page_pfn;
 	struct page *higher_page;
 
-	if (order >= MAX_ORDER - 2)
+	if (order >= MAX_ORDER - 1)
 		return false;
 
 	higher_page_pfn = buddy_pfn & pfn;
@@ -1128,7 +1128,7 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
-	while (order < MAX_ORDER - 1) {
+	while (order < MAX_ORDER) {
 		if (compaction_capture(capc, page, order, migratetype)) {
 			__mod_zone_freepage_state(zone, -(1 << order),
 								migratetype);
@@ -2565,7 +2565,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 	struct page *page;
 
 	/* Find a page of the appropriate size in the preferred list */
-	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
+	for (current_order = order; current_order <= MAX_ORDER; ++current_order) {
 		area = &(zone->free_area[current_order]);
 		page = get_page_from_free_area(area, migratetype);
 		if (!page)
@@ -2940,7 +2940,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			continue;
 
 		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			struct free_area *area = &(zone->free_area[order]);
 
 			page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC);
@@ -3024,7 +3024,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 * approximates finding the pageblock with the most free pages, which
 	 * would be too costly to do exactly.
 	 */
-	for (current_order = MAX_ORDER - 1; current_order >= min_order;
+	for (current_order = MAX_ORDER; current_order >= min_order;
 				--current_order) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
@@ -3050,7 +3050,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	return false;
 
 find_smallest:
-	for (current_order = order; current_order < MAX_ORDER;
+	for (current_order = order; current_order <= MAX_ORDER;
 							current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
@@ -3063,7 +3063,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 * This should not happen - we already found a suitable fallback
 	 * when looking for the largest page.
 	 */
-	VM_BUG_ON(current_order == MAX_ORDER);
+	VM_BUG_ON(current_order == MAX_ORDER + 1);
 
 do_steal:
 	page = get_page_from_free_area(area, fallback_mt);
@@ -4008,7 +4008,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		return true;
 
 	/* For a high-order request, check at least one suitable page is free */
-	for (o = order; o < MAX_ORDER; o++) {
+	for (o = order; o <= MAX_ORDER; o++) {
 		struct free_area *area = &z->free_area[o];
 		int mt;
 
@@ -5518,7 +5518,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
 	 */
-	if (WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp))
+	if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
 		return NULL;
 
 	gfp &= gfp_allowed_mask;
@@ -6237,8 +6237,8 @@ void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 
 	for_each_populated_zone(zone) {
 		unsigned int order;
-		unsigned long nr[MAX_ORDER], flags, total = 0;
-		unsigned char types[MAX_ORDER];
+		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
+		unsigned char types[MAX_ORDER + 1];
 
 		if (zone_idx(zone) > max_zone_idx)
 			continue;
@@ -6248,7 +6248,7 @@ void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 		printk(KERN_CONT "%s: ", zone->name);
 
 		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			struct free_area *area = &zone->free_area[order];
 			int type;
 
@@ -6262,7 +6262,7 @@ void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 			}
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			printk(KERN_CONT "%lu*%lukB ",
 			       nr[order], K(1UL) << order);
 			if (nr[order])
@@ -7605,7 +7605,7 @@ static inline void setup_usemap(struct zone *zone) {}
 /* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
 void __init set_pageblock_order(void)
 {
-	unsigned int order = MAX_ORDER - 1;
+	unsigned int order = MAX_ORDER;
 
 	/* Check that pageblock_nr_pages has not already been setup */
 	if (pageblock_order)
@@ -9101,7 +9101,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 			else
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
-		} else if (get_order(size) >= MAX_ORDER || hashdist) {
+		} else if (get_order(size) > MAX_ORDER || hashdist) {
 			table = vmalloc_huge(size, gfp_flags);
 			virt = true;
 			if (table)
@@ -9315,7 +9315,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
-		if (++order >= MAX_ORDER) {
+		if (++order > MAX_ORDER) {
 			outer_start = start;
 			break;
 		}
@@ -9565,7 +9565,7 @@ bool is_free_buddy_page(struct page *page)
 	unsigned long pfn = page_to_pfn(page);
 	unsigned int order;
 
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
 
 		if (PageBuddy(page_head) &&
@@ -9573,7 +9573,7 @@ bool is_free_buddy_page(struct page *page)
 			break;
 	}
 
-	return order < MAX_ORDER;
+	return order <= MAX_ORDER;
 }
 EXPORT_SYMBOL(is_free_buddy_page);
 
@@ -9624,7 +9624,7 @@ bool take_page_off_buddy(struct page *page)
 	bool ret = false;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
 		int page_order = buddy_order(page_head);
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 04141a9bea70..0c2b7d4f7054 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -226,7 +226,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 */
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
-		if (order >= pageblock_order && order < MAX_ORDER - 1) {
+		if (order >= pageblock_order && order <= MAX_ORDER) {
 			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
 						    order, NULL);
 			if (buddy && !is_migrate_isolate_page(buddy)) {
@@ -290,11 +290,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  *			isolate_single_pageblock()
  * @migratetype:	migrate type to set in error recovery.
  *
- * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
+ * Free and in-use pages can be as big as MAX_ORDER and contain more than one
  * pageblock. When not all pageblocks within a page are isolated at the same
  * time, free page accounting can go wrong. For example, in the case of
- * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pagelbocks.
- * [         MAX_ORDER-1         ]
+ * MAX_ORDER = pageblock_order + 1, a MAX_ORDER page has two pageblocks.
+ * [         MAX_ORDER           ]
  * [  pageblock0  |  pageblock1  ]
  * When either pageblock is isolated, if it is a free page, the page is not
  * split into separate migratetype lists, which is supposed to; if it is an
@@ -451,7 +451,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				 * the free page to the right migratetype list.
 				 *
 				 * head_pfn is not used here as a hugetlb page order
-				 * can be bigger than MAX_ORDER-1, but after it is
+				 * can be bigger than MAX_ORDER, but after it is
 				 * freed, the free page order is not. Use pfn within
 				 * the range to find the head of the free page.
 				 */
@@ -459,7 +459,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				outer_pfn = pfn;
 				while (!PageBuddy(pfn_to_page(outer_pfn))) {
 					/* stop if we cannot find the free page */
-					if (++order >= MAX_ORDER)
+					if (++order > MAX_ORDER)
 						goto failed;
 					outer_pfn &= ~0UL << order;
 				}
@@ -641,7 +641,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/*
-	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
+	 * Note: pageblock_order != MAX_ORDER. Then, chunks of free pages
 	 * are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.
 	 */
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 2d27f532df4c..54a559f7bce4 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -313,7 +313,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 				unsigned long freepage_order;
 
 				freepage_order = buddy_order_unsafe(page);
-				if (freepage_order < MAX_ORDER)
+				if (freepage_order <= MAX_ORDER)
 					pfn += (1UL << freepage_order) - 1;
 				continue;
 			}
@@ -547,7 +547,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (PageBuddy(page)) {
 			unsigned long freepage_order = buddy_order_unsafe(page);
 
-			if (freepage_order < MAX_ORDER)
+			if (freepage_order <= MAX_ORDER)
 				pfn += (1UL << freepage_order) - 1;
 			continue;
 		}
@@ -655,7 +655,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 			if (PageBuddy(page)) {
 				unsigned long order = buddy_order_unsafe(page);
 
-				if (order > 0 && order < MAX_ORDER)
+				if (order > 0 && order <= MAX_ORDER)
 					pfn += (1UL << order) - 1;
 				continue;
 			}
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 382958eef8a9..d52a55bca6d5 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,7 +11,7 @@
 #include "page_reporting.h"
 #include "internal.h"
 
-unsigned int page_reporting_order = MAX_ORDER;
+unsigned int page_reporting_order = MAX_ORDER + 1;
 module_param(page_reporting_order, uint, 0644);
 MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
 
@@ -244,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = page_reporting_order; order < MAX_ORDER; order++) {
+	for (order = page_reporting_order; order <= MAX_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
diff --git a/mm/shuffle.h b/mm/shuffle.h
index cec62984f7d3..a6bdf54f96f1 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -4,7 +4,7 @@
 #define _MM_SHUFFLE_H
 #include <linux/jump_label.h>
 
-#define SHUFFLE_ORDER (MAX_ORDER-1)
+#define SHUFFLE_ORDER MAX_ORDER
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
diff --git a/mm/slab.c b/mm/slab.c
index 10e96137b44f..530f418a4930 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -466,7 +466,7 @@ static int __init slab_max_order_setup(char *str)
 {
 	get_option(&str, &slab_max_order);
 	slab_max_order = slab_max_order < 0 ? 0 :
-				min(slab_max_order, MAX_ORDER - 1);
+				min(slab_max_order, MAX_ORDER);
 	slab_max_order_set = true;
 
 	return 1;
diff --git a/mm/slub.c b/mm/slub.c
index ce8310e131b3..d001573b67c7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3892,7 +3892,7 @@ static inline int calculate_order(unsigned int size)
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
 	order = calc_slab_order(size, 1, MAX_ORDER, 1);
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		return order;
 	return -ENOSYS;
 }
@@ -4403,7 +4403,7 @@ __setup("slub_min_order=", setup_slub_min_order);
 static int __init setup_slub_max_order(char *str)
 {
 	get_option(&str, (int *)&slub_max_order);
-	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER - 1);
+	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
 
 	return 1;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index bd8040f25c27..e0bd2978305d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1067,7 +1067,7 @@ static void fill_contig_page_info(struct zone *zone,
 	info->free_blocks_total = 0;
 	info->free_blocks_suitable = 0;
 
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		unsigned long blocks;
 
 		/*
@@ -1100,7 +1100,7 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
 {
 	unsigned long requested = 1UL << order;
 
-	if (WARN_ON_ONCE(order >= MAX_ORDER))
+	if (WARN_ON_ONCE(order > MAX_ORDER))
 		return 0;
 
 	if (!info->free_blocks_total)
@@ -1470,7 +1470,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
 	int order;
 
 	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
-	for (order = 0; order < MAX_ORDER; ++order)
+	for (order = 0; order <= MAX_ORDER; ++order)
 		/*
 		 * Access to nr_free is lockless as nr_free is used only for
 		 * printing purposes. Use data_race to avoid KCSAN warning.
@@ -1499,7 +1499,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					pgdat->node_id,
 					zone->name,
 					migratetype_names[mtype]);
-		for (order = 0; order < MAX_ORDER; ++order) {
+		for (order = 0; order <= MAX_ORDER; ++order) {
 			unsigned long freecount = 0;
 			struct free_area *area;
 			struct list_head *curr;
@@ -1539,7 +1539,7 @@ static void pagetypeinfo_showfree(struct seq_file *m, void *arg)
 
 	/* Print header */
 	seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
-	for (order = 0; order < MAX_ORDER; ++order)
+	for (order = 0; order <= MAX_ORDER; ++order)
 		seq_printf(m, "%6d ", order);
 	seq_putc(m, '\n');
 
@@ -2161,7 +2161,7 @@ static void unusable_show_print(struct seq_file *m,
 	seq_printf(m, "Node %d, zone %8s ",
 				pgdat->node_id,
 				zone->name);
-	for (order = 0; order < MAX_ORDER; ++order) {
+	for (order = 0; order <= MAX_ORDER; ++order) {
 		fill_contig_page_info(zone, order, &info);
 		index = unusable_free_index(order, &info);
 		seq_printf(m, "%d.%03d ", index / 1000, index % 1000);
@@ -2213,7 +2213,7 @@ static void extfrag_show_print(struct seq_file *m,
 	seq_printf(m, "Node %d, zone %8s ",
 				pgdat->node_id,
 				zone->name);
-	for (order = 0; order < MAX_ORDER; ++order) {
+	for (order = 0; order <= MAX_ORDER; ++order) {
 		fill_contig_page_info(zone, order, &info);
 		index = __fragmentation_index(order, &info);
 		seq_printf(m, "%2d.%03d ", index / 1000, index % 1000);
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 854772dd52fd..9b66d6aeeb1a 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -843,7 +843,7 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 		goto out;
 	/* the calculated number of cq entries fits to mlx5 cq allocation */
 	cqe_size_order = cache_line_size() == 128 ? 7 : 6;
-	smc_order = MAX_ORDER - cqe_size_order - 1;
+	smc_order = MAX_ORDER - cqe_size_order;
 	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
 		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
 	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 18effbe1fe90..dd2e5d53d70e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -7374,6 +7374,14 @@ sub process {
 			}
 		}
 
+# check for MAX_ORDER uses as its semantics have changed.
+# MAX_ORDER now really means the max order of a page that can come out of
+# the kernel buddy allocator
+		if ($line =~ /MAX_ORDER/) {
+			WARN("MAX_ORDER",
+			     "MAX_ORDER has changed its semantics. The max order of a page that can be allocated from the buddy allocator is MAX_ORDER instead of MAX_ORDER - 1.\n" . $herecurr);
+		}
+
 # Mode permission misuses where it seems decimal should be octal
 # This uses a shortcut match to avoid unnecessary uses of a slow foreach loop
 # o Ignore module_param*(...) uses with a decimal 0 permission as that has a
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index 64499056648a..51ad29940f05 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -38,7 +38,7 @@ static int param_set_bufsize(const char *val, const struct kernel_param *kp)
 
 	size = memparse(val, NULL);
 	order = get_order(size);
-	if (order >= MAX_ORDER)
+	if (order > MAX_ORDER)
 		return -EINVAL;
 	ima_maxorder = order;
 	ima_bufsize = PAGE_SIZE << order;
diff --git a/tools/testing/memblock/linux/mmzone.h b/tools/testing/memblock/linux/mmzone.h
index e65f89b12f1c..134f8eab0768 100644
--- a/tools/testing/memblock/linux/mmzone.h
+++ b/tools/testing/memblock/linux/mmzone.h
@@ -17,10 +17,10 @@ enum zone_type {
 };
 
 #define MAX_NR_ZONES __MAX_NR_ZONES
-#define MAX_ORDER 11
-#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define MAX_ORDER 10
+#define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
-#define pageblock_order		(MAX_ORDER - 1)
+#define pageblock_order		MAX_ORDER
 #define pageblock_nr_pages	BIT(pageblock_order)
 #define pageblock_align(pfn)	ALIGN((pfn), pageblock_nr_pages)
 #define pageblock_start_pfn(pfn)	ALIGN_DOWN((pfn), pageblock_nr_pages)
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 02/12] mm: check page validity when find a buddy page in a non-contiguous zone
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
  2022-09-22  1:12 ` [PATCH v1 01/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 03/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

When MAX_ORDER is larger than a section size, the buddy page might not be
valid when the zone is non-contiguous. Check it and return NULL if the
buddy page is not valid.

For PFNs that are not aligned to MAX_ORDER (usually at the beginning and
end of a zone), __free_pages_memory() clamps down the order to make sure
an invalid PFN does not show up as a buddy PFN.
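
As an illustration (not part of the patch), the buddy PFN for a given
order differs from the page's PFN in exactly one bit, so once MAX_ORDER
spans more than one section that bit can point into a hole of a
non-contiguous zone:

    /* Sketch only: how a buddy PFN is derived (mirrors __find_buddy_pfn()). */
    static unsigned long buddy_pfn_example(unsigned long pfn, unsigned int order)
    {
            /* the buddy differs from pfn only in bit 'order' */
            return pfn ^ (1UL << order);
    }

With an order larger than the section order, the result may be an
invalid PFN, hence the pfn_valid() check below when the zone is not
contiguous.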

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/internal.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..22fb1e6e3541 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -330,12 +330,16 @@ static inline struct page *find_buddy_page_pfn(struct page *page,
 			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
 {
 	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
+	struct zone *zone = page_zone(page);
 	struct page *buddy;
 
 	buddy = page + (__buddy_pfn - pfn);
 	if (buddy_pfn)
 		*buddy_pfn = __buddy_pfn;
 
+	if (unlikely(!zone->contiguous && !pfn_valid(__buddy_pfn)))
+		return NULL;
+
 	if (page_is_buddy(page, buddy, order))
 		return buddy;
 	return NULL;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 03/12] mm: adapt deferred struct page init to new MAX_ORDER.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
  2022-09-22  1:12 ` [PATCH v1 01/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
  2022-09-22  1:12 ` [PATCH v1 02/12] mm: check page validity when find a buddy page in a non-contiguous zone Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 04/12] mm: prevent pageblock size being larger than section size Zi Yan
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

deferred_init only initializes the first section of a zone eagerly and
defers the rest; the remainder of the zone is then initialized one
section at a time. When MAX_ORDER grows beyond a section size,
early_page_uninitialised() does not prevent pages beyond the first
section from being initialized, since it only checks the starting pfn
and assumes MAX_ORDER is smaller than a section size. In addition,
deferred_init_maxorder() uses MAX_ORDER_NR_PAGES as the initialization
unit, which can cause the initialized chunk of memory to overlap with
other initialization jobs.

For the first issue, make early_page_uninitialised() clamp down the
order for non-deferred memory initialization when it would reach past
the first section. For the second issue, when adjusting the pfn
alignment in deferred_init_maxorder(), make sure the alignment is not
bigger than a section size.
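
A minimal sketch of the first clamp, with hypothetical PFNs (4KB base
pages assumed):

    /* Illustrative values only. */
    unsigned long first_deferred_pfn = 0x8000;  /* deferred init starts here */
    unsigned long pfn = 0x7000;                 /* page freed by memblock */
    unsigned int order = 13;                    /* would reach past the boundary */

    order = min_t(unsigned int, order, ilog2(first_deferred_pfn - pfn));
    /*
     * 0x8000 - 0x7000 = 0x1000 pages, ilog2(0x1000) = 12, so the freed
     * chunk is clamped to order 12 and stops exactly at the deferred
     * boundary instead of touching deferred pages.
     */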

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/internal.h   |  2 +-
 mm/memblock.c   |  6 ++++--
 mm/page_alloc.c | 28 ++++++++++++++++++++--------
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 22fb1e6e3541..d688c0320cda 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -361,7 +361,7 @@ extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
 extern void memblock_free_pages(struct page *page, unsigned long pfn,
-					unsigned int order);
+					unsigned int *order);
 extern void __free_pages_core(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index acbc77367faf..b957c12a93e7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1640,7 +1640,9 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
+		unsigned int order = 0;
+
+		memblock_free_pages(pfn_to_page(cursor), cursor, &order);
 		totalram_pages_inc();
 	}
 }
@@ -2035,7 +2037,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 		while (start + (1UL << order) > end)
 			order--;
 
-		memblock_free_pages(pfn_to_page(start), start, order);
+		memblock_free_pages(pfn_to_page(start), start, &order);
 
 		start += (1UL << order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b3dd5248e63d..e3af87d89ebf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -464,13 +464,19 @@ static inline bool deferred_pages_enabled(void)
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
-static inline bool __meminit early_page_uninitialised(unsigned long pfn)
+static inline bool __meminit early_page_uninitialised(unsigned long pfn, unsigned int *order)
 {
 	int nid = early_pfn_to_nid(pfn);
 
 	if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
 		return true;
 
+	/* clamp down order to not exceed first_deferred_pfn */
+	if (order)
+		*order = min_t(unsigned int,
+			       *order,
+			       ilog2(NODE_DATA(nid)->first_deferred_pfn - pfn));
+
 	return false;
 }
 
@@ -518,7 +524,7 @@ static inline bool deferred_pages_enabled(void)
 	return false;
 }
 
-static inline bool early_page_uninitialised(unsigned long pfn)
+static inline bool early_page_uninitialised(unsigned long pfn, unsigned int *order)
 {
 	return false;
 }
@@ -1653,7 +1659,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
 	pg_data_t *pgdat;
 	int nid, zid;
 
-	if (!early_page_uninitialised(pfn))
+	if (!early_page_uninitialised(pfn, NULL))
 		return;
 
 	nid = early_pfn_to_nid(pfn);
@@ -1809,15 +1815,15 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 #endif /* CONFIG_NUMA */
 
 void __init memblock_free_pages(struct page *page, unsigned long pfn,
-							unsigned int order)
+							unsigned int *order)
 {
-	if (early_page_uninitialised(pfn))
+	if (early_page_uninitialised(pfn, order))
 		return;
-	if (!kmsan_memblock_free_pages(page, order)) {
+	if (!kmsan_memblock_free_pages(page, *order)) {
 		/* KMSAN will take care of these pages. */
 		return;
 	}
-	__free_pages_core(page, order);
+	__free_pages_core(page, *order);
 }
 
 /*
@@ -2036,7 +2042,13 @@ static unsigned long __init
 deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
 		       unsigned long *end_pfn)
 {
-	unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
+	/*
+	 * deferred_init_memmap_chunk() hands out jobs no larger than
+	 * PAGES_PER_SECTION. Do not align mo_pfn beyond that.
+	 */
+	unsigned long align = min_t(unsigned long,
+				MAX_ORDER_NR_PAGES, PAGES_PER_SECTION);
+	unsigned long mo_pfn = ALIGN(*start_pfn + 1, align);
 	unsigned long spfn = *start_pfn, epfn = *end_pfn;
 	unsigned long nr_pages = 0;
 	u64 j = *i;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 04/12] mm: prevent pageblock size being larger than section size.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (2 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 03/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 05/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

Only physical pages within a single section are guaranteed to be
contiguous, and a pageblock can only group contiguous physical pages by
design. Set pageblock_order properly to prevent a pageblock from going
beyond the section size.
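
For example (illustrative numbers only), with 4KB base pages and 128MB
sections PFN_SECTION_SHIFT is 15, so in the !CONFIG_HUGETLB_PAGE case:

    pageblock_order = min(PFN_SECTION_SHIFT, MAX_ORDER)
    MAX_ORDER = 10  ->  pageblock_order = 10   (behaviour unchanged)
    MAX_ORDER = 20  ->  pageblock_order = 15   (capped at one section, 128MB)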

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/pageblock-flags.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index e83c4c095041..95589b24fff9 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -47,8 +47,11 @@ extern unsigned int pageblock_order;
 
 #else /* CONFIG_HUGETLB_PAGE */
 
-/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order		MAX_ORDER
+/*
+ * If huge pages are not used, group by MAX_ORDER_NR_PAGES or
+ * PAGES_PER_SECTION when MAX_ORDER_NR_PAGES is larger.
+ */
+#define pageblock_order		(min(PFN_SECTION_SHIFT, MAX_ORDER))
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 05/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (3 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 04/12] mm: prevent pageblock size being larger than section size Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 06/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER_NR_PAGES can be increased once MAX_ORDER becomes a boot time
parameter in later commits. To make sure read_kcore() reschedules its
work at a constant period, use pageblock_nr_pages for the reschedule
period instead, since pageblock_nr_pages is a constant and is either
the same as or half of MAX_ORDER_NR_PAGES.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Ying Chen <chenying.kernel@bytedance.com>
Cc: Feng Zhou <zhoufeng.zf@bytedance.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 fs/proc/kcore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index dff921f7ca33..7dc09d211b48 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -491,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			}
 		}
 
-		if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
+		if (page_offline_frozen++ % pageblock_nr_pages == 0) {
 			page_offline_thaw();
 			cond_resched();
 			page_offline_freeze();
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 06/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (4 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 05/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 07/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

virtio_balloon used MAX_ORDER to report free page blocks to the host.
As MAX_ORDER becomes modifiable in later commits, the reported free
block size might become too big. pageblock_order is currently either
the same as MAX_ORDER or one below it (i.e., half the block size). Use
pageblock_order instead so that virtio_balloon keeps a constant free
page block report size when MAX_ORDER is changed in the later commits.
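
For instance, assuming x86_64 defaults (4KB base pages, 2MB hugetlb
pages, so pageblock_order is 9), the report block size stays fixed:

    VIRTIO_BALLOON_HINT_BLOCK_BYTES = 1 << (pageblock_order + PAGE_SHIFT)
                                    = 1 << (9 + 12) = 2MB, regardless of MAX_ORDER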

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/virtio/virtio_balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b15936a5214..51447737538b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -33,7 +33,7 @@
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
 /* The order of free page blocks to report to host */
-#define VIRTIO_BALLOON_HINT_BLOCK_ORDER MAX_ORDER
+#define VIRTIO_BALLOON_HINT_BLOCK_ORDER pageblock_order
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_HINT_BLOCK_BYTES \
 	(1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 07/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (5 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 06/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 08/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

page_reporting_order was initialized to MAX_ORDER to prevent page
reporting from running before its value is overwritten. Use
(unsigned int)-1 instead to remove the dependency on MAX_ORDER.
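
The effect, sketched (not part of the patch): (unsigned int)-1 is
UINT_MAX, which is larger than any valid order, so the per-zone
reporting loop never runs until a backend overwrites the value:

    unsigned int page_reporting_order = (unsigned int)-1;  /* == UINT_MAX */

    /* in page_reporting_process_zone(): never entered with the default */
    for (order = page_reporting_order; order <= MAX_ORDER; order++)
            ...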

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/page_reporting.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index d52a55bca6d5..e3d6f4af34e0 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,7 +11,11 @@
 #include "page_reporting.h"
 #include "internal.h"
 
-unsigned int page_reporting_order = MAX_ORDER + 1;
+/*
+ * Set page_reporting_order to (unsigned int)-1 to prevent it from running
+ * until the value is overwritten.
+ */
+unsigned int page_reporting_order = (unsigned int)-1;
 module_param(page_reporting_order, uint, 0644);
 MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 08/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (6 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 07/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER is limited to a memory section size and is thus widely used
to indicate the maximum physically contiguous page size. That
limitation is lifted later in this series, so these uses can no longer
rely on MAX_ORDER. Add a new macro MAX_PHYS_CONTIG_ORDER to replace
such uses of MAX_ORDER.
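
A rough way to read the distinction (sketch, not from the patch):

    /*
     * MAX_ORDER             - largest order the buddy allocator hands out
     * MAX_PHYS_CONTIG_ORDER - largest order guaranteed physically contiguous;
     *                         under SPARSEMEM it is capped at one memory
     *                         section, so it is always <= MAX_ORDER
     */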

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/sparc/mm/tsb.c                             |  4 ++--
 arch/um/kernel/um_arch.c                        |  4 ++--
 include/linux/pageblock-flags.h                 | 12 ++++++++++++
 kernel/dma/pool.c                               |  8 ++++----
 mm/hugetlb.c                                    |  2 +-
 mm/internal.h                                   |  2 +-
 mm/memory.c                                     |  4 ++--
 mm/memory_hotplug.c                             |  6 +++---
 mm/page_isolation.c                             |  2 +-
 mm/page_reporting.c                             |  2 +-
 11 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e1094851c328..5f633844daac 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3907,7 +3907,7 @@
 			[KNL] Minimal page reporting order
 			Format: <integer>
 			Adjust the minimal page reporting order. The page
-			reporting is disabled when it exceeds MAX_ORDER.
+			reporting is disabled when it exceeds MAX_PHYS_CONTIG_ORDER.
 
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 912205787161..15c31d050dab 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -402,8 +402,8 @@ void tsb_grow(struct mm_struct *mm, unsigned long tsb_index, unsigned long rss)
 	unsigned long new_rss_limit;
 	gfp_t gfp_flags;
 
-	if (max_tsb_size > (PAGE_SIZE << MAX_ORDER))
-		max_tsb_size = (PAGE_SIZE << MAX_ORDER);
+	if (max_tsb_size > (PAGE_SIZE << MAX_PHYS_CONTIG_ORDER))
+		max_tsb_size = (PAGE_SIZE << MAX_PHYS_CONTIG_ORDER);
 
 	new_cache_index = 0;
 	for (new_size = 8192; new_size < max_tsb_size; new_size <<= 1UL) {
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index e0de60e503b9..52a474f4f1c7 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -368,10 +368,10 @@ int __init linux_main(int argc, char **argv)
 	max_physmem = TASK_SIZE - uml_physmem - iomem_size - MIN_VMALLOC;
 
 	/*
-	 * Zones have to begin on a 1 << MAX_ORDER page boundary,
+	 * Zones have to begin on a 1 << MAX_PHYS_CONTIG_ORDER page boundary,
 	 * so this makes sure that's true for highmem
 	 */
-	max_physmem &= ~((1 << (PAGE_SHIFT + MAX_ORDER)) - 1);
+	max_physmem &= ~((1 << (PAGE_SHIFT + MAX_PHYS_CONTIG_ORDER)) - 1);
 	if (physmem_size + iomem_size > max_physmem) {
 		highmem = physmem_size + iomem_size - max_physmem;
 		physmem_size -= highmem;
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 95589b24fff9..5469ba6f97c1 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -61,6 +61,18 @@ extern unsigned int pageblock_order;
 #define pageblock_start_pfn(pfn)	ALIGN_DOWN((pfn), pageblock_nr_pages)
 #define pageblock_end_pfn(pfn)		ALIGN((pfn) + 1, pageblock_nr_pages)
 
+/*
+ * Memory sections are only defined with SPARSEMEM. With FLATMEM, pages are
+ * always physically contiguous, so use MAX_ORDER since all users assume so.
+ */
+#ifdef CONFIG_FLATMEM
+#define MAX_PHYS_CONTIG_ORDER	MAX_ORDER
+#else /* SPARSEMEM */
+#define MAX_PHYS_CONTIG_ORDER	(min(PFN_SECTION_SHIFT, MAX_ORDER))
+#endif /* CONFIG_FLATMEM */
+
+#define MAX_PHYS_CONTIG_NR_PAGES	(1UL << MAX_PHYS_CONTIG_ORDER)
+
 /* Forward declaration */
 struct page;
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 1acec2e22827..11af1063eb86 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -84,8 +84,8 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 	void *addr;
 	int ret = -ENOMEM;
 
-	/* Cannot allocate larger than MAX_ORDER */
-	order = min(get_order(pool_size), MAX_ORDER);
+	/* Cannot allocate larger than MAX_PHYS_CONTIG_ORDER */
+	order = min(get_order(pool_size), MAX_PHYS_CONTIG_ORDER);
 
 	do {
 		pool_size = 1 << (PAGE_SHIFT + order);
@@ -190,11 +190,11 @@ static int __init dma_atomic_pool_init(void)
 
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
-	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER.
+	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_PHYS_CONTIG_ORDER.
 	 */
 	if (!atomic_pool_size) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
-		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
+		pages = min_t(unsigned long, pages, MAX_PHYS_CONTIG_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
 	}
 	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 84a4ea87f394..e6c829a581d6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1922,7 +1922,7 @@ pgoff_t hugetlb_basepage_index(struct page *page)
 	pgoff_t index = page_index(page_head);
 	unsigned long compound_idx;
 
-	if (compound_order(page_head) > MAX_ORDER)
+	if (compound_order(page_head) > MAX_PHYS_CONTIG_ORDER)
 		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
 	else
 		compound_idx = page - page_head;
diff --git a/mm/internal.h b/mm/internal.h
index d688c0320cda..1b1abfc2196e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -304,7 +304,7 @@ static inline bool page_is_buddy(struct page *page, struct page *buddy,
  * satisfies the following equation:
  *     P = B & ~(1 << O)
  *
- * Assumption: *_mem_map is contiguous at least up to MAX_ORDER
+ * Assumption: *_mem_map is contiguous at least up to MAX_PHYS_CONTIG_ORDER
  */
 static inline unsigned long
 __find_buddy_pfn(unsigned long page_pfn, unsigned int order)
diff --git a/mm/memory.c b/mm/memory.c
index 118e5f023597..06f5f2de3a2c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5717,7 +5717,7 @@ void clear_huge_page(struct page *page,
 	unsigned long addr = addr_hint &
 		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
 
-	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
+	if (unlikely(pages_per_huge_page > MAX_PHYS_CONTIG_NR_PAGES)) {
 		clear_gigantic_page(page, addr, pages_per_huge_page);
 		return;
 	}
@@ -5769,7 +5769,7 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 		.vma = vma,
 	};
 
-	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
+	if (unlikely(pages_per_huge_page > MAX_PHYS_CONTIG_NR_PAGES)) {
 		copy_user_gigantic_page(dst, src, addr, vma,
 					pages_per_huge_page);
 		return;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2985d5302c57..994a648c0f5c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -596,16 +596,16 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	unsigned long pfn;
 
 	/*
-	 * Online the pages in MAX_ORDER aligned chunks. The callback might
+	 * Online the pages in MAX_PHYS_CONTIG_ORDER aligned chunks. The callback might
 	 * decide to not expose all pages to the buddy (e.g., expose them
 	 * later). We account all pages as being online and belonging to this
 	 * zone ("present").
 	 * When using memmap_on_memory, the range might not be aligned to
-	 * MAX_ORDER_NR_PAGES - 1, but pageblock aligned. __ffs() will detect
+	 * MAX_PHYS_CONTIG_NR_PAGES - 1, but pageblock aligned. __ffs() will detect
 	 * this and the first chunk to online will be pageblock_nr_pages.
 	 */
 	for (pfn = start_pfn; pfn < end_pfn;) {
-		int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
+		int order = min_t(unsigned long, MAX_PHYS_CONTIG_ORDER, __ffs(pfn));
 
 		(*online_page_callback)(pfn_to_page(pfn), order);
 		pfn += (1UL << order);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 0c2b7d4f7054..2c183827d365 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -226,7 +226,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 */
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
-		if (order >= pageblock_order && order <= MAX_ORDER) {
+		if (order >= pageblock_order && order <= MAX_PHYS_CONTIG_ORDER) {
 			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
 						    order, NULL);
 			if (buddy && !is_migrate_isolate_page(buddy)) {
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index e3d6f4af34e0..001438f3dbeb 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -248,7 +248,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = page_reporting_order; order <= MAX_ORDER; order++) {
+	for (order = page_reporting_order; order <= MAX_PHYS_CONTIG_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (7 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 08/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

With SPARSEMEM_VMEMMAP, all struct pages are virtually contiguous, thus
the kernel can manipulate arbitrarily large pages. By checking PFN
validity during the buddy page merging process, all free pages in the
buddy allocator's free areas have contiguous PFNs even if the system
has several memory sections that are not physically contiguous. With
these two conditions, it is OK to remove the restriction that
MAX_ORDER + PAGE_SHIFT must not exceed SECTION_SIZE_BITS and change
MAX_ORDER freely.

Add SET_MAX_ORDER to allow MAX_ORDER adjustment when the architecture
does not set its own MAX_ORDER via ARCH_FORCE_MAX_ORDER. Make it depend
on SPARSEMEM_VMEMMAP, so that MAX_ORDER is not limited by
SECTION_SIZE_BITS.
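
As a concrete (hypothetical) example, with 4KB base pages a config
fragment like the following would let the buddy allocator hand out 1GB
blocks:

    CONFIG_SPARSEMEM_VMEMMAP=y
    CONFIG_SET_MAX_ORDER=18    # 2^18 pages * 4KB = 1GB largest buddy block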

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/Kconfig           |  4 ++++
 include/linux/mmzone.h | 17 ++++++++++++++---
 mm/Kconfig             | 14 ++++++++++++++
 mm/internal.h          |  2 --
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1c2599618eeb..e51c759a82ad 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,10 @@ source "arch/$(SRCARCH)/Kconfig"
 
 menu "General architecture-dependent options"
 
+config ARCH_FORCE_MAX_ORDER
+	int
+	default "0"
+
 config CRASH_CORE
 	bool
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index de1548f4fc07..da5745fa15c3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -24,11 +24,14 @@
 #include <asm/page.h>
 
 /* Free memory management - zoned buddy allocator.  */
-#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
-#define MAX_ORDER 10
-#else
+#ifdef CONFIG_SET_MAX_ORDER
+#define MAX_ORDER CONFIG_SET_MAX_ORDER
+#elif CONFIG_ARCH_FORCE_MAX_ORDER != 0
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
+#else
+#define MAX_ORDER 10
 #endif
+
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
@@ -1588,9 +1591,17 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
 #define SECTION_BLOCKFLAGS_BITS \
 	((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
 
+/*
+ * The MAX_ORDER check is not necessary when CONFIG_SET_MAX_ORDER is set, since
+ * it depends on CONFIG_SPARSEMEM_VMEMMAP, where all struct page are virtually
+ * contiguous, thus > section size pages can be allocated and manipulated
+ * without worrying about non-contiguous struct page.
+ */
+#ifndef CONFIG_SET_MAX_ORDER
 #if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
+#endif /* CONFIG_SET_MAX_ORDER*/
 
 static inline unsigned long pfn_to_section_nr(unsigned long pfn)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index ae6711d24e4a..9c7280acd528 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -435,6 +435,20 @@ config SPARSEMEM_VMEMMAP
 	  pfn_to_page and page_to_pfn operations.  This is the most
 	  efficient option when sufficient kernel resources are available.
 
+config SET_MAX_ORDER
+	int "Set maximum order of buddy allocator"
+	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER = 0)
+	range 10 255
+	default "10"
+	help
+	  The kernel memory allocator divides physically contiguous memory
+	  blocks into "zones", where each zone is a power of two number of
+	  pages.  This option selects the largest power of two that the kernel
+	  keeps in the memory allocator.  If you need to allocate very large
+	  blocks of physically contiguous memory, then you may need to
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
diff --git a/mm/internal.h b/mm/internal.h
index 1b1abfc2196e..1c3f260930d8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -303,8 +303,6 @@ static inline bool page_is_buddy(struct page *page, struct page *buddy,
  * 2) Any buddy B will have an order O+1 parent P which
  * satisfies the following equation:
  *     P = B & ~(1 << O)
- *
- * Assumption: *_mem_map is contiguous at least up to MAX_PHYS_CONTIG_ORDER
  */
 static inline unsigned long
 __find_buddy_pfn(unsigned long page_pfn, unsigned int order)
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (8 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
  2022-09-22  1:12 ` [PATCH v1 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

This prepares for the upcoming changes that make MAX_ORDER a boot time
parameter instead of a compile-time constant. All MAX_ORDER-sized static
arrays are converted to pointers and their memory is allocated at runtime.

The free_area array in struct zone is allocated using memblock_alloc_node()
at boot time and using kcalloc_node() when memory is hot-added.
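
A minimal sketch of the conversion pattern used throughout this patch,
modeled on the zone->free_area change below (the hotplug flag mirrors the
new zone_init_internals() argument; error handling is omitted):

	struct free_area *free_area;	/* was: struct free_area free_area[MAX_ORDER + 1] */

	if (hotplug)
		/* memory hot-add: slab is up, allocate on the zone's node */
		free_area = kcalloc_node(MAX_ORDER + 1, sizeof(struct free_area),
					 GFP_KERNEL, nid);
	else
		/* early boot: only the memblock allocator is available */
		free_area = memblock_alloc_node((MAX_ORDER + 1) * sizeof(struct free_area),
						sizeof(struct free_area), nid);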

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 .../admin-guide/kdump/vmcoreinfo.rst          |  2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |  7 ++-
 drivers/gpu/drm/ttm/ttm_pool.c                | 58 +++++++++++++++++--
 include/drm/ttm/ttm_pool.h                    |  4 +-
 include/linux/mmzone.h                        |  2 +-
 mm/kmsan/init.c                               | 12 +++-
 mm/page_alloc.c                               | 32 ++++++++--
 7 files changed, 98 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index c572b5230fe0..a775462aa7c7 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
+Each zone has a free_area structure array called free_area with a length of MAX_ORDER + 1.
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e7147e304637..442a77bb5b4f 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -92,7 +92,9 @@ static int ttm_global_init(void)
 		>> PAGE_SHIFT;
 	num_dma32 = min(num_dma32, 2UL << (30 - PAGE_SHIFT));
 
-	ttm_pool_mgr_init(num_pages);
+	ret = ttm_pool_mgr_init(num_pages);
+	if (ret)
+		goto out;
 	ttm_tt_mgr_init(num_pages, num_dma32);
 
 	glob->dummy_read_page = alloc_page(__GFP_ZERO | GFP_DMA32);
@@ -218,7 +220,8 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	bdev->funcs = funcs;
 
 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, use_dma_alloc, use_dma32);
+	if (ttm_pool_init(&bdev->pool, dev, use_dma_alloc, use_dma32))
+		return -ENOMEM;
 
 	bdev->vma_manager = vma_manager;
 	INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 85d19f425af6..d76f7d476421 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -64,11 +64,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER + 1];
-static struct ttm_pool_type global_uncached[MAX_ORDER + 1];
+static struct ttm_pool_type *global_write_combined;
+static struct ttm_pool_type *global_uncached;
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER + 1];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
+static struct ttm_pool_type *global_dma32_write_combined;
+static struct ttm_pool_type *global_dma32_uncached;
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -493,8 +493,10 @@ EXPORT_SYMBOL(ttm_pool_free);
  * @use_dma32: true if GFP_DMA32 should be used
  *
  * Initialize the pool and its pool types.
+ *
+ * Returns: 0 on success, negative error code otherwise
  */
-void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
+int ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 		   bool use_dma_alloc, bool use_dma32)
 {
 	unsigned int i, j;
@@ -506,11 +508,30 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 	pool->use_dma32 = use_dma32;
 
 	if (use_dma_alloc) {
-		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
+		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
+			pool->caching[i].orders =
+				kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+					GFP_KERNEL);
+			if (!pool->caching[i].orders) {
+				/* caching[i] was not allocated; unwind 0..i-1 */
+				goto failed;
+			}
 			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
+
+		}
+		return 0;
+
+failed:
+		while (i-- > 0) {
+			for (j = 0; j <= MAX_ORDER; ++j)
+				ttm_pool_type_fini(&pool->caching[i].orders[j]);
+			kvfree(pool->caching[i].orders);
+		}
+		return -ENOMEM;
 	}
+	return 0;
 }
 
 /**
@@ -701,6 +722,31 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
+	if (!global_write_combined) {
+		global_write_combined = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						GFP_KERNEL);
+		if (!global_write_combined)
+			return -ENOMEM;
+	}
+	if (!global_uncached) {
+		global_uncached = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+					  GFP_KERNEL);
+		if (!global_uncached)
+			return -ENOMEM;
+	}
+	if (!global_dma32_write_combined) {
+		global_dma32_write_combined = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						      GFP_KERNEL);
+		if (!global_dma32_write_combined)
+			return -ENOMEM;
+	}
+	if (!global_dma32_uncached) {
+		global_dma32_uncached = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						GFP_KERNEL);
+		if (!global_dma32_uncached)
+			return -ENOMEM;
+	}
+
 	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 8ce14f9d202a..f5ce60f629ae 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -72,7 +72,7 @@ struct ttm_pool {
 	bool use_dma32;
 
 	struct {
-		struct ttm_pool_type orders[MAX_ORDER + 1];
+		struct ttm_pool_type *orders;
 	} caching[TTM_NUM_CACHING_TYPES];
 };
 
@@ -80,7 +80,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		   struct ttm_operation_ctx *ctx);
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt);
 
-void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
+int ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 		   bool use_dma_alloc, bool use_dma32);
 void ttm_pool_fini(struct ttm_pool *pool);
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index da5745fa15c3..032d347f36dd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -822,7 +822,7 @@ struct zone {
 	CACHELINE_PADDING(_pad1_);
 
 	/* free areas of different sizes */
-	struct free_area	free_area[MAX_ORDER + 1];
+	struct free_area	*free_area;
 
 	/* zone flags, see below */
 	unsigned long		flags;
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
index 6c0834274316..d3777146e7c4 100644
--- a/mm/kmsan/init.c
+++ b/mm/kmsan/init.c
@@ -96,7 +96,7 @@ void __init kmsan_init_shadow(void)
 struct metadata_page_pair {
 	struct page *shadow, *origin;
 };
-static struct metadata_page_pair held_back[MAX_ORDER + 1] __initdata;
+static struct metadata_page_pair *held_back __initdata;
 
 /*
  * Eager metadata allocation. When the memblock allocator is freeing pages to
@@ -115,6 +115,16 @@ bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
 {
 	struct page *shadow, *origin;
 
+	if (!held_back) {
+		held_back = memblock_alloc((MAX_ORDER + 1) * sizeof(struct metadata_page_pair),
+				sizeof(struct metadata_page_pair));
+		/* held_back cannot be allocated, kmsan will not take the page */
+		if (!held_back) {
+			WARN_ONCE(1, "held_back array cannot be allocated, kmsan will not work");
+			return true;
+		}
+	}
+
 	if (!held_back[order].shadow) {
 		held_back[order].shadow = page;
 		return false;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e3af87d89ebf..ba7c284ba3d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6249,13 +6249,23 @@ void __show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_zone_i
 
 	for_each_populated_zone(zone) {
 		unsigned int order;
-		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
-		unsigned char types[MAX_ORDER + 1];
+		unsigned long *nr, flags, total = 0;
+		unsigned char *types;
 
 		if (zone_idx(zone) > max_zone_idx)
 			continue;
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
+
+		nr = kmalloc_array(MAX_ORDER + 1, sizeof(unsigned long), GFP_KERNEL);
+		if (!nr)
+			break;
+		types = kmalloc_array(MAX_ORDER + 1, sizeof(unsigned char), GFP_KERNEL);
+		if (!types) {
+			kfree(nr);
+			break;
+		}
+
 		show_node(zone);
 		printk(KERN_CONT "%s: ", zone->name);
 
@@ -7710,8 +7720,8 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	lruvec_init(&pgdat->__lruvec);
 }
 
-static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
-							unsigned long remaining_pages)
+static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+					unsigned long remaining_pages, bool hotplug)
 {
 	atomic_long_set(&zone->managed_pages, remaining_pages);
 	zone_set_nid(zone, nid);
@@ -7720,6 +7730,16 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
 	spin_lock_init(&zone->lock);
 	zone_seqlock_init(zone);
 	zone_pcp_init(zone);
+	if (hotplug)
+		zone->free_area =
+			kcalloc_node(MAX_ORDER + 1, sizeof(struct free_area),
+				     GFP_KERNEL, nid);
+	else
+		zone->free_area =
+			memblock_alloc_node(sizeof(struct free_area) * (MAX_ORDER + 1),
+					    sizeof(struct free_area), nid);
+	BUG_ON(!zone->free_area);
+
 }
 
 /*
@@ -7758,7 +7778,7 @@ void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
 	}
 
 	for (z = 0; z < MAX_NR_ZONES; z++)
-		zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
+		zone_init_internals(&pgdat->node_zones[z], z, nid, 0, true);
 }
 #endif
 
@@ -7821,7 +7841,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone_init_internals(zone, j, nid, freesize);
+		zone_init_internals(zone, j, nid, freesize, false);
 
 		if (!size)
 			continue;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (9 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  2022-09-22  1:12 ` [PATCH v1 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

For the other MAX_ORDER uses described below, converting the static arrays
to dynamic ones is either unnecessary or too much hassle. Add MIN_MAX_ORDER
to serve as a compile-time constant in place of MAX_ORDER.

The ARM64 hypervisor maintains its own free page list and does not import
any core kernel symbols, so the soon-to-be runtime variable MAX_ORDER is not
accessible in ARM64 hypervisor code. There is also no need to allocate very
large pages there.

In SLAB/SLOB/SLUB, the 2-D array kmalloc_caches uses MAX_ORDER in its second
dimension. It is too much hassle to allocate memory for kmalloc_caches
before any proper memory allocator is set up.
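
As a rough illustration of why a compile-time bound still works here
(assuming 4KB base pages, PAGE_SHIFT = 12, and the default MIN_MAX_ORDER of
10), the kmalloc limits below still fold to constants:

  SLAB: KMALLOC_SHIFT_HIGH = min(MIN_MAX_ORDER + PAGE_SHIFT, 25) = 22 -> 4MB caches
  SLUB: KMALLOC_SHIFT_MAX  = MIN_MAX_ORDER + PAGE_SHIFT          = 22 -> 4MB max kmalloc()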

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Quentin Perret <qperret@google.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h | 2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c  | 2 +-
 include/linux/mmzone.h                | 3 +++
 include/linux/pageblock-flags.h       | 4 ++--
 include/linux/slab.h                  | 8 ++++----
 mm/slab.c                             | 2 +-
 mm/slub.c                             | 6 +++---
 7 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index fe5472a184a3..29b92f68ab69 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -16,7 +16,7 @@ struct hyp_pool {
 	 * API at EL2.
 	 */
 	hyp_spinlock_t lock;
-	struct list_head free_area[MAX_ORDER + 1];
+	struct list_head free_area[MIN_MAX_ORDER + 1];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
 	unsigned short max_order;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b30b534..7ebbac3e2e76 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -241,7 +241,7 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 	int i;
 
 	hyp_spin_lock_init(&pool->lock);
-	pool->max_order = min(MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
+	pool->max_order = min(MIN_MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
 	for (i = 0; i < pool->max_order; i++)
 		INIT_LIST_HEAD(&pool->free_area[i]);
 	pool->range_start = phys;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 032d347f36dd..19fca391f635 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -26,10 +26,13 @@
 /* Free memory management - zoned buddy allocator.  */
 #ifdef CONFIG_SET_MAX_ORDER
 #define MAX_ORDER CONFIG_SET_MAX_ORDER
+#define MIN_MAX_ORDER CONFIG_SET_MAX_ORDER
 #elif CONFIG_ARCH_FORCE_MAX_ORDER != 0
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
+#define MIN_MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
 #else
 #define MAX_ORDER 10
+#define MIN_MAX_ORDER MAX_ORDER
 #endif
 
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 5469ba6f97c1..0424b6e04dae 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -41,7 +41,7 @@ extern unsigned int pageblock_order;
  * Huge pages are a constant size, but don't exceed the maximum allocation
  * granularity.
  */
-#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER)
+#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MIN_MAX_ORDER)
 
 #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
@@ -51,7 +51,7 @@ extern unsigned int pageblock_order;
  * If huge pages are not used, group by MAX_ORDER_NR_PAGES or
  * PAGES_PER_SECTION when MAX_ORDER_NR_PAGES is larger.
  */
-#define pageblock_order		(min(PFN_SECTION_SHIFT, MAX_ORDER))
+#define pageblock_order		(min(PFN_SECTION_SHIFT, MIN_MAX_ORDER))
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1553bb127d8c..737bb72e5a60 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -257,8 +257,8 @@ static inline unsigned int arch_slab_minalign(void)
  * to do various tricks to work around compiler limitations in order to
  * ensure proper constant folding.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT) : 25)
+#define KMALLOC_SHIFT_HIGH	((MIN_MAX_ORDER + PAGE_SHIFT) <= 25 ? \
+				(MIN_MAX_ORDER + PAGE_SHIFT) : 25)
 #define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
@@ -271,7 +271,7 @@ static inline unsigned int arch_slab_minalign(void)
  * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX	(MIN_MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
@@ -284,7 +284,7 @@ static inline unsigned int arch_slab_minalign(void)
  * be allocated from the same page.
  */
 #define KMALLOC_SHIFT_HIGH	PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX	(MIN_MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
diff --git a/mm/slab.c b/mm/slab.c
index 530f418a4930..23798c32bb38 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -466,7 +466,7 @@ static int __init slab_max_order_setup(char *str)
 {
 	get_option(&str, &slab_max_order);
 	slab_max_order = slab_max_order < 0 ? 0 :
-				min(slab_max_order, MAX_ORDER);
+				min(slab_max_order, MIN_MAX_ORDER);
 	slab_max_order_set = true;
 
 	return 1;
diff --git a/mm/slub.c b/mm/slub.c
index d001573b67c7..c3b1f438a8e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3891,8 +3891,8 @@ static inline int calculate_order(unsigned int size)
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = calc_slab_order(size, 1, MAX_ORDER, 1);
-	if (order <= MAX_ORDER)
+	order = calc_slab_order(size, 1, MIN_MAX_ORDER, 1);
+	if (order <= MIN_MAX_ORDER)
 		return order;
 	return -ENOSYS;
 }
@@ -4403,7 +4403,7 @@ __setup("slub_min_order=", setup_slub_min_order);
 static int __init setup_slub_max_order(char *str)
 {
 	get_option(&str, (int *)&slub_max_order);
-	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
+	slub_max_order = min_t(unsigned int, slub_max_order, MIN_MAX_ORDER);
 
 	return 1;
 }
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 12/12] mm: make MAX_ORDER a kernel boot time parameter.
  2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (10 preceding siblings ...)
  2022-09-22  1:12 ` [PATCH v1 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
@ 2022-09-22  1:12 ` Zi Yan
  11 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2022-09-22  1:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Zi Yan, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, Muchun Song,
	Andrew Morton, linux-kernel

From: Zi Yan <ziy@nvidia.com>

With the new buddy_alloc_max_order boot parameter, users can specify a
larger MAX_ORDER than the one set in CONFIG_ARCH_FORCE_MAX_ORDER or
CONFIG_SET_MAX_ORDER. It can be set to any value >= CONFIG_ARCH_FORCE_MAX_ORDER
or CONFIG_SET_MAX_ORDER, but no larger than S8_MAX (limited by the s8 order
field in vmscan's scan_control and by the per-cpu free page lists).
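
For example (assuming 4KB base pages), booting with

	buddy_alloc_max_order=16

raises the largest buddy allocation to 2^16 pages = 256MB; values at or
below the compile-time MIN_MAX_ORDER, or above S8_MAX, fall back to
MIN_MAX_ORDER.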

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-doc@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 include/linux/mmzone.h                        | 10 +++++-
 mm/Kconfig                                    | 13 ++++++++
 mm/page_alloc.c                               | 31 +++++++++++++++++++
 mm/vmscan.c                                   |  1 -
 5 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5f633844daac..eb0dd8a78205 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -494,6 +494,11 @@
 	bttv.pll=	See Documentation/admin-guide/media/bttv.rst
 	bttv.tuner=
 
+	buddy_alloc_max_order=	[KNL] This parameter adjusts the size of the largest
+			pages that can be allocated from the kernel buddy allocator. The
+			largest page size is 2^buddy_alloc_max_order * PAGE_SIZE.
+			Format: integer
+
 	bulk_remove=off	[PPC]  This parameter disables the use of the pSeries
 			firmware feature for flushing multiple hpte entries
 			at a time.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 19fca391f635..5669191d15dc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,6 +35,14 @@
 #define MIN_MAX_ORDER MAX_ORDER
 #endif
 
+/* remap MAX_ORDER to buddy_alloc_max_order for boot time adjustment */
+#ifdef CONFIG_BOOT_TIME_MAX_ORDER
+/* Defined in mm/page_alloc.c */
+extern int buddy_alloc_max_order;
+#undef MAX_ORDER
+#define MAX_ORDER buddy_alloc_max_order
+#endif /* CONFIG_BOOT_TIME_MAX_ORDER */
+
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
@@ -1600,7 +1608,7 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
  * contiguous, thus > section size pages can be allocated and manipulated
  * without worrying about non-contiguous struct page.
  */
-#ifndef CONFIG_SET_MAX_ORDER
+#if !defined(CONFIG_SET_MAX_ORDER) && !defined(CONFIG_BOOT_TIME_MAX_ORDER)
 #if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 9c7280acd528..3e6b61ba9fec 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -449,6 +449,19 @@ config SET_MAX_ORDER
 	  increase this value. A value of 10 means that the largest free memory
 	  block is 2^10 pages.
 
+config BOOT_TIME_MAX_ORDER
+	bool "Set maximum order of buddy allocator at boot time"
+	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER != 0 || SET_MAX_ORDER != 0)
+	help
+	  This option enables users to set the maximum order of the buddy
+	  allocator at boot time instead of using a static macro fixed at
+	  compile time. Systems with a lot of memory might want to allocate
+	  large pages, whereas that is much less feasible and desirable on
+	  systems with less memory. This option lets each system control the
+	  largest page it wants to allocate. When the boot time parameter is
+	  not set, MAX_ORDER defaults to ARCH_FORCE_MAX_ORDER or SET_MAX_ORDER,
+	  whichever is non-zero. The boot parameter caps MAX_ORDER at S8_MAX.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ba7c284ba3d3..9eacdf3a37c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9720,3 +9720,34 @@ bool has_managed_dma(void)
 	return false;
 }
 #endif /* CONFIG_ZONE_DMA */
+
+#ifdef CONFIG_BOOT_TIME_MAX_ORDER
+int buddy_alloc_max_order = MIN_MAX_ORDER;
+EXPORT_SYMBOL(buddy_alloc_max_order);
+
+static int __init buddy_alloc_set(char *val)
+{
+	int ret;
+	unsigned long max_order;
+
+	ret = kstrtoul(val, 10, &max_order);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	/*
+	 * max_order is also limited at below locations:
+	 * 1. scan_control in mm/vmscan.c uses s8 field for order, max_order cannot
+	 * be bigger than S8_MAX before the field is changed.
+	 * 2. free_pcppages_bulk has max_order upper limit.
+	 */
+	if (max_order > MIN_MAX_ORDER && max_order <= S8_MAX)
+		buddy_alloc_max_order = max_order;
+	else
+		buddy_alloc_max_order = MIN_MAX_ORDER;
+
+	return 0;
+}
+
+early_param("buddy_alloc_max_order", buddy_alloc_set);
+#endif /* CONFIG_BOOT_TIME_MAX_ORDER */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a8fd6300fa7e..009632243398 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6623,7 +6623,6 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	 * scan_control uses s8 fields for order, priority, and reclaim_idx.
 	 * Confirm they are large enough for max values.
 	 */
-	BUILD_BUG_ON(MAX_ORDER > S8_MAX);
 	BUILD_BUG_ON(DEF_PRIORITY > S8_MAX);
 	BUILD_BUG_ON(MAX_NR_ZONES > S8_MAX);
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-22  1:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22  1:12 [PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
2022-09-22  1:12 ` [PATCH v1 01/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
2022-09-22  1:12 ` [PATCH v1 02/12] mm: check page validity when find a buddy page in a non-contiguous zone Zi Yan
2022-09-22  1:12 ` [PATCH v1 03/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
2022-09-22  1:12 ` [PATCH v1 04/12] mm: prevent pageblock size being larger than section size Zi Yan
2022-09-22  1:12 ` [PATCH v1 05/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
2022-09-22  1:12 ` [PATCH v1 06/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
2022-09-22  1:12 ` [PATCH v1 07/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
2022-09-22  1:12 ` [PATCH v1 08/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
2022-09-22  1:12 ` [PATCH v1 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
2022-09-22  1:12 ` [PATCH v1 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
2022-09-22  1:12 ` [PATCH v1 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
2022-09-22  1:12 ` [PATCH v1 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).