linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter.
@ 2022-08-11 23:16 Zi Yan
From: Zi Yan @ 2022-08-11 23:16 UTC
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset adds support for a boot-time adjustable MAX_ORDER, so that
users can change the largest page size the buddy allocator can allocate.
It is based on mm-everything-2022-08-11-02-10.

Motivation
===

This enables the kernel to allocate 1GB pages and is necessary for my ongoing
work on adding support for 1GB PUD THP[1]. It is also the conclusion I reached
after discussing with David Hildenbrand which methods should be used for
allocating gigantic pages[2], since other approaches such as the CMA allocator
or alloc_contig_pages() are regarded as suboptimal.

In addition, making MAX_ORDER a kernel boot time parameter lets users adjust
the buddy allocator to their own needs without recompiling the kernel, so that
they can still keep a small MAX_ORDER if they do not need to allocate
gigantic pages like 1GB PUD THPs.

Background
===

At the moment, the kernel imposes the restriction
MAX_ORDER - 1 + PAGE_SHIFT < SECTION_SIZE_BITS. This prevents the buddy
allocator from merging pages across memory sections, since PFNs might not be
contiguous and code like page++ would fail. This is not an issue when
SPARSEMEM_VMEMMAP is set, because all struct pages are then virtually
contiguous. So a boot time adjustable MAX_ORDER depends on
SPARSEMEM_VMEMMAP.
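
As an illustration of the arithmetic behind this restriction, here is a
minimal user-space sketch. It is not code from the patchset. The constants
are assumptions for x86_64 with 4KB base pages (PAGE_SHIFT of 12,
SECTION_SIZE_BITS of 27), and the sketch_* names are made up for the example.
The second helper only checks whether a block is no larger than one section;
it does not reproduce the exact comparison the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for x86_64 with 4KB base pages; other configs differ. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_SECTION_SIZE_BITS 27

/* Smallest order whose block (2^order pages) covers at least `bytes`. */
static unsigned int sketch_order_for_size(unsigned long long bytes)
{
	unsigned int order = 0;

	assert(bytes > 0);
	while ((1ULL << (order + SKETCH_PAGE_SHIFT)) < bytes)
		order++;
	return order;
}

/* True if a block of this order is no larger than one memory section. */
static bool sketch_order_fits_in_section(unsigned int order)
{
	return order + SKETCH_PAGE_SHIFT <= SKETCH_SECTION_SIZE_BITS;
}
```

Under these assumed values a 1GB page is order 18, and a block of that order
covers 8 sections of 128MB each, so the buddy allocator can only manage it if
walking struct pages across section boundaries is safe.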

Description
===

I tested the patchset on both x86_64 and ARM64 with 4KB base pages. The systems
boot and run. It definitely needs more testing and review.

To address concerns about performance degradation when MAX_ORDER is increased,
I ran vm-scalability from lkp on an x86_64 VM, comparing the current system,
my patchset with MAX_ORDER=11, and my patchset with MAX_ORDER=20, and saw
almost no performance difference. Please see the attached vm-scalability reports.

Patch 1 renames FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER for a more
precise description.

Patch 2 changes MAX_ORDER to represent the max order of pages allocated
by the buddy allocator. Right now MAX_ORDER - 1 represents that, which is
confusing. Suggested by Vlastimil Babka.
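
The off-by-one that this change removes can be illustrated with a small
sketch. It is not code from the patchset. The names OLD_MAX_ORDER and
NEW_MAX_ORDER and all helpers are hypothetical, and the value 11 is assumed
as the pre-patch default.

```c
#include <assert.h>

/* Hypothetical names; 11 is assumed as the pre-patch default. */
#define OLD_MAX_ORDER 11  /* old meaning: largest order is MAX_ORDER - 1 */
#define NEW_MAX_ORDER 10  /* new meaning: largest order is MAX_ORDER     */

/* Number of free lists under the old convention: orders 0..MAX_ORDER-1. */
static int old_nr_orders(void)
{
	int n = 0;

	for (int order = 0; order < OLD_MAX_ORDER; order++)
		n++;
	return n;
}

/* Number of free lists under the new convention: orders 0..MAX_ORDER. */
static int new_nr_orders(void)
{
	int n = 0;

	for (int order = 0; order <= NEW_MAX_ORDER; order++)
		n++;
	return n;
}

/* Both conventions describe the same allocator: 11 lists, largest order 10. */
static void sketch_check_same_allocator(void)
{
	assert(old_nr_orders() == new_nr_orders());
}
```

In this sketch the loop bound changes from "<" to "<=" and an array indexed
by order needs MAX_ORDER + 1 entries under the new meaning.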

Patch 3 replaces MAX_ORDER with MAX_PHYS_CONTIG_ORDER when it is used to
indicate the maximum number of physically contiguous pages.

Patch 4 fixes deferred struct page initialization when MAX_ORDER is
bigger than a memory section size.

Patches 5-8 convert uses of MAX_ORDER to pageblock_order, since
pageblock_order stays constant when MAX_ORDER can be changed at boot time
and is close to the current MAX_ORDER value. I split the changes into separate
patches for easier review and can merge them into a single one if that works
better.

Patch 9 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.

Patch 10 converts statically allocated arrays with MAX_ORDER length to dynamic
ones if possible and prepares for making MAX_ORDER a boot time parameter.
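
The kind of conversion described here can be sketched in user space as
follows. This is only an illustration under stated assumptions, not the
patch itself: the struct and function names are hypothetical, and calloc()
stands in for whatever allocator the kernel code would actually use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-order bookkeeping entry. */
struct sketch_free_area {
	unsigned long nr_free;
};

/* Hypothetical zone whose per-order array is sized at run time. */
struct sketch_zone {
	unsigned int max_order;              /* largest order, inclusive */
	struct sketch_free_area *free_area;  /* max_order + 1 entries */
};

/* Allocate the per-order array; returns 0 on success, -1 on failure. */
static int sketch_zone_init(struct sketch_zone *zone, unsigned int max_order)
{
	assert(zone != NULL);
	zone->free_area = calloc((size_t)max_order + 1, sizeof(*zone->free_area));
	if (!zone->free_area)
		return -1;
	zone->max_order = max_order;
	return 0;
}

/* Release the per-order array and leave the zone in a safe state. */
static void sketch_zone_destroy(struct sketch_zone *zone)
{
	free(zone->free_area);
	zone->free_area = NULL;
}
```

The point of the sketch is that the array length becomes a run-time value
stored next to the pointer, instead of a compile-time constant baked into
the struct layout.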

Patch 11 adds a new MIN_MAX_ORDER constant to replace the soon-to-be-dynamic
MAX_ORDER in places where converting a static array to a dynamic one is a
hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.

Patch 12 changes MAX_ORDER to be a kernel boot time parameter. The feature is
opt-in via an mm/Kconfig option.
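
A boot-time value of this kind has to be parsed and sanity-checked before
use. The following user-space sketch shows one way that could look. It is an
assumption-laden illustration, not the patch: the function name and both
bounds are hypothetical, and whether the patchset clamps out-of-range values
like this is not stated in this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bounds, chosen only for illustration. */
#define SKETCH_MIN_MAX_ORDER 10
#define SKETCH_MAX_MAX_ORDER 20

/*
 * Parse a boot-parameter value string into an order.
 * Returns 0 and stores the clamped result in *order on success,
 * or -1 if the string does not parse as a non-negative decimal number.
 */
static int sketch_parse_max_order(const char *val, unsigned int *order)
{
	char *end;
	long parsed;

	assert(val && order);
	errno = 0;
	parsed = strtol(val, &end, 10);
	if (errno || end == val || *end != '\0' || parsed < 0)
		return -1;

	/* Clamp into the supported range instead of rejecting the value. */
	if (parsed < SKETCH_MIN_MAX_ORDER)
		parsed = SKETCH_MIN_MAX_ORDER;
	if (parsed > SKETCH_MAX_MAX_ORDER)
		parsed = SKETCH_MAX_MAX_ORDER;
	*order = (unsigned int)parsed;
	return 0;
}
```

With these hypothetical bounds, "18" yields 18, "5" is raised to 10, "64" is
lowered to 20, and "abc" is rejected.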


Suggestions and comments are welcome. Thanks.


[1] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/e132fdd9-65af-1cad-8a6e-71844ebfe6a2@redhat.com/

Performance comparison
====

Only the stats that changed are shown below. Any stat not listed is the same
across all three kernels.
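
As a reading aid for the tables, the sketch below shows how the %change
columns appear to be derived. This is inferred from the numbers in the
reports, not taken from lkp documentation, and the helper names are made up.
Rows with a "%" after the change look like relative changes against the
first (baseline) column, while rows whose stat is itself a percentage (the
stddev% rows) look like plain differences in percentage points.

```c
#include <assert.h>

/* Relative change of `value` against `base`, in percent. */
static double sketch_pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/* Absolute change in percentage points, for stats that are percentages. */
static double sketch_point_change(double base_pct, double value_pct)
{
	return value_pct - base_pct;
}
```

For example, the small-allocs median going from 1266004 to 1262674 is a
relative change of about -0.3%, and the mmap-pread-rand stddev% going from
8.58 to 11.95 is a difference of about +3.4 points.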

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/small-allocs/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
   1266004            -0.3%    1262674            +1.1%    1279441        vm-scalability.median

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/small-allocs-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     76016            +0.2%      76178            +1.2%      76936        vm-scalability.median
   1216312            +0.2%    1218252            +1.2%    1231465        vm-scalability.throughput
 3.653e+08            +0.2%  3.659e+08            +1.3%  3.701e+08        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/mmap-xread-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
   1574020 ±  2%      +0.1%    1576169            -2.3%    1537232 ±  2%  vm-scalability.median
  25184277 ±  2%      +0.1%   25218477            -2.3%   24595646 ±  2%  vm-scalability.throughput
 7.567e+09 ±  2%      +0.1%  7.575e+09            -2.3%  7.395e+09 ±  2%  vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/mmap-pread-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      2.28 ± 11%     -21.0%       1.80 ± 11%     -18.2%       1.87 ± 11%  vm-scalability.free_time
      8.58 ±  9%      +3.4       11.95 ±  7%      +1.1        9.69 ± 13%  vm-scalability.stddev%
   1541489            -0.2%    1539102            +1.3%    1561678        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/mmap-pread-rand-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     94376            +0.4%      94716            +1.8%      96103        vm-scalability.median
     12.96 ±  3%     +11.9       24.88 ± 80%      +0.3       13.30 ±  5%  vm-scalability.stddev%
   1509455            +0.8%    1522093            +1.8%    1536886        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/lru-file-readtwice/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    433656            -5.3%     410460 ±  2%      -4.8%     412737        vm-scalability.median
  13879867            -5.5%   13118050 ±  2%      -4.8%   13212361        vm-scalability.throughput
 4.164e+09            -5.5%  3.935e+09 ±  2%      -4.8%  3.964e+09        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/lru-file-mmap-read/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    488915 ±  3%      -2.3%     477658 ±  6%     -11.7%     431771 ±  6%  vm-scalability.median
    120.69 ± 35%     -39.0       81.65 ± 84%     -89.1       31.56 ±154%  vm-scalability.stddev%
   8106774 ±  4%      -3.3%    7835670 ±  7%     -13.9%    6981078 ±  8%  vm-scalability.throughput
 2.435e+09 ±  4%      -3.4%  2.353e+09 ±  7%     -13.8%  2.099e+09 ±  8%  vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/anon-rx-rand-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    196783            -0.8%     195189 ±  2%      -2.8%     191323        vm-scalability.median
     53.88 ±  3%      -4.9       48.96 ±  2%     -43.9        9.95 ± 35%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/anon-r-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     50.03 ± 29%     -15.9       34.08 ± 12%      -2.4       47.66 ± 32%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/qemu-vm/anon-r-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      3.82            -0.8%       3.79            -0.6%       3.79 ±  3%  vm-scalability.free_time
    172116            +0.3%     172685            -2.1%     168557        vm-scalability.median
     75.53 ± 12%     -15.6       59.88 ± 13%     -60.9       14.64 ± 17%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/8T/qemu-vm/anon-wx-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    798340            +0.8%     804861            -3.0%     774082        vm-scalability.median
      1.55 ± 28%      -0.1        1.47 ± 32%      +0.6        2.19 ± 14%  vm-scalability.median_stddev%
      1.55 ± 28%      -0.1        1.47 ± 32%      +0.6        2.19 ± 14%  vm-scalability.stddev%
  12773455            +0.8%   12877783            -3.0%   12385319        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/8T/qemu-vm/anon-w-seq/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.13            +4.1%       0.14 ±  3%     +37.0%       0.18        vm-scalability.free_time
    923091            -0.6%     917275 ±  2%      +3.7%     957298        vm-scalability.median
      4.57 ±  2%      -1.7        2.89 ± 15%      -3.9        0.68 ±  7%  vm-scalability.median_stddev%
  14811265            -0.5%   14731710            +1.8%   15079698        vm-scalability.throughput
 3.173e+09            -0.5%  3.156e+09 ±  2%      -1.2%  3.134e+09        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/8T/qemu-vm/anon-w-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.07            +1.1%       0.07            +3.6%       0.07        vm-scalability.free_time
    667055            -1.6%     656481            -1.4%     657861        vm-scalability.median
      2.22 ±  4%      -0.1        2.12 ±  6%      +0.4        2.60 ± 14%  vm-scalability.median_stddev%
  10817276            -1.3%   10673638            -2.3%   10568517        vm-scalability.throughput
 2.022e+09            -1.0%  2.002e+09            -1.5%  1.991e+09        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/8T/qemu-vm/anon-cow-seq/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    475554            -0.5%     473278            +2.6%     487908        vm-scalability.median
      4.20 ±  2%      -1.5        2.73 ±  6%      -3.3        0.89 ±  6%  vm-scalability.median_stddev%
      3.58 ±  3%      -1.0        2.58 ±  5%      -1.7        1.88 ±  9%  vm-scalability.stddev%
   7533010            +0.4%    7559545            +1.7%    7663820        vm-scalability.throughput
 1.764e+09            +1.8%  1.795e+09            +1.2%  1.785e+09        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/8T/qemu-vm/anon-cow-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      1.13 ± 14%      -0.3        0.85 ± 15%      -0.5        0.66 ± 32%  vm-scalability.median_stddev%
      1.13 ± 14%      -0.3        0.85 ± 15%      -0.5        0.66 ± 32%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/512G/qemu-vm/anon-wx-rand-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     72308            -1.4%      71294 ±  3%      -7.9%      66569        vm-scalability.median
      0.96 ± 11%      -0.0        0.94 ± 14%      -0.5        0.44 ±  5%  vm-scalability.stddev%
 2.743e+08            -0.0%  2.743e+08           +12.7%   3.09e+08        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/512G/qemu-vm/anon-w-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.09            +0.8%       0.09 ±  3%      +5.9%       0.10 ±  3%  vm-scalability.free_time
     67458 ±  2%      -6.0%      63414 ±  2%     -11.3%      59805        vm-scalability.median
      4.66 ± 36%      +4.7        9.38 ± 34%      -2.2        2.50 ± 23%  vm-scalability.median_stddev%
    971866            -1.3%     959227            -2.3%     949434        vm-scalability.throughput
 2.469e+08            -0.0%  2.469e+08           +11.1%  2.743e+08        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/512G/qemu-vm/anon-w-rand-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.12 ±  2%      +2.8%       0.13           +10.5%       0.14 ±  3%  vm-scalability.free_time
     65926 ±  3%      -1.8%      64711 ±  4%      -9.3%      59770        vm-scalability.median
      4.51 ± 38%      +1.3        5.83 ± 48%      -3.1        1.44 ± 31%  vm-scalability.median_stddev%
      1.24 ± 24%      -0.3        0.93 ± 25%      -0.8        0.48 ± 17%  vm-scalability.stddev%
 2.395e+08            +1.5%  2.432e+08           +11.5%   2.67e+08        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/512G/qemu-vm/anon-cow-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     63519 ±  3%      -2.3%      62074 ±  2%     -12.2%      55775        vm-scalability.median
    914972            -1.2%     904135            -2.4%     893097        vm-scalability.throughput
 2.323e+08            -0.0%  2.323e+08           +11.1%  2.582e+08        vm-scalability.workload

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/512G/qemu-vm/anon-cow-rand-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
     64719 ±  2%      -1.7%      63626 ±  2%      -7.4%      59953        vm-scalability.median
      3.32 ± 77%      +1.3        4.64 ± 64%      -2.3        1.02 ± 60%  vm-scalability.median_stddev%
      0.83 ± 27%      -0.1        0.74 ± 53%      -0.7        0.18 ± 43%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/2T/qemu-vm/shm-xread-seq/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    346505            +2.5%     355073            +1.8%     352797        vm-scalability.median
      1.29 ± 26%      +0.4        1.73 ± 11%      +0.2        1.47 ± 22%  vm-scalability.median_stddev%
      1.29 ± 26%      +0.4        1.73 ± 11%      +0.2        1.47 ± 22%  vm-scalability.stddev%
   5544053            +2.5%    5681145            +1.8%    5644734        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/2T/qemu-vm/shm-pread-seq/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      3.06           +10.8%       3.40           +11.5%       3.42        vm-scalability.free_time
    344737            +3.5%     356824            +2.0%     351766        vm-scalability.median
   5515773            +3.5%    5709150            +2.0%    5628245        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/2T/qemu-vm/shm-pread-seq-mt/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    363265            +2.4%     371881            +2.0%     370384        vm-scalability.median
   5807625            +2.4%    5948137            +2.0%    5922313        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/256G/qemu-vm/msync/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    124686 ±  4%      +2.9%     128345            +9.0%     135953        vm-scalability.median
     19.68 ±  9%      -1.2       18.47 ±  6%      -5.0       14.67 ±  3%  vm-scalability.median_stddev%
     18.87 ±  9%      -1.5       17.38 ±  9%      -6.1       12.76 ±  4%  vm-scalability.stddev%
   2047903 ±  2%      +2.1%    2090681            +4.4%    2138545        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/256G/qemu-vm/lru-shm-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.03            -0.4%       0.03            -2.0%       0.02        vm-scalability.free_time
      7.02 ± 18%      +0.7        7.68 ± 12%      -5.4        1.65 ± 13%  vm-scalability.median_stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/1T/qemu-vm/lru-shm/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      2.36 ±  4%      -0.5        1.90 ± 13%      -1.6        0.80 ± 20%  vm-scalability.median_stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/16G/qemu-vm/shm-xread-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    168.32 ±  6%      -0.1      168.24 ± 17%    -114.4       53.94 ± 59%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/16G/qemu-vm/shm-pread-rand/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
    172.46 ±  8%     -14.2      158.30 ± 20%    -106.4       66.04 ± 74%  vm-scalability.stddev%

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase/unit_size:
  gcc-11/defconfig/debian/300s/16G/qemu-vm/shm-pread-rand-mt/vm-scalability/1G

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      0.06 ±  4%      +5.0%       0.07 ±  5%     +38.6%       0.09 ±  6%  vm-scalability.free_time

=========================================================================================
compiler/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/defconfig/debian/300s/128G/qemu-vm/truncate-seq/vm-scalability

commit: 
  5.19.0-rc4-mm-everything+
  5.19.0-rc4-boot-time-max-order-10+
  5.19.0-rc4-boot-time-max-order-20+

5.19.0-rc4-mm-ev 5.19.0-rc4-boot-time-max-or 5.19.0-rc4-boot-time-max-or 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
      9.00 ± 15%      +1.6       10.59 ± 11%      -2.6        6.35 ±  8%  vm-scalability.median_fault_stddev%
      9.00 ± 15%      +1.6       10.59 ± 11%      -2.6        6.35 ±  8%  vm-scalability.stddev_fault%


Zi Yan (12):
  arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER
  mm: rectify MAX_ORDER semantics to be the largest page order from
    buddy allocator
  mm: replace MAX_ORDER when it is used to indicate max physical
    contiguity.
  mm: adapt deferred struct page init to new MAX_ORDER.
  mm: prevent pageblock size being larger than section size.
  fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  mm/page_reporting: set page_reporting_order to -1 to prevent it
    running
  mm: Make MAX_ORDER of buddy allocator configurable via Kconfig
    SET_MAX_ORDER.
  mm: convert MAX_ORDER sized static arrays to dynamic ones.
  mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time
    constant.
  mm: make MAX_ORDER a kernel boot time parameter.

 .../admin-guide/kdump/vmcoreinfo.rst          |   4 +-
 .../admin-guide/kernel-parameters.txt         |   9 +-
 arch/Kconfig                                  |   4 +
 arch/arc/Kconfig                              |   6 +-
 arch/arm/Kconfig                              |  14 +-
 arch/arm/configs/imx_v6_v7_defconfig          |   2 +-
 arch/arm/configs/milbeaut_m10v_defconfig      |   2 +-
 arch/arm/configs/oxnas_v6_defconfig           |   2 +-
 arch/arm/configs/sama7_defconfig              |   2 +-
 arch/arm64/Kconfig                            |  18 ++-
 arch/arm64/include/asm/sparsemem.h            |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |   2 +-
 arch/csky/Kconfig                             |   4 +-
 arch/ia64/Kconfig                             |  10 +-
 arch/ia64/include/asm/sparsemem.h             |   6 +-
 arch/ia64/mm/hugetlbpage.c                    |   2 +-
 arch/m68k/Kconfig.cpu                         |  10 +-
 arch/mips/Kconfig                             |  24 ++--
 arch/nios2/Kconfig                            |  12 +-
 arch/powerpc/Kconfig                          |  32 ++---
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |   2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config     |   2 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |   2 +-
 arch/powerpc/mm/hugetlbpage.c                 |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     |   2 +-
 arch/sh/configs/ecovec24_defconfig            |   2 +-
 arch/sh/mm/Kconfig                            |  22 ++-
 arch/sparc/Kconfig                            |  10 +-
 arch/sparc/kernel/pci_sun4v.c                 |   2 +-
 arch/sparc/kernel/traps_64.c                  |   2 +-
 arch/sparc/mm/tsb.c                           |   4 +-
 arch/um/kernel/um_arch.c                      |   4 +-
 arch/xtensa/Kconfig                           |  10 +-
 drivers/base/regmap/regmap-debugfs.c          |   8 +-
 drivers/crypto/hisilicon/sgl.c                |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |   7 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |  72 ++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +-
 drivers/irqchip/irq-gic-v3-its.c              |   4 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/misc/genwqe/card_utils.c              |   2 +-
 drivers/net/ethernet/ibm/ibmvnic.h            |   2 +-
 drivers/video/fbdev/hyperv_fb.c               |   6 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/virtio/virtio_mem.c                   |   8 +-
 fs/proc/kcore.c                               |   2 +-
 fs/ramfs/file-nommu.c                         |   2 +-
 include/drm/ttm/ttm_pool.h                    |   4 +-
 include/linux/hugetlb.h                       |   2 +-
 include/linux/mmzone.h                        |  36 ++++-
 include/linux/pageblock-flags.h               |  21 ++-
 include/linux/slab.h                          |   8 +-
 kernel/crash_core.c                           |   2 +-
 kernel/dma/pool.c                             |   8 +-
 mm/Kconfig                                    |  33 ++++-
 mm/compaction.c                               |   8 +-
 mm/debug_vm_pgtable.c                         |   4 +-
 mm/huge_memory.c                              |   2 +-
 mm/hugetlb.c                                  |   4 +-
 mm/internal.h                                 |  10 +-
 mm/memblock.c                                 |   8 +-
 mm/memory.c                                   |   4 +-
 mm/memory_hotplug.c                           |   6 +-
 mm/page_alloc.c                               | 128 +++++++++++++-----
 mm/page_isolation.c                           |  14 +-
 mm/page_owner.c                               |   6 +-
 mm/page_reporting.c                           |   8 +-
 mm/shuffle.h                                  |   2 +-
 mm/slab.c                                     |   2 +-
 mm/slub.c                                     |   6 +-
 mm/vmscan.c                                   |   1 -
 mm/vmstat.c                                   |  14 +-
 net/smc/smc_ib.c                              |   2 +-
 scripts/checkpatch.pl                         |   8 ++
 security/integrity/ima/ima_crypto.c           |   2 +-
 tools/testing/memblock/linux/mmzone.h         |   6 +-
 78 files changed, 451 insertions(+), 270 deletions(-)

-- 
2.35.1



* [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-13 15:36   ` Mike Rapoport
  2022-08-11 23:16 ` [RFC PATCH v2 02/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

This Kconfig option is used by individual architectures to set their
desired MAX_ORDER. Rename it to reflect its actual use.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-oxnas@groups.io
Cc: linux-csky@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arc/Kconfig                             | 2 +-
 arch/arm/Kconfig                             | 2 +-
 arch/arm/configs/imx_v6_v7_defconfig         | 2 +-
 arch/arm/configs/milbeaut_m10v_defconfig     | 2 +-
 arch/arm/configs/oxnas_v6_defconfig          | 2 +-
 arch/arm/configs/sama7_defconfig             | 2 +-
 arch/arm64/Kconfig                           | 2 +-
 arch/csky/Kconfig                            | 2 +-
 arch/ia64/Kconfig                            | 2 +-
 arch/ia64/include/asm/sparsemem.h            | 6 +++---
 arch/m68k/Kconfig.cpu                        | 2 +-
 arch/mips/Kconfig                            | 2 +-
 arch/nios2/Kconfig                           | 2 +-
 arch/powerpc/Kconfig                         | 2 +-
 arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config    | 2 +-
 arch/sh/configs/ecovec24_defconfig           | 2 +-
 arch/sh/mm/Kconfig                           | 2 +-
 arch/sparc/Kconfig                           | 2 +-
 arch/xtensa/Kconfig                          | 2 +-
 include/linux/mmzone.h                       | 4 ++--
 21 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 9e3653253ef2..d9a13ccf89a3 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -554,7 +554,7 @@ config ARC_BUILTIN_DTB_NAME
 
 endmenu	 # "ARC Architecture Configuration"
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	default "12" if ARC_HUGEPAGE_16M
 	default "11"
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 87badeae3181..e6c8ee56ac52 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1434,7 +1434,7 @@ config ARM_MODULE_PLTS
 	  Disabling this is usually safe for small single-platform
 	  configurations. If unsure, say y.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	default "12" if SOC_AM33XX
 	default "9" if SA1111
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 01012537a9b9..fb283059daa0 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_FORCE_MAX_ZONEORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=14
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
index 58810e98de3d..8620061e19a8 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig
index 600f78b363dd..5c163a9d1429 100644
--- a/arch/arm/configs/oxnas_v6_defconfig
+++ b/arch/arm/configs/oxnas_v6_defconfig
@@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
 CONFIG_MACH_OX820=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=16
-CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_SECCOMP=y
 CONFIG_ARM_APPENDED_DTB=y
 CONFIG_ARM_ATAG_DTB_COMPAT=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index 0384030d8b25..8b2cf6ddd568 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
 # CONFIG_CACHE_L2X0 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 # CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_FORCE_MAX_ZONEORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=15
 CONFIG_UACCESS_WITH_MEMCPY=y
 # CONFIG_ATAGS is not set
 CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..c6fcd8746f60 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1401,7 +1401,7 @@ config XEN
 	help
 	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int
 	default "14" if ARM64_64K_PAGES
 	default "12" if ARM64_16K_PAGES
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 3cbc2dc62baf..adee6ab36862 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -332,7 +332,7 @@ config HIGHMEM
 	select KMAP_LOCAL
 	default y
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	default "11"
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 26ac8ea15a9e..c6e06cdc738f 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -200,7 +200,7 @@ config IA64_CYCLONE
 	  Say Y here to enable support for IBM EXA Cyclone time source.
 	  If you're unsure, answer N.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "MAX_ORDER (11 - 17)"  if !HUGETLB_PAGE
 	range 11 17  if !HUGETLB_PAGE
 	default "17" if HUGETLB_PAGE
diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h
index 42ed5248fae9..84e8ce387b69 100644
--- a/arch/ia64/include/asm/sparsemem.h
+++ b/arch/ia64/include/asm/sparsemem.h
@@ -11,10 +11,10 @@
 
 #define SECTION_SIZE_BITS	(30)
 #define MAX_PHYSMEM_BITS	(50)
-#ifdef CONFIG_FORCE_MAX_ZONEORDER
-#if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
+#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
+#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
 #undef SECTION_SIZE_BITS
-#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
+#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
 #endif
 #endif
 
diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index e0e9e31339c1..3b2f39508524 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -399,7 +399,7 @@ config SINGLE_MEMORY_CHUNK
 	  order" to save memory that could be wasted for unused memory map.
 	  Say N if not sure.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order" if ADVANCED
 	depends on !SINGLE_MEMORY_CHUNK
 	default "11"
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index ec21f8999249..70d28976a40d 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2140,7 +2140,7 @@ config PAGE_SIZE_64KB
 
 endchoice
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
 	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index 4167f1eb4cd8..a582f72104f3 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -44,7 +44,7 @@ menu "Kernel features"
 
 source "kernel/Kconfig.hz"
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	range 9 20
 	default "11"
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..39d71d7701bd 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -845,7 +845,7 @@ config DATA_SHIFT
 	  in that case. If PIN_TLB is selected, it must be aligned to 8M as
 	  8M pages will be pinned.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	range 8 9 if PPC64 && PPC_64K_PAGES
 	default "9" if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
index f29c166998af..e7672c186325 100644
--- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
+++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
@@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
-CONFIG_FORCE_MAX_ZONEORDER=17
+CONFIG_ARCH_FORCE_MAX_ORDER=17
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_MSI=y
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index f14c6dbd7346..ab8a8c4530d9 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
 CONFIG_FONT_8x16=y
 CONFIG_FONT_8x8=y
 CONFIG_FONTS=y
-CONFIG_FORCE_MAX_ZONEORDER=13
+CONFIG_ARCH_FORCE_MAX_ORDER=13
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_FRAME_WARN=1024
 CONFIG_FTL=y
diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
index e699e2e04128..b52e14ccb450 100644
--- a/arch/sh/configs/ecovec24_defconfig
+++ b/arch/sh/configs/ecovec24_defconfig
@@ -8,7 +8,7 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH7724=y
-CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_MEMORY_SIZE=0x10000000
 CONFIG_FLATMEM_MANUAL=y
 CONFIG_SH_ECOVEC=y
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index ba569cfb4368..411fdc0901f7 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -18,7 +18,7 @@ config PAGE_OFFSET
 	default "0x80000000" if MMU
 	default "0x00000000"
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	range 9 64 if PAGE_SIZE_16KB
 	default "9" if PAGE_SIZE_16KB
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 1c852bb530ec..4d3d1af90d52 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -269,7 +269,7 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool y if SPARC64
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	default "13"
 	help
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 12ac277282ba..bcb0c5d2abc2 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -771,7 +771,7 @@ config HIGHMEM
 
 	  If unsure, say Y.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
 	default "11"
 	help
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8f571dc7c524..ca285ed3c6e0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -24,10 +24,10 @@
 #include <asm/page.h>
 
 /* Free memory management - zoned buddy allocator.  */
-#ifndef CONFIG_FORCE_MAX_ZONEORDER
+#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
 #define MAX_ORDER 11
 #else
-#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
+#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
 #endif
 #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
 
-- 
2.35.1



* [RFC PATCH v2 02/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 03/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER used to denote the largest page order + 1, which was
confusing and caused several off-by-one errors in the code. Fix it by
making MAX_ORDER the largest page order the buddy allocator supports,
as its name suggests.

Add a warning in checkpatch.pl about the semantics change.
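
As a rough illustration of the semantics change (a sketch only, not code
from this patch): the macro and helper names below are hypothetical, and
the values assume the common default where the largest buddy order is 10.
It shows why array sizes become MAX_ORDER + 1 and why the "- 1" drops out
of size computations.

```c
#include <assert.h>

/*
 * Illustrative names only; the kernel uses a single MAX_ORDER macro.
 * Values assume the common default where the largest order is 10.
 */
#define OLD_MAX_ORDER 11 /* old meaning: largest page order + 1 */
#define NEW_MAX_ORDER 10 /* new meaning: largest page order */

/* Number of per-order free lists, i.e. the free_area[] array length. */
static inline int old_nr_free_lists(void) { return OLD_MAX_ORDER; }
static inline int new_nr_free_lists(void) { return NEW_MAX_ORDER + 1; }

/* Number of pages in the largest buddy block. */
static inline unsigned long old_largest_block_pages(void)
{
	return 1UL << (OLD_MAX_ORDER - 1);
}

static inline unsigned long new_largest_block_pages(void)
{
	return 1UL << NEW_MAX_ORDER;
}

/* Both conventions must describe the same allocator. */
static_assert(OLD_MAX_ORDER - 1 == NEW_MAX_ORDER, "same largest order");
```

Under either convention the allocator keeps 11 free lists and its largest
block is 2^10 pages; only the spelling of those expressions changes, which
is the pattern the hunks in this patch follow.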

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 .../admin-guide/kdump/vmcoreinfo.rst          |  4 +-
 .../admin-guide/kernel-parameters.txt         |  4 +-
 arch/arc/Kconfig                              |  4 +-
 arch/arm/Kconfig                              | 12 +++---
 arch/arm/configs/imx_v6_v7_defconfig          |  2 +-
 arch/arm/configs/milbeaut_m10v_defconfig      |  2 +-
 arch/arm/configs/oxnas_v6_defconfig           |  2 +-
 arch/arm/configs/sama7_defconfig              |  2 +-
 arch/arm64/Kconfig                            | 16 ++++----
 arch/arm64/include/asm/sparsemem.h            |  2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |  2 +-
 arch/csky/Kconfig                             |  2 +-
 arch/ia64/Kconfig                             |  8 ++--
 arch/ia64/include/asm/sparsemem.h             |  4 +-
 arch/ia64/mm/hugetlbpage.c                    |  2 +-
 arch/m68k/Kconfig.cpu                         |  8 ++--
 arch/mips/Kconfig                             | 22 +++++-----
 arch/nios2/Kconfig                            | 10 ++---
 arch/powerpc/Kconfig                          | 30 +++++++-------
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |  2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config     |  2 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |  2 +-
 arch/powerpc/mm/hugetlbpage.c                 |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     |  2 +-
 arch/sh/configs/ecovec24_defconfig            |  2 +-
 arch/sh/mm/Kconfig                            | 20 +++++-----
 arch/sparc/Kconfig                            |  8 ++--
 arch/sparc/kernel/pci_sun4v.c                 |  2 +-
 arch/sparc/kernel/traps_64.c                  |  2 +-
 arch/xtensa/Kconfig                           |  8 ++--
 drivers/base/regmap/regmap-debugfs.c          |  8 ++--
 drivers/crypto/hisilicon/sgl.c                |  6 +--
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_pool.c                | 22 +++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  2 +-
 drivers/irqchip/irq-gic-v3-its.c              |  4 +-
 drivers/md/dm-bufio.c                         |  2 +-
 drivers/misc/genwqe/card_utils.c              |  2 +-
 drivers/net/ethernet/ibm/ibmvnic.h            |  2 +-
 drivers/video/fbdev/hyperv_fb.c               |  6 +--
 drivers/virtio/virtio_balloon.c               |  2 +-
 drivers/virtio/virtio_mem.c                   |  8 ++--
 fs/ramfs/file-nommu.c                         |  2 +-
 include/drm/ttm/ttm_pool.h                    |  2 +-
 include/linux/hugetlb.h                       |  2 +-
 include/linux/mmzone.h                        | 10 ++---
 include/linux/pageblock-flags.h               |  4 +-
 include/linux/slab.h                          |  8 ++--
 kernel/crash_core.c                           |  2 +-
 kernel/dma/pool.c                             |  6 +--
 mm/Kconfig                                    |  6 +--
 mm/compaction.c                               |  8 ++--
 mm/debug_vm_pgtable.c                         |  4 +-
 mm/huge_memory.c                              |  2 +-
 mm/hugetlb.c                                  |  4 +-
 mm/memblock.c                                 |  2 +-
 mm/memory_hotplug.c                           |  4 +-
 mm/page_alloc.c                               | 40 +++++++++----------
 mm/page_isolation.c                           | 14 +++----
 mm/page_owner.c                               |  6 +--
 mm/page_reporting.c                           |  4 +-
 mm/shuffle.h                                  |  2 +-
 mm/slab.c                                     |  2 +-
 mm/slub.c                                     |  4 +-
 mm/vmstat.c                                   | 14 +++----
 net/smc/smc_ib.c                              |  2 +-
 scripts/checkpatch.pl                         |  8 ++++
 security/integrity/ima/ima_crypto.c           |  2 +-
 tools/testing/memblock/linux/mmzone.h         |  6 +--
 69 files changed, 208 insertions(+), 218 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 8419019b6a88..c572b5230fe0 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER].
+Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
@@ -189,7 +189,7 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
 information. Makedumpfile gets the start address of the vmalloc region
 from this.
 
-(zone.free_area, MAX_ORDER)
+(zone.free_area, MAX_ORDER + 1)
 ---------------------------
 
 Free areas descriptor. User-space tools use this value to iterate the
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index db5de5f0b9d3..ff33971e1630 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -928,7 +928,7 @@
 			buddy allocator. Bigger value increase the probability
 			of catching random memory corruption, but reduce the
 			amount of memory for normal system use. The maximum
-			possible value is MAX_ORDER/2.  Setting this parameter
+			possible value is (MAX_ORDER + 1)/2.  Setting this parameter
 			to 1 or 2 should be enough to identify most random
 			memory corruption problems caused by bugs in kernel or
 			driver code when a CPU writes to (or reads from) a
@@ -3899,7 +3899,7 @@
 			[KNL] Minimal page reporting order
 			Format: <integer>
 			Adjust the minimal page reporting order. The page
-			reporting is disabled when it exceeds (MAX_ORDER-1).
+			reporting is disabled when it exceeds MAX_ORDER.
 
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index d9a13ccf89a3..ab6d701365bb 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -556,7 +556,7 @@ endmenu	 # "ARC Architecture Configuration"
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "12" if ARC_HUGEPAGE_16M
-	default "11"
+	default "11" if ARC_HUGEPAGE_16M
+	default "10"
 
 source "kernel/power/Kconfig"
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e6c8ee56ac52..c8f2e46cc8c4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1436,19 +1436,17 @@ config ARM_MODULE_PLTS
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "12" if SOC_AM33XX
-	default "9" if SA1111
-	default "11"
+	default "11" if SOC_AM33XX
+	default "8" if SA1111
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 config ALIGNMENT_TRAP
 	def_bool CPU_CP15_MMU
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index fb283059daa0..eeb14499479d 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=13
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
index 8620061e19a8..22732f19e79b 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig
index 5c163a9d1429..7e43aa355467 100644
--- a/arch/arm/configs/oxnas_v6_defconfig
+++ b/arch/arm/configs/oxnas_v6_defconfig
@@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
 CONFIG_MACH_OX820=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=16
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_ARM_APPENDED_DTB=y
 CONFIG_ARM_ATAG_DTB_COMPAT=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index 8b2cf6ddd568..c200de3947e3 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
 # CONFIG_CACHE_L2X0 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 # CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_ARCH_FORCE_MAX_ORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=14
 CONFIG_UACCESS_WITH_MEMCPY=y
 # CONFIG_ATAGS is not set
 CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c6fcd8746f60..1afcfc9d2dc0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1403,25 +1403,23 @@ config XEN
 
 config ARCH_FORCE_MAX_ORDER
 	int
-	default "14" if ARM64_64K_PAGES
-	default "12" if ARM64_16K_PAGES
-	default "11"
+	default "13" if ARM64_64K_PAGES
+	default "11" if ARM64_16K_PAGES
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  We make sure that we can allocate upto a HugePage size for each configuration.
 	  Hence we have :
-		MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
+		MAX_ORDER = PMD_SHIFT - PAGE_SHIFT = PAGE_SHIFT - 3
 
-	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
+	  However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
 	  4M allocations matching the default size used by generic code.
 
 config UNMAP_KERNEL_AT_EL0
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 4b73463423c3..5f5437621029 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -10,7 +10,7 @@
 /*
  * Section size must be at least 512MB for 64K base
  * page size config. Otherwise it will be less than
- * (MAX_ORDER - 1) and the build process will fail.
+ * MAX_ORDER and the build process will fail.
  */
 #ifdef CONFIG_ARM64_64K_PAGES
 #define SECTION_SIZE_BITS 29
diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc06a7d..fe5472a184a3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -16,7 +16,7 @@ struct hyp_pool {
 	 * API at EL2.
 	 */
 	hyp_spinlock_t lock;
-	struct list_head free_area[MAX_ORDER];
+	struct list_head free_area[MAX_ORDER + 1];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
 	unsigned short max_order;
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index adee6ab36862..a35fc882e97e 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -334,7 +334,7 @@ config HIGHMEM
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "11"
+	default "10"
 
 config DRAM_BASE
 	hex "DRAM start addr (the same with memory-section in dts)"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index c6e06cdc738f..d85f6fbd0746 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -201,10 +201,10 @@ config IA64_CYCLONE
 	  If you're unsure, answer N.
 
 config ARCH_FORCE_MAX_ORDER
-	int "MAX_ORDER (11 - 17)"  if !HUGETLB_PAGE
-	range 11 17  if !HUGETLB_PAGE
-	default "17" if HUGETLB_PAGE
-	default "11"
+	int "MAX_ORDER (10 - 16)"  if !HUGETLB_PAGE
+	range 10 16  if !HUGETLB_PAGE
+	default "16" if HUGETLB_PAGE
+	default "10"
 
 config SMP
 	bool "Symmetric multi-processing support"
diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h
index 84e8ce387b69..04f03a56c166 100644
--- a/arch/ia64/include/asm/sparsemem.h
+++ b/arch/ia64/include/asm/sparsemem.h
@@ -12,9 +12,9 @@
 #define SECTION_SIZE_BITS	(30)
 #define MAX_PHYSMEM_BITS	(50)
 #ifdef CONFIG_ARCH_FORCE_MAX_ORDER
-#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
+#if ((CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS)
 #undef SECTION_SIZE_BITS
-#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
+#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER + PAGE_SHIFT)
 #endif
 #endif
 
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index f993cb36c062..87cc2e8908b4 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -185,7 +185,7 @@ static int __init hugetlb_setup_sz(char *str)
 	size = memparse(str, &str);
 	if (*str || !is_power_of_2(size) || !(tr_pages & size) ||
 		size <= PAGE_SIZE ||
-		size >= (1UL << PAGE_SHIFT << MAX_ORDER)) {
+		size > (1UL << PAGE_SHIFT << MAX_ORDER)) {
 		printk(KERN_WARNING "Invalid huge page size specified\n");
 		return 1;
 	}
diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 3b2f39508524..d3832e1ca7df 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -402,22 +402,20 @@ config SINGLE_MEMORY_CHUNK
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order" if ADVANCED
 	depends on !SINGLE_MEMORY_CHUNK
-	default "11"
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  For systems that have holes in their physical address space this
 	  value also defines the minimal size of the hole that allows
 	  freeing unused memory map.
 
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
-
 config 060_WRITETHROUGH
 	bool "Use write-through caching for 68060 supervisor accesses"
 	depends on ADVANCED && M68060
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 70d28976a40d..37116c811e60 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2142,24 +2142,22 @@ endchoice
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
-	range 13 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
-	range 12 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
-	range 0 64
-	default "11"
+	range 13 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
+	default "13" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
+	range 12 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
+	default "12" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_32KB
+	range 11 63 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
+	default "11" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_16KB
+	range 0 63
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB.  Keep this in mind
 	  when choosing a value for this option.
diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
index a582f72104f3..0cccaf8b7fdf 100644
--- a/arch/nios2/Kconfig
+++ b/arch/nios2/Kconfig
@@ -46,18 +46,16 @@ source "kernel/Kconfig.hz"
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 9 20
-	default "11"
+	range 8 19
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 endmenu
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 39d71d7701bd..d052cf27883e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -847,28 +847,26 @@ config DATA_SHIFT
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 8 9 if PPC64 && PPC_64K_PAGES
-	default "9" if PPC64 && PPC_64K_PAGES
-	range 13 13 if PPC64 && !PPC_64K_PAGES
-	default "13" if PPC64 && !PPC_64K_PAGES
-	range 9 64 if PPC32 && PPC_16K_PAGES
-	default "9" if PPC32 && PPC_16K_PAGES
-	range 7 64 if PPC32 && PPC_64K_PAGES
-	default "7" if PPC32 && PPC_64K_PAGES
-	range 5 64 if PPC32 && PPC_256K_PAGES
-	default "5" if PPC32 && PPC_256K_PAGES
-	range 11 64
-	default "11"
+	range 7 8 if PPC64 && PPC_64K_PAGES
+	default "8" if PPC64 && PPC_64K_PAGES
+	range 12 12 if PPC64 && !PPC_64K_PAGES
+	default "12" if PPC64 && !PPC_64K_PAGES
+	range 8 63 if PPC32 && PPC_16K_PAGES
+	default "8" if PPC32 && PPC_16K_PAGES
+	range 6 63 if PPC32 && PPC_64K_PAGES
+	default "6" if PPC32 && PPC_64K_PAGES
+	range 4 63 if PPC32 && PPC_256K_PAGES
+	default "4" if PPC32 && PPC_256K_PAGES
+	range 10 63
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB.  For example, on 64-bit
 	  systems, 64KB pages can be enabled via CONFIG_PPC_64K_PAGES.  Keep
diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
index e7672c186325..b8be8280a200 100644
--- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
+++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
@@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
-CONFIG_ARCH_FORCE_MAX_ORDER=17
+CONFIG_ARCH_FORCE_MAX_ORDER=16
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCI_MSI=y
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index ab8a8c4530d9..3009b0efaf34 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
 CONFIG_FONT_8x16=y
 CONFIG_FONT_8x8=y
 CONFIG_FONTS=y
-CONFIG_ARCH_FORCE_MAX_ORDER=13
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_FRAMEBUFFER_CONSOLE=y
 CONFIG_FRAME_WARN=1024
 CONFIG_FTL=y
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 7fcfba162e0d..81d7185e2ae8 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -97,7 +97,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 	}
 
 	mmap_read_lock(mm);
-	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
+	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER)) /
 			sizeof(struct vm_area_struct *);
 	chunk = min(chunk, entries);
 	for (entry = 0; entry < entries; entry += chunk) {
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc84a594ca62..8d63934783dc 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -652,7 +652,7 @@ void __init gigantic_hugetlb_cma_reserve(void)
 		order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
 
 	if (order) {
-		VM_WARN_ON(order < MAX_ORDER);
+		VM_WARN_ON(order <= MAX_ORDER);
 		hugetlb_cma_reserve(order);
 	}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9de9b2fb163d..8e29a57924ef 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1740,7 +1740,7 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER);
 
 	/*
 	 * We create the default window as big as we can. The constraint is
diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
index b52e14ccb450..4d655e8d4d74 100644
--- a/arch/sh/configs/ecovec24_defconfig
+++ b/arch/sh/configs/ecovec24_defconfig
@@ -8,7 +8,7 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_CPU_SUBTYPE_SH7724=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_MEMORY_SIZE=0x10000000
 CONFIG_FLATMEM_MANUAL=y
 CONFIG_SH_ECOVEC=y
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index 411fdc0901f7..e60e77c6edca 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -20,23 +20,21 @@ config PAGE_OFFSET
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	range 9 64 if PAGE_SIZE_16KB
-	default "9" if PAGE_SIZE_16KB
-	range 7 64 if PAGE_SIZE_64KB
-	default "7" if PAGE_SIZE_64KB
-	range 11 64
-	default "14" if !MMU
-	default "11"
+	range 8 63 if PAGE_SIZE_16KB
+	default "8" if PAGE_SIZE_16KB
+	range 6 63 if PAGE_SIZE_64KB
+	default "6" if PAGE_SIZE_64KB
+	range 10 63
+	default "13" if !MMU
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 	  The page size is not necessarily 4KB. Keep this in mind when
 	  choosing a value for this option.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 4d3d1af90d52..099d0b31ea69 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -271,17 +271,15 @@ config ARCH_SPARSEMEM_DEFAULT
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "13"
+	default "12"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 13 means that the largest free memory block is 2^12 pages.
+	  increase this value. A value of 12 means that the largest free memory
+	  block is 2^12 pages.
 
 if SPARC64
 source "kernel/power/Kconfig"
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 384480971805..7d91ca6aa675 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -193,7 +193,7 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size,
 
 	size = IO_PAGE_ALIGN(size);
 	order = get_order(size);
-	if (unlikely(order >= MAX_ORDER))
+	if (unlikely(order > MAX_ORDER))
 		return NULL;
 
 	npages = size >> IO_PAGE_SHIFT;
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 5b4de4a89dec..08ffd17d5ec3 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -897,7 +897,7 @@ void __init cheetah_ecache_flush_init(void)
 
 	/* Now allocate error trap reporting scoreboard. */
 	sz = NR_CPUS * (2 * sizeof(struct cheetah_err_info));
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		if ((PAGE_SIZE << order) >= sz)
 			break;
 	}
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index bcb0c5d2abc2..2d1d91718263 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -773,17 +773,15 @@ config HIGHMEM
 
 config ARCH_FORCE_MAX_ORDER
 	int "Maximum zone order"
-	default "11"
+	default "10"
 	help
 	  The kernel memory allocator divides physically contiguous memory
 	  blocks into "zones", where each zone is a power of two number of
 	  pages.  This option selects the largest power of two that the kernel
 	  keeps in the memory allocator.  If you need to allocate very large
 	  blocks of physically contiguous memory, then you may need to
-	  increase this value.
-
-	  This config option is actually maximum order plus one. For example,
-	  a value of 11 means that the largest free memory block is 2^10 pages.
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
 
 endmenu
 
diff --git a/drivers/base/regmap/regmap-debugfs.c b/drivers/base/regmap/regmap-debugfs.c
index 817eda2075aa..c491fabe3617 100644
--- a/drivers/base/regmap/regmap-debugfs.c
+++ b/drivers/base/regmap/regmap-debugfs.c
@@ -226,8 +226,8 @@ static ssize_t regmap_read_debugfs(struct regmap *map, unsigned int from,
 	if (*ppos < 0 || !count)
 		return -EINVAL;
 
-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;
 
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
@@ -373,8 +373,8 @@ static ssize_t regmap_reg_ranges_read_file(struct file *file,
 	if (*ppos < 0 || !count)
 		return -EINVAL;
 
-	if (count > (PAGE_SIZE << (MAX_ORDER - 1)))
-		count = PAGE_SIZE << (MAX_ORDER - 1);
+	if (count > (PAGE_SIZE << MAX_ORDER))
+		count = PAGE_SIZE << MAX_ORDER;
 
 	buf = kmalloc(count, GFP_KERNEL);
 	if (!buf)
diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
index 2b6f2281cfd6..f30cf96b0a41 100644
--- a/drivers/crypto/hisilicon/sgl.c
+++ b/drivers/crypto/hisilicon/sgl.c
@@ -70,11 +70,11 @@ struct hisi_acc_sgl_pool *hisi_acc_create_sgl_pool(struct device *dev,
 			 HISI_ACC_SGL_ALIGN_SIZE);
 
 	/*
-	 * the pool may allocate a block of memory of size PAGE_SIZE * 2^(MAX_ORDER - 1),
+	 * the pool may allocate a block of memory of size PAGE_SIZE * 2^MAX_ORDER,
 	 * block size may exceed 2^31 on ia64, so the max of block size is 2^31
 	 */
-	block_size = 1 << (PAGE_SHIFT + MAX_ORDER <= 32 ?
-			   PAGE_SHIFT + MAX_ORDER - 1 : 31);
+	block_size = 1 << (PAGE_SHIFT + MAX_ORDER <= 31 ?
+			   PAGE_SHIFT + MAX_ORDER : 31);
 	sgl_num_per_block = block_size / sgl_size;
 	block_num = count / sgl_num_per_block;
 	remain_sgl = count % sgl_num_per_block;
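
Note (not part of the patch): the threshold in the hunk above moves from 32 to 31 because the "- 1" is folded into MAX_ORDER itself. The sketch below checks that the old and new expressions give the same shift when the new MAX_ORDER equals the old one minus one. The function names and the tested value ranges are assumptions chosen for illustration, not kernel code.

```c
#include <assert.h>

/* Shift computed by the pre-patch expression (MAX_ORDER is exclusive). */
static inline int old_block_shift(int page_shift, int old_max_order)
{
	return page_shift + old_max_order <= 32 ?
	       page_shift + old_max_order - 1 : 31;
}

/* Shift computed by the post-patch expression (MAX_ORDER is inclusive). */
static inline int new_block_shift(int page_shift, int new_max_order)
{
	return page_shift + new_max_order <= 31 ?
	       page_shift + new_max_order : 31;
}

/*
 * Return 1 if both expressions agree for every tested combination,
 * with new_max_order == old_max_order - 1, and 0 otherwise.
 * The ranges (page shifts 12..18, old orders 1..64) are illustrative.
 */
static inline int all_shifts_agree(void)
{
	int page_shift, old_max_order;

	for (page_shift = 12; page_shift <= 18; page_shift++) {
		for (old_max_order = 1; old_max_order <= 64; old_max_order++) {
			if (old_block_shift(page_shift, old_max_order) !=
			    new_block_shift(page_shift, old_max_order - 1))
				return 0;
		}
	}
	return 1;
}
```

With 4KB pages (page shift 12) and the default order, both forms give a shift of 22, and both cap at 31 once the sum crosses the threshold.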
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 72ce2c9f42fd..84498c7f845d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -111,7 +111,7 @@ static int get_huge_pages(struct drm_i915_gem_object *obj)
 		do {
 			struct page *page;
 
-			GEM_BUG_ON(order >= MAX_ORDER);
+			GEM_BUG_ON(order > MAX_ORDER);
 			page = alloc_pages(GFP | __GFP_ZERO, order);
 			if (!page)
 				goto err;
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..85d19f425af6 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -64,11 +64,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_uncached[MAX_ORDER + 1];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER + 1];
+static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -382,7 +382,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
+	for (order = min_t(unsigned int, MAX_ORDER, __fls(num_pages));
 	     num_pages;
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		bool apply_caching = false;
@@ -507,7 +507,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
 	if (use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
 	}
@@ -527,7 +527,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
 	if (pool->use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
 
@@ -581,7 +581,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
 	unsigned int i;
 
 	seq_puts(m, "\t ");
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i <= MAX_ORDER; ++i)
 		seq_printf(m, " ---%2u---", i);
 	seq_puts(m, "\n");
 }
@@ -592,7 +592,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i <= MAX_ORDER; ++i)
 		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
 	seq_puts(m, "\n");
 }
@@ -701,7 +701,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
 		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -734,7 +734,7 @@ void ttm_pool_mgr_fini(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_fini(&global_write_combined[i]);
 		ttm_pool_type_fini(&global_uncached[i]);
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..c5ea361bf757 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -182,7 +182,7 @@
 #ifdef CONFIG_CMA_ALIGNMENT
 #define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
 #else
-#define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER - 1)
+#define Q_MAX_SZ_SHIFT			(PAGE_SHIFT + MAX_ORDER)
 #endif
 
 /*
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..c867432919d8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2438,8 +2438,8 @@ static bool its_parse_indirect_baser(struct its_node *its,
 	 * feature is not supported by hardware.
 	 */
 	new_order = max_t(u32, get_order(esz << ids), new_order);
-	if (new_order >= MAX_ORDER) {
-		new_order = MAX_ORDER - 1;
+	if (new_order > MAX_ORDER) {
+		new_order = MAX_ORDER;
 		ids = ilog2(PAGE_ORDER_TO_SIZE(new_order) / (int)esz);
 		pr_warn("ITS@%pa: %s Table too large, reduce ids %llu->%u\n",
 			&its->phys_base, its_base_type_string[type],
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index acd6d6b47434..eee05abbc0be 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -407,7 +407,7 @@ static void __cache_size_refresh(void)
  * If the allocation may fail we use __get_free_pages. Memory fragmentation
  * won't have a fatal effect here, but it just causes flushes of some other
  * buffers and more I/O will be performed. Don't use __get_free_pages if it
- * always fails (i.e. order >= MAX_ORDER).
+ * always fails (i.e. order > MAX_ORDER).
  *
  * If the allocation shouldn't fail we use __vmalloc. This is only for the
  * initial reserve allocation, so there's no risk of wasting all vmalloc
diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index 1167463f26fb..361514cd575c 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -210,7 +210,7 @@ u32 genwqe_crc32(u8 *buff, size_t len, u32 init)
 void *__genwqe_alloc_consistent(struct genwqe_dev *cd, size_t size,
 			       dma_addr_t *dma_handle)
 {
-	if (get_order(size) >= MAX_ORDER)
+	if (get_order(size) > MAX_ORDER)
 		return NULL;
 
 	return dma_alloc_coherent(&cd->pci_dev->dev, size, dma_handle,
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index e5c6ff3d0c47..608f9df67eb8 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -75,7 +75,7 @@
  * pool for the 4MB. Thus the 16 Rx and Tx queues require 32 * 5 = 160
  * plus 16 for the TSO pools for a total of 176 LTB mappings per VNIC.
  */
-#define IBMVNIC_ONE_LTB_MAX	((u32)((1 << (MAX_ORDER - 1)) * PAGE_SIZE))
+#define IBMVNIC_ONE_LTB_MAX	((u32)((1 << MAX_ORDER) * PAGE_SIZE))
 #define IBMVNIC_ONE_LTB_SIZE	min((u32)(8 << 20), IBMVNIC_ONE_LTB_MAX)
 #define IBMVNIC_LTB_SET_SIZE	(38 << 20)
 
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index 886c564787f1..a852ab6c1f52 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -944,8 +944,8 @@ static phys_addr_t hvfb_get_phymem(struct hv_device *hdev,
 	if (request_size == 0)
 		return -1;
 
-	if (order < MAX_ORDER) {
-		/* Call alloc_pages if the size is less than 2^MAX_ORDER */
+	if (order <= MAX_ORDER) {
+		/* Call alloc_pages if the size is no greater than 2^MAX_ORDER */
 		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
 		if (!page)
 			return -1;
@@ -975,7 +975,7 @@ static void hvfb_release_phymem(struct hv_device *hdev,
 {
 	unsigned int order = get_order(size);
 
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		__free_pages(pfn_to_page(paddr >> PAGE_SHIFT), order);
 	else
 		dma_free_coherent(&hdev->device,
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 3f78a3a1eb75..5b15936a5214 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -33,7 +33,7 @@
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
 /* The order of free page blocks to report to host */
-#define VIRTIO_BALLOON_HINT_BLOCK_ORDER (MAX_ORDER - 1)
+#define VIRTIO_BALLOON_HINT_BLOCK_ORDER MAX_ORDER
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_HINT_BLOCK_BYTES \
 	(1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 0c2892ec6817..0e1253e3423a 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1120,13 +1120,13 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
  */
 static void virtio_mem_fake_online(unsigned long pfn, unsigned long nr_pages)
 {
-	unsigned long order = MAX_ORDER - 1;
+	unsigned long order = MAX_ORDER;
 	unsigned long i;
 
 	/*
 	 * We might get called for ranges that don't cover properly aligned
-	 * MAX_ORDER - 1 pages; however, we can only online properly aligned
-	 * pages with an order of MAX_ORDER - 1 at maximum.
+	 * MAX_ORDER pages; however, we can only online properly aligned
+	 * pages with an order of MAX_ORDER at maximum.
 	 */
 	while (!IS_ALIGNED(pfn | nr_pages, 1 << order))
 		order--;
@@ -1237,7 +1237,7 @@ static void virtio_mem_online_page(struct virtio_mem *vm,
 	bool do_online;
 
 	/*
-	 * We can get called with any order up to MAX_ORDER - 1. If our
+	 * We can get called with any order up to MAX_ORDER. If our
 	 * subblock size is smaller than that and we have a mixture of plugged
 	 * and unplugged subblocks within such a page, we have to process in
 	 * smaller granularity. In that case we'll adjust the order exactly once
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index ba3525ccc27e..b3b7519a6519 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -70,7 +70,7 @@ int ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize)
 
 	/* make various checks */
 	order = get_order(newsize);
-	if (unlikely(order >= MAX_ORDER))
+	if (unlikely(order > MAX_ORDER))
 		return -EFBIG;
 
 	ret = inode_newsize_ok(inode, newsize);
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index ef09b23d29e3..8ce14f9d202a 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -72,7 +72,7 @@ struct ttm_pool {
 	bool use_dma32;
 
 	struct {
-		struct ttm_pool_type orders[MAX_ORDER];
+		struct ttm_pool_type orders[MAX_ORDER + 1];
 	} caching[TTM_NUM_CACHING_TYPES];
 };
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3ec981a0d8b3..68485a264865 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -746,7 +746,7 @@ static inline unsigned huge_page_shift(struct hstate *h)
 
 static inline bool hstate_is_gigantic(struct hstate *h)
 {
-	return huge_page_order(h) >= MAX_ORDER;
+	return huge_page_order(h) > MAX_ORDER;
 }
 
 static inline unsigned int pages_per_huge_page(const struct hstate *h)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ca285ed3c6e0..e93faa3d7f1d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,11 +25,11 @@
 
 /* Free memory management - zoned buddy allocator.  */
 #ifndef CONFIG_ARCH_FORCE_MAX_ORDER
-#define MAX_ORDER 11
+#define MAX_ORDER 10
 #else
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
 #endif
-#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
  * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
@@ -92,7 +92,7 @@ static inline bool migratetype_is_mergeable(int mt)
 }
 
 #define for_each_migratetype_order(order, type) \
-	for (order = 0; order < MAX_ORDER; order++) \
+	for (order = 0; order <= MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
 
 extern int page_group_by_mobility_disabled;
@@ -632,7 +632,7 @@ struct zone {
 	ZONE_PADDING(_pad1_)
 
 	/* free areas of different sizes */
-	struct free_area	free_area[MAX_ORDER];
+	struct free_area	free_area[MAX_ORDER + 1];
 
 	/* zone flags, see below */
 	unsigned long		flags;
@@ -1379,7 +1379,7 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
 #define SECTION_BLOCKFLAGS_BITS \
 	((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
 
-#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
+#if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
 
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 83c7248053a1..940efcffd374 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -41,14 +41,14 @@ extern unsigned int pageblock_order;
  * Huge pages are a constant size, but don't exceed the maximum allocation
  * granularity.
  */
-#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1)
+#define pageblock_order		min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER)
 
 #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
 
 #else /* CONFIG_HUGETLB_PAGE */
 
 /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order		(MAX_ORDER-1)
+#define pageblock_order		MAX_ORDER
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0fefdf528e0d..568b5dfb3bd9 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -251,8 +251,8 @@ static inline unsigned int arch_slab_minalign(void)
  * to do various tricks to work around compiler limitations in order to
  * ensure proper constant folding.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT - 1) : 25)
+#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT) <= 25 ? \
+				(MAX_ORDER + PAGE_SHIFT) : 25)
 #define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
@@ -265,7 +265,7 @@ static inline unsigned int arch_slab_minalign(void)
  * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
@@ -278,7 +278,7 @@ static inline unsigned int arch_slab_minalign(void)
  * be allocated from the same page.
  */
 #define KMALLOC_SHIFT_HIGH	PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT - 1)
+#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a0eb4d5cf557..245e2ee20718 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -471,7 +471,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(list_head, prev);
 	VMCOREINFO_OFFSET(vmap_area, va_start);
 	VMCOREINFO_OFFSET(vmap_area, list);
-	VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER);
+	VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER + 1);
 	log_buf_vmcoreinfo_setup();
 	VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES);
 	VMCOREINFO_NUMBER(NR_FREE_PAGES);
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 1bf6de398986..e20f168a34c7 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -84,8 +84,8 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 	void *addr;
 	int ret = -ENOMEM;
 
-	/* Cannot allocate larger than MAX_ORDER-1 */
-	order = min(get_order(pool_size), MAX_ORDER-1);
+	/* Cannot allocate larger than MAX_ORDER */
+	order = min(get_order(pool_size), MAX_ORDER);
 
 	do {
 		pool_size = 1 << (PAGE_SHIFT + order);
@@ -190,7 +190,7 @@ static int __init dma_atomic_pool_init(void)
 
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
-	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
+	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER.
 	 */
 	if (!atomic_pool_size) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
diff --git a/mm/Kconfig b/mm/Kconfig
index 0331f1461f81..bbe31e85afee 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -307,7 +307,7 @@ config SHUFFLE_PAGE_ALLOCATOR
 	  the presence of a memory-side-cache. There are also incidental
 	  security benefits as it reduces the predictability of page
 	  allocations to compliment SLAB_FREELIST_RANDOM, but the
-	  default granularity of shuffling on the "MAX_ORDER - 1" i.e,
+	  default granularity of shuffling on the "MAX_ORDER" i.e,
 	  10th order of pages is selected based on cache utilization
 	  benefits on x86.
 
@@ -621,8 +621,8 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 	  HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
 	  on a platform.
 
-	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
-	  clamped down to MAX_ORDER - 1.
+	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
+	  clamped down to MAX_ORDER.
 
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76228dd..4a282c658ac4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -586,7 +586,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		if (PageCompound(page)) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order < MAX_ORDER)) {
+			if (likely(order <= MAX_ORDER)) {
 				blockpfn += (1UL << order) - 1;
 				cursor += (1UL << order) - 1;
 			}
@@ -941,7 +941,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			 * a valid page order. Consider only values in the
 			 * valid order range to prevent low_pfn overflow.
 			 */
-			if (freepage_order > 0 && freepage_order < MAX_ORDER)
+			if (freepage_order > 0 && freepage_order <= MAX_ORDER)
 				low_pfn += (1UL << freepage_order) - 1;
 			continue;
 		}
@@ -957,7 +957,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (PageCompound(page) && !cc->alloc_contig) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order < MAX_ORDER))
+			if (likely(order <= MAX_ORDER))
 				low_pfn += (1UL << order) - 1;
 			goto isolate_fail;
 		}
@@ -2118,7 +2118,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 
 	/* Direct compactor: Is a suitable page free? */
 	ret = COMPACT_NO_SUITABLE_PAGE;
-	for (order = cc->order; order < MAX_ORDER; order++) {
+	for (order = cc->order; order <= MAX_ORDER; order++) {
 		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;
 
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index dc7df1254f0a..7e53c4a42047 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1094,7 +1094,7 @@ debug_vm_pgtable_alloc_huge_page(struct pgtable_debug_args *args, int order)
 	struct page *page = NULL;
 
 #ifdef CONFIG_CONTIG_ALLOC
-	if (order >= MAX_ORDER) {
+	if (order > MAX_ORDER) {
 		page = alloc_contig_pages((1 << order), GFP_KERNEL,
 					  first_online_node, NULL);
 		if (page) {
@@ -1104,7 +1104,7 @@ debug_vm_pgtable_alloc_huge_page(struct pgtable_debug_args *args, int order)
 	}
 #endif
 
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		page = alloc_pages(GFP_KERNEL, order);
 
 	return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3222b40a0f6d..9b1655950049 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -469,7 +469,7 @@ static int __init hugepage_init(void)
 	/*
 	 * hugepages can't be allocated by the buddy allocator
 	 */
-	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);
+	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_ORDER);
 	/*
 	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 28516881a1b2..15ff582687a3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1903,7 +1903,7 @@ pgoff_t hugetlb_basepage_index(struct page *page)
 	pgoff_t index = page_index(page_head);
 	unsigned long compound_idx;
 
-	if (compound_order(page_head) >= MAX_ORDER)
+	if (compound_order(page_head) > MAX_ORDER)
 		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
 	else
 		compound_idx = page - page_head;
@@ -4313,7 +4313,7 @@ static int __init default_hugepagesz_setup(char *s)
 	 * The number of default huge pages (for this size) could have been
 	 * specified as the first hugetlb parameter: hugepages=X.  If so,
 	 * then default_hstate_max_huge_pages is set.  If the default huge
-	 * page size is gigantic (>= MAX_ORDER), then the pages must be
+	 * page size is gigantic (> MAX_ORDER), then the pages must be
 	 * allocated here from bootmem allocator.
 	 */
 	if (default_hstate_max_huge_pages) {
diff --git a/mm/memblock.c b/mm/memblock.c
index b5d3026979fc..d1525463c05e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2030,7 +2030,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	int order;
 
 	while (start < end) {
-		order = min(MAX_ORDER - 1UL, __ffs(start));
+		order = min_t(unsigned long, MAX_ORDER, __ffs(start));
 
 		while (start + (1UL << order) > end)
 			order--;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fad6d1f2262a..5540499007ae 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -596,7 +596,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	unsigned long pfn;
 
 	/*
-	 * Online the pages in MAX_ORDER - 1 aligned chunks. The callback might
+	 * Online the pages in MAX_ORDER aligned chunks. The callback might
 	 * decide to not expose all pages to the buddy (e.g., expose them
 	 * later). We account all pages as being online and belonging to this
 	 * zone ("present").
@@ -605,7 +605,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	 * this and the first chunk to online will be pageblock_nr_pages.
 	 */
 	for (pfn = start_pfn; pfn < end_pfn;) {
-		int order = min(MAX_ORDER - 1UL, __ffs(pfn));
+		int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
 
 		(*online_page_callback)(pfn_to_page(pfn), order);
 		pfn += (1UL << order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7e030d7cac81..07ad8074950f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -847,7 +847,7 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 {
 	unsigned long res;
 
-	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
+	if (kstrtoul(buf, 10, &res) < 0 ||  res > (MAX_ORDER + 1) / 2) {
 		pr_err("Bad debug_guardpage_minorder value\n");
 		return 0;
 	}
@@ -1065,7 +1065,7 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 	unsigned long higher_page_pfn;
 	struct page *higher_page;
 
-	if (order >= MAX_ORDER - 2)
+	if (order >= MAX_ORDER - 1)
 		return false;
 
 	higher_page_pfn = buddy_pfn & pfn;
@@ -1120,7 +1120,7 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
-	while (order < MAX_ORDER - 1) {
+	while (order < MAX_ORDER) {
 		if (compaction_capture(capc, page, order, migratetype)) {
 			__mod_zone_freepage_state(zone, -(1 << order),
 								migratetype);
@@ -2559,7 +2559,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 	struct page *page;
 
 	/* Find a page of the appropriate size in the preferred list */
-	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
+	for (current_order = order; current_order <= MAX_ORDER; ++current_order) {
 		area = &(zone->free_area[current_order]);
 		page = get_page_from_free_area(area, migratetype);
 		if (!page)
@@ -2934,7 +2934,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			continue;
 
 		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			struct free_area *area = &(zone->free_area[order]);
 
 			page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC);
@@ -3018,7 +3018,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 * approximates finding the pageblock with the most free pages, which
 	 * would be too costly to do exactly.
 	 */
-	for (current_order = MAX_ORDER - 1; current_order >= min_order;
+	for (current_order = MAX_ORDER; current_order >= min_order;
 				--current_order) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
@@ -3044,7 +3044,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	return false;
 
 find_smallest:
-	for (current_order = order; current_order < MAX_ORDER;
+	for (current_order = order; current_order <= MAX_ORDER;
 							current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
@@ -3057,7 +3057,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 * This should not happen - we already found a suitable fallback
 	 * when looking for the largest page.
 	 */
-	VM_BUG_ON(current_order == MAX_ORDER);
+	VM_BUG_ON(current_order == MAX_ORDER + 1);
 
 do_steal:
 	page = get_page_from_free_area(area, fallback_mt);
@@ -4005,7 +4005,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		return true;
 
 	/* For a high-order request, check at least one suitable page is free */
-	for (o = order; o < MAX_ORDER; o++) {
+	for (o = order; o <= MAX_ORDER; o++) {
 		struct free_area *area = &z->free_area[o];
 		int mt;
 
@@ -5480,7 +5480,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	 * There are several places where we assume that the order value is sane
 	 * so bail out early if the request is out of bound.
 	 */
-	if (WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp))
+	if (WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp))
 		return NULL;
 
 	gfp &= gfp_allowed_mask;
@@ -6183,8 +6183,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 	for_each_populated_zone(zone) {
 		unsigned int order;
-		unsigned long nr[MAX_ORDER], flags, total = 0;
-		unsigned char types[MAX_ORDER];
+		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
+		unsigned char types[MAX_ORDER + 1];
 
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
@@ -6192,7 +6192,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		printk(KERN_CONT "%s: ", zone->name);
 
 		spin_lock_irqsave(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			struct free_area *area = &zone->free_area[order];
 			int type;
 
@@ -6206,7 +6206,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			}
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++) {
+		for (order = 0; order <= MAX_ORDER; order++) {
 			printk(KERN_CONT "%lu*%lukB ",
 			       nr[order], K(1UL) << order);
 			if (nr[order])
@@ -7545,7 +7545,7 @@ static inline void setup_usemap(struct zone *zone) {}
 /* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
 void __init set_pageblock_order(void)
 {
-	unsigned int order = MAX_ORDER - 1;
+	unsigned int order = MAX_ORDER;
 
 	/* Check that pageblock_nr_pages has not already been setup */
 	if (pageblock_order)
@@ -9051,7 +9051,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 			else
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
-		} else if (get_order(size) >= MAX_ORDER || hashdist) {
+		} else if (get_order(size) > MAX_ORDER || hashdist) {
 			table = vmalloc_huge(size, gfp_flags);
 			virt = true;
 			if (table)
@@ -9265,7 +9265,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
-		if (++order >= MAX_ORDER) {
+		if (++order > MAX_ORDER) {
 			outer_start = start;
 			break;
 		}
@@ -9524,7 +9524,7 @@ bool is_free_buddy_page(struct page *page)
 	unsigned long pfn = page_to_pfn(page);
 	unsigned int order;
 
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
 
 		if (PageBuddy(page_head) &&
@@ -9532,7 +9532,7 @@ bool is_free_buddy_page(struct page *page)
 			break;
 	}
 
-	return order < MAX_ORDER;
+	return order <= MAX_ORDER;
 }
 EXPORT_SYMBOL(is_free_buddy_page);
 
@@ -9583,7 +9583,7 @@ bool take_page_off_buddy(struct page *page)
 	bool ret = false;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
 		int page_order = buddy_order(page_head);
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 9d73dc38e3d7..8d33120a81b2 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -226,7 +226,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 */
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
-		if (order >= pageblock_order && order < MAX_ORDER - 1) {
+		if (order >= pageblock_order && order <= MAX_ORDER) {
 			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
 						    order, NULL);
 			if (buddy && !is_migrate_isolate_page(buddy)) {
@@ -289,11 +289,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * @skip_isolation:	the flag to skip the pageblock isolation in second
  *			isolate_single_pageblock()
  *
- * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
+ * Free and in-use pages can be as big as MAX_ORDER and contain more than one
  * pageblock. When not all pageblocks within a page are isolated at the same
  * time, free page accounting can go wrong. For example, in the case of
- * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pagelbocks.
- * [         MAX_ORDER-1         ]
+ * MAX_ORDER = pageblock_order + 1, a MAX_ORDER page has two pageblocks.
+ * [         MAX_ORDER           ]
  * [  pageblock0  |  pageblock1  ]
  * When either pageblock is isolated, if it is a free page, the page is not
  * split into separate migratetype lists, which is supposed to; if it is an
@@ -450,7 +450,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				 * the free page to the right migratetype list.
 				 *
 				 * head_pfn is not used here as a hugetlb page order
-				 * can be bigger than MAX_ORDER-1, but after it is
+				 * can be bigger than MAX_ORDER, but after it is
 				 * freed, the free page order is not. Use pfn within
 				 * the range to find the head of the free page.
 				 */
@@ -458,7 +458,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				outer_pfn = pfn;
 				while (!PageBuddy(pfn_to_page(outer_pfn))) {
 					/* stop if we cannot find the free page */
-					if (++order >= MAX_ORDER)
+					if (++order > MAX_ORDER)
 						goto failed;
 					outer_pfn &= ~0UL << order;
 				}
@@ -639,7 +639,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/*
-	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
+	 * Note: pageblock_order != MAX_ORDER. Then, chunks of free pages
 	 * are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.
 	 */
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 223bbf8674ec..80cf367362c3 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -318,7 +318,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 				unsigned long freepage_order;
 
 				freepage_order = buddy_order_unsafe(page);
-				if (freepage_order < MAX_ORDER)
+				if (freepage_order <= MAX_ORDER)
 					pfn += (1UL << freepage_order) - 1;
 				continue;
 			}
@@ -552,7 +552,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		if (PageBuddy(page)) {
 			unsigned long freepage_order = buddy_order_unsafe(page);
 
-			if (freepage_order < MAX_ORDER)
+			if (freepage_order <= MAX_ORDER)
 				pfn += (1UL << freepage_order) - 1;
 			continue;
 		}
@@ -645,7 +645,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 			if (PageBuddy(page)) {
 				unsigned long order = buddy_order_unsafe(page);
 
-				if (order > 0 && order < MAX_ORDER)
+				if (order > 0 && order <= MAX_ORDER)
 					pfn += (1UL << order) - 1;
 				continue;
 			}
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 382958eef8a9..d52a55bca6d5 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,7 +11,7 @@
 #include "page_reporting.h"
 #include "internal.h"
 
-unsigned int page_reporting_order = MAX_ORDER;
+unsigned int page_reporting_order = MAX_ORDER + 1;
 module_param(page_reporting_order, uint, 0644);
 MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
 
@@ -244,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = page_reporting_order; order < MAX_ORDER; order++) {
+	for (order = page_reporting_order; order <= MAX_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
diff --git a/mm/shuffle.h b/mm/shuffle.h
index cec62984f7d3..a6bdf54f96f1 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -4,7 +4,7 @@
 #define _MM_SHUFFLE_H
 #include <linux/jump_label.h>
 
-#define SHUFFLE_ORDER (MAX_ORDER-1)
+#define SHUFFLE_ORDER MAX_ORDER
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
 DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
diff --git a/mm/slab.c b/mm/slab.c
index 10e96137b44f..530f418a4930 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -466,7 +466,7 @@ static int __init slab_max_order_setup(char *str)
 {
 	get_option(&str, &slab_max_order);
 	slab_max_order = slab_max_order < 0 ? 0 :
-				min(slab_max_order, MAX_ORDER - 1);
+				min(slab_max_order, MAX_ORDER);
 	slab_max_order_set = true;
 
 	return 1;
diff --git a/mm/slub.c b/mm/slub.c
index 862dbd9af4f5..5acf5407cbc6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3877,7 +3877,7 @@ static inline int calculate_order(unsigned int size)
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
 	order = calc_slab_order(size, 1, MAX_ORDER, 1);
-	if (order < MAX_ORDER)
+	if (order <= MAX_ORDER)
 		return order;
 	return -ENOSYS;
 }
@@ -4388,7 +4388,7 @@ __setup("slub_min_order=", setup_slub_min_order);
 static int __init setup_slub_max_order(char *str)
 {
 	get_option(&str, (int *)&slub_max_order);
-	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER - 1);
+	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
 
 	return 1;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 90af9a8572f5..9fc206477fb7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1068,7 +1068,7 @@ static void fill_contig_page_info(struct zone *zone,
 	info->free_blocks_total = 0;
 	info->free_blocks_suitable = 0;
 
-	for (order = 0; order < MAX_ORDER; order++) {
+	for (order = 0; order <= MAX_ORDER; order++) {
 		unsigned long blocks;
 
 		/*
@@ -1101,7 +1101,7 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
 {
 	unsigned long requested = 1UL << order;
 
-	if (WARN_ON_ONCE(order >= MAX_ORDER))
+	if (WARN_ON_ONCE(order > MAX_ORDER))
 		return 0;
 
 	if (!info->free_blocks_total)
@@ -1474,7 +1474,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
 	int order;
 
 	seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
-	for (order = 0; order < MAX_ORDER; ++order)
+	for (order = 0; order <= MAX_ORDER; ++order)
 		/*
 		 * Access to nr_free is lockless as nr_free is used only for
 		 * printing purposes. Use data_race to avoid KCSAN warning.
@@ -1503,7 +1503,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					pgdat->node_id,
 					zone->name,
 					migratetype_names[mtype]);
-		for (order = 0; order < MAX_ORDER; ++order) {
+		for (order = 0; order <= MAX_ORDER; ++order) {
 			unsigned long freecount = 0;
 			struct free_area *area;
 			struct list_head *curr;
@@ -1543,7 +1543,7 @@ static void pagetypeinfo_showfree(struct seq_file *m, void *arg)
 
 	/* Print header */
 	seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
-	for (order = 0; order < MAX_ORDER; ++order)
+	for (order = 0; order <= MAX_ORDER; ++order)
 		seq_printf(m, "%6d ", order);
 	seq_putc(m, '\n');
 
@@ -2168,7 +2168,7 @@ static void unusable_show_print(struct seq_file *m,
 	seq_printf(m, "Node %d, zone %8s ",
 				pgdat->node_id,
 				zone->name);
-	for (order = 0; order < MAX_ORDER; ++order) {
+	for (order = 0; order <= MAX_ORDER; ++order) {
 		fill_contig_page_info(zone, order, &info);
 		index = unusable_free_index(order, &info);
 		seq_printf(m, "%d.%03d ", index / 1000, index % 1000);
@@ -2220,7 +2220,7 @@ static void extfrag_show_print(struct seq_file *m,
 	seq_printf(m, "Node %d, zone %8s ",
 				pgdat->node_id,
 				zone->name);
-	for (order = 0; order < MAX_ORDER; ++order) {
+	for (order = 0; order <= MAX_ORDER; ++order) {
 		fill_contig_page_info(zone, order, &info);
 		index = __fragmentation_index(order, &info);
 		seq_printf(m, "%2d.%03d ", index / 1000, index % 1000);
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 854772dd52fd..9b66d6aeeb1a 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -843,7 +843,7 @@ long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 		goto out;
 	/* the calculated number of cq entries fits to mlx5 cq allocation */
 	cqe_size_order = cache_line_size() == 128 ? 7 : 6;
-	smc_order = MAX_ORDER - cqe_size_order - 1;
+	smc_order = MAX_ORDER - cqe_size_order;
 	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
 		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
 	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 79e759aac543..e736847ef3ac 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -7368,6 +7368,14 @@ sub process {
 			}
 		}
 
+# check for MAX_ORDER uses, as its semantics have changed.
+# MAX_ORDER now really means the max order of a page that can come out of
+# the kernel buddy allocator
+        if ($line =~ /MAX_ORDER/) {
+            WARN("MAX_ORDER",
+                 "MAX_ORDER has changed its semantics. The max order of a page that can be allocated from buddy allocator is MAX_ORDER instead of MAX_ORDER - 1.")
+        }
+
 # Mode permission misuses where it seems decimal should be octal
 # This uses a shortcut match to avoid unnecessary uses of a slow foreach loop
 # o Ignore module_param*(...) uses with a decimal 0 permission as that has a
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index 64499056648a..51ad29940f05 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -38,7 +38,7 @@ static int param_set_bufsize(const char *val, const struct kernel_param *kp)
 
 	size = memparse(val, NULL);
 	order = get_order(size);
-	if (order >= MAX_ORDER)
+	if (order > MAX_ORDER)
 		return -EINVAL;
 	ima_maxorder = order;
 	ima_bufsize = PAGE_SIZE << order;
diff --git a/tools/testing/memblock/linux/mmzone.h b/tools/testing/memblock/linux/mmzone.h
index 7c2eb5c9bb54..d79748b263e7 100644
--- a/tools/testing/memblock/linux/mmzone.h
+++ b/tools/testing/memblock/linux/mmzone.h
@@ -17,10 +17,10 @@ enum zone_type {
 };
 
 #define MAX_NR_ZONES __MAX_NR_ZONES
-#define MAX_ORDER 11
-#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define MAX_ORDER 10
+#define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
-#define pageblock_order		(MAX_ORDER - 1)
+#define pageblock_order		MAX_ORDER
 #define pageblock_nr_pages	BIT(pageblock_order)
 
 struct zone {
-- 
2.35.1



* [RFC PATCH v2 03/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 02/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 04/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER is limited to a memory section size, and is thus widely used to
indicate the maximum physically contiguous page size. But this limitation
is no longer necessary, as the kernel only supports the sparse memory
model. Add a new macro, MAX_PHYS_CONTIG_ORDER, to replace such uses of
MAX_ORDER.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 arch/sparc/mm/tsb.c                             |  4 ++--
 arch/um/kernel/um_arch.c                        |  4 ++--
 include/linux/pageblock-flags.h                 | 12 ++++++++++++
 kernel/dma/pool.c                               |  8 ++++----
 mm/hugetlb.c                                    |  2 +-
 mm/internal.h                                   |  8 ++++----
 mm/memory.c                                     |  4 ++--
 mm/memory_hotplug.c                             |  6 +++---
 mm/page_isolation.c                             |  2 +-
 mm/page_reporting.c                             |  4 ++--
 11 files changed, 34 insertions(+), 22 deletions(-)
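
As a rough illustration of how the new macro is meant to behave, here is a
userspace sketch. It is not the kernel definition: the toy_* names are
hypothetical, and the page and section sizes (4KB pages, 128MB sections) are
assumed values picked for the example. The idea is that under SPARSEMEM the
physically contiguous order is capped at the section size, while under
FLATMEM it simply follows MAX_ORDER.

```c
#include <assert.h>

/* Assumed values for illustration only: 4KB pages, 128MB memory sections. */
#define TOY_PAGE_SHIFT		12U
#define TOY_SECTION_SIZE_BITS	27U
#define TOY_PFN_SECTION_SHIFT	(TOY_SECTION_SIZE_BITS - TOY_PAGE_SHIFT)

/*
 * Hypothetical model of MAX_PHYS_CONTIG_ORDER: equal to max_order for
 * flatmem, otherwise min(PFN_SECTION_SHIFT, max_order).
 */
static unsigned int toy_max_phys_contig_order(unsigned int max_order, int flatmem)
{
	if (flatmem)
		return max_order;
	return max_order < TOY_PFN_SECTION_SHIFT ? max_order : TOY_PFN_SECTION_SHIFT;
}

/* Hypothetical model of MAX_PHYS_CONTIG_NR_PAGES: 1UL << order. */
static unsigned long toy_max_phys_contig_nr_pages(unsigned int max_order, int flatmem)
{
	unsigned int order = toy_max_phys_contig_order(max_order, flatmem);

	/* Guard the shift below against an out-of-range order. */
	assert(order < sizeof(unsigned long) * 8);
	return 1UL << order;
}
```

Under these assumptions, a MAX_ORDER of 10 is returned unchanged, while a
boot-time MAX_ORDER of 18 (1GB with 4KB pages) would be clamped to 15 on
sparsemem, so users such as mem_map_offset() keep treating anything larger
as potentially discontiguous.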

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ff33971e1630..ec519225b671 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3899,7 +3899,7 @@
 			[KNL] Minimal page reporting order
 			Format: <integer>
 			Adjust the minimal page reporting order. The page
-			reporting is disabled when it exceeds MAX_ORDER.
+			reporting is disabled when it exceeds MAX_PHYS_CONTIG_ORDER.
 
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 912205787161..15c31d050dab 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -402,8 +402,8 @@ void tsb_grow(struct mm_struct *mm, unsigned long tsb_index, unsigned long rss)
 	unsigned long new_rss_limit;
 	gfp_t gfp_flags;
 
-	if (max_tsb_size > (PAGE_SIZE << MAX_ORDER))
-		max_tsb_size = (PAGE_SIZE << MAX_ORDER);
+	if (max_tsb_size > (PAGE_SIZE << MAX_PHYS_CONTIG_ORDER))
+		max_tsb_size = (PAGE_SIZE << MAX_PHYS_CONTIG_ORDER);
 
 	new_cache_index = 0;
 	for (new_size = 8192; new_size < max_tsb_size; new_size <<= 1UL) {
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index e0de60e503b9..52a474f4f1c7 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -368,10 +368,10 @@ int __init linux_main(int argc, char **argv)
 	max_physmem = TASK_SIZE - uml_physmem - iomem_size - MIN_VMALLOC;
 
 	/*
-	 * Zones have to begin on a 1 << MAX_ORDER page boundary,
+	 * Zones have to begin on a 1 << MAX_PHYS_CONTIG_ORDER page boundary,
 	 * so this makes sure that's true for highmem
 	 */
-	max_physmem &= ~((1 << (PAGE_SHIFT + MAX_ORDER)) - 1);
+	max_physmem &= ~((1 << (PAGE_SHIFT + MAX_PHYS_CONTIG_ORDER)) - 1);
 	if (physmem_size + iomem_size > max_physmem) {
 		highmem = physmem_size + iomem_size - max_physmem;
 		physmem_size -= highmem;
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 940efcffd374..358b871b07ca 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -54,6 +54,18 @@ extern unsigned int pageblock_order;
 
 #define pageblock_nr_pages	(1UL << pageblock_order)
 
+/*
+ * Memory sections are only defined for sparsemem. In flatmem, pages are always
+ * physically contiguous, but we still use MAX_ORDER since all users assume so.
+ */
+#ifdef CONFIG_FLATMEM
+#define MAX_PHYS_CONTIG_ORDER	MAX_ORDER
+#else /* SPARSEMEM */
+#define MAX_PHYS_CONTIG_ORDER	(min(PFN_SECTION_SHIFT, MAX_ORDER))
+#endif /* CONFIG_FLATMEM */
+
+#define MAX_PHYS_CONTIG_NR_PAGES	(1UL << MAX_PHYS_CONTIG_ORDER)
+
 /* Forward declaration */
 struct page;
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index e20f168a34c7..b10f1dd52871 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -84,8 +84,8 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
 	void *addr;
 	int ret = -ENOMEM;
 
-	/* Cannot allocate larger than MAX_ORDER */
-	order = min(get_order(pool_size), MAX_ORDER);
+	/* Cannot allocate larger than MAX_PHYS_CONTIG_ORDER */
+	order = min(get_order(pool_size), MAX_PHYS_CONTIG_ORDER);
 
 	do {
 		pool_size = 1 << (PAGE_SHIFT + order);
@@ -190,11 +190,11 @@ static int __init dma_atomic_pool_init(void)
 
 	/*
 	 * If coherent_pool was not used on the command line, default the pool
-	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER.
+	 * sizes to 128KB per 1GB of memory, min 128KB, max MAX_PHYS_CONTIG_ORDER.
 	 */
 	if (!atomic_pool_size) {
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
-		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
+		pages = min_t(unsigned long, pages, MAX_PHYS_CONTIG_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
 	}
 	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 15ff582687a3..36eedeed1b22 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1903,7 +1903,7 @@ pgoff_t hugetlb_basepage_index(struct page *page)
 	pgoff_t index = page_index(page_head);
 	unsigned long compound_idx;
 
-	if (compound_order(page_head) > MAX_ORDER)
+	if (compound_order(page_head) > MAX_PHYS_CONTIG_ORDER)
 		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
 	else
 		compound_idx = page - page_head;
diff --git a/mm/internal.h b/mm/internal.h
index 4df67b6b8cce..1433e3a6fdd0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -302,7 +302,7 @@ static inline bool page_is_buddy(struct page *page, struct page *buddy,
  * satisfies the following equation:
  *     P = B & ~(1 << O)
  *
- * Assumption: *_mem_map is contiguous at least up to MAX_ORDER
+ * Assumption: *_mem_map is contiguous at least up to MAX_PHYS_CONTIG_ORDER
  */
 static inline unsigned long
 __find_buddy_pfn(unsigned long page_pfn, unsigned int order)
@@ -642,11 +642,11 @@ static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
 /*
  * Return the mem_map entry representing the 'offset' subpage within
  * the maximally aligned gigantic page 'base'.  Handle any discontiguity
- * in the mem_map at MAX_ORDER_NR_PAGES boundaries.
+ * in the mem_map at MAX_PHYS_CONTIG_NR_PAGES boundaries.
  */
 static inline struct page *mem_map_offset(struct page *base, int offset)
 {
-	if (unlikely(offset >= MAX_ORDER_NR_PAGES))
+	if (unlikely(offset >= MAX_PHYS_CONTIG_NR_PAGES))
 		return nth_page(base, offset);
 	return base + offset;
 }
@@ -658,7 +658,7 @@ static inline struct page *mem_map_offset(struct page *base, int offset)
 static inline struct page *mem_map_next(struct page *iter,
 						struct page *base, int offset)
 {
-	if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
+	if (unlikely((offset & (MAX_PHYS_CONTIG_NR_PAGES - 1)) == 0)) {
 		unsigned long pfn = page_to_pfn(base) + offset;
 		if (!pfn_valid(pfn))
 			return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index bd8e7e79be99..3b82945aaa3d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5660,7 +5660,7 @@ void clear_huge_page(struct page *page,
 	unsigned long addr = addr_hint &
 		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
 
-	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
+	if (unlikely(pages_per_huge_page > MAX_PHYS_CONTIG_NR_PAGES)) {
 		clear_gigantic_page(page, addr, pages_per_huge_page);
 		return;
 	}
@@ -5713,7 +5713,7 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 		.vma = vma,
 	};
 
-	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
+	if (unlikely(pages_per_huge_page > MAX_PHYS_CONTIG_NR_PAGES)) {
 		copy_user_gigantic_page(dst, src, addr, vma,
 					pages_per_huge_page);
 		return;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5540499007ae..8930823e5067 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -596,16 +596,16 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	unsigned long pfn;
 
 	/*
-	 * Online the pages in MAX_ORDER aligned chunks. The callback might
+	 * Online the pages in MAX_PHYS_CONTIG_ORDER aligned chunks. The callback might
 	 * decide to not expose all pages to the buddy (e.g., expose them
 	 * later). We account all pages as being online and belonging to this
 	 * zone ("present").
 	 * When using memmap_on_memory, the range might not be aligned to
-	 * MAX_ORDER_NR_PAGES - 1, but pageblock aligned. __ffs() will detect
+	 * MAX_PHYS_CONTIG_NR_PAGES - 1, but pageblock aligned. __ffs() will detect
 	 * this and the first chunk to online will be pageblock_nr_pages.
 	 */
 	for (pfn = start_pfn; pfn < end_pfn;) {
-		int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
+		int order = min_t(unsigned long, MAX_PHYS_CONTIG_ORDER, __ffs(pfn));
 
 		(*online_page_callback)(pfn_to_page(pfn), order);
 		pfn += (1UL << order);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 8d33120a81b2..801835f91c44 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -226,7 +226,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 */
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
-		if (order >= pageblock_order && order <= MAX_ORDER) {
+		if (order >= pageblock_order && order <= MAX_PHYS_CONTIG_ORDER) {
 			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
 						    order, NULL);
 			if (buddy && !is_migrate_isolate_page(buddy)) {
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index d52a55bca6d5..b48d6ad82998 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,7 +11,7 @@
 #include "page_reporting.h"
 #include "internal.h"
 
-unsigned int page_reporting_order = MAX_ORDER + 1;
+unsigned int page_reporting_order = MAX_PHYS_CONTIG_ORDER + 1;
 module_param(page_reporting_order, uint, 0644);
 MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
 
@@ -244,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = page_reporting_order; order <= MAX_ORDER; order++) {
+	for (order = page_reporting_order; order <= MAX_PHYS_CONTIG_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
-- 
2.35.1


* [RFC PATCH v2 04/12] mm: adapt deferred struct page init to new MAX_ORDER.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (2 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 03/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 05/12] mm: prevent pageblock size being larger than section size Zi Yan
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

Deferred struct page init initializes only the first section of a zone
at boot and defers the rest, which is later initialized one section at a
time. When MAX_ORDER grows beyond a section size,
early_page_uninitialised() does not prevent pages beyond the first
section from being initialized, since it only checks the starting pfn
and assumes MAX_ORDER is smaller than a section size. In addition,
deferred_init_maxorder() uses MAX_ORDER_NR_PAGES as the initialization
unit, which can cause the initialized chunk of memory to overlap with
other initialization jobs.

For the first issue, make early_page_uninitialised() decrease the order
for non-deferred memory initialization when it is bigger than the first
section. For the second issue, when adjusting pfn alignment in
deferred_init_maxorder(), make sure the alignment is not bigger than
a section size.
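
The two adjustments above can be sketched in user-space C as follows.
This is only an illustration under stated assumptions, not the kernel
code: ilog2_ul(), clamp_order() and init_align() are hypothetical
helpers, and the numbers in the note below assume 32768 pages per
section.

```c
/* floor(log2(v)) for v > 0; stands in for the kernel's ilog2() */
unsigned int ilog2_ul(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Clamp order so that a block starting at pfn does not reach past
 * first_deferred_pfn. The caller guarantees pfn < first_deferred_pfn.
 */
unsigned int clamp_order(unsigned long pfn, unsigned long first_deferred_pfn,
			 unsigned int order)
{
	unsigned int room = ilog2_ul(first_deferred_pfn - pfn);

	return order < room ? order : room;
}

/* Alignment unit for deferred init: never larger than one section. */
unsigned long init_align(unsigned long max_order_nr_pages,
			 unsigned long pages_per_section)
{
	return max_order_nr_pages < pages_per_section ?
	       max_order_nr_pages : pages_per_section;
}
```

With first_deferred_pfn at 32768, an order-18 request starting at pfn 0
would be clamped to order 15, and one starting at pfn 31744 to order 10.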

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/internal.h   |  2 +-
 mm/memblock.c   |  6 ++++--
 mm/page_alloc.c | 26 +++++++++++++++++++-------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 1433e3a6fdd0..cbe745670c6e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -355,7 +355,7 @@ extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
 extern void memblock_free_pages(struct page *page, unsigned long pfn,
-					unsigned int order);
+					unsigned int *order);
 extern void __free_pages_core(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index d1525463c05e..dc2ce6df8fe3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1640,7 +1640,9 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
+		unsigned int order = 0;
+
+		memblock_free_pages(pfn_to_page(cursor), cursor, &order);
 		totalram_pages_inc();
 	}
 }
@@ -2035,7 +2037,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 		while (start + (1UL << order) > end)
 			order--;
 
-		memblock_free_pages(pfn_to_page(start), start, order);
+		memblock_free_pages(pfn_to_page(start), start, &order);
 
 		start += (1UL << order);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07ad8074950f..3f3af7cd5164 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -463,13 +463,19 @@ static inline bool deferred_pages_enabled(void)
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
-static inline bool __meminit early_page_uninitialised(unsigned long pfn)
+static inline bool __meminit early_page_uninitialised(unsigned long pfn, unsigned int *order)
 {
 	int nid = early_pfn_to_nid(pfn);
 
 	if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
 		return true;
 
+	/* clamp down order to not exceed first_deferred_pfn */
+	if (order)
+		*order = min_t(unsigned int,
+			       *order,
+			       ilog2(NODE_DATA(nid)->first_deferred_pfn - pfn));
+
 	return false;
 }
 
@@ -515,7 +521,7 @@ static inline bool deferred_pages_enabled(void)
 	return false;
 }
 
-static inline bool early_page_uninitialised(unsigned long pfn)
+static inline bool early_page_uninitialised(unsigned long pfn, unsigned int *order)
 {
 	return false;
 }
@@ -1644,7 +1650,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
 	pg_data_t *pgdat;
 	int nid, zid;
 
-	if (!early_page_uninitialised(pfn))
+	if (!early_page_uninitialised(pfn, NULL))
 		return;
 
 	nid = early_pfn_to_nid(pfn);
@@ -1800,11 +1806,11 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 #endif /* CONFIG_NUMA */
 
 void __init memblock_free_pages(struct page *page, unsigned long pfn,
-							unsigned int order)
+							unsigned int *order)
 {
-	if (early_page_uninitialised(pfn))
+	if (early_page_uninitialised(pfn, order))
 		return;
-	__free_pages_core(page, order);
+	__free_pages_core(page, *order);
 }
 
 /*
@@ -2030,7 +2036,13 @@ static unsigned long __init
 deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
 		       unsigned long *end_pfn)
 {
-	unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
+	/*
+	 * deferred_init_memmap_chunk() gives out jobs with a max size of
+	 * PAGES_PER_SECTION. Do not align mo_pfn beyond that.
+	 */
+	unsigned long align = min_t(unsigned long,
+				MAX_ORDER_NR_PAGES, PAGES_PER_SECTION);
+	unsigned long mo_pfn = ALIGN(*start_pfn + 1, align);
 	unsigned long spfn = *start_pfn, epfn = *end_pfn;
 	unsigned long nr_pages = 0;
 	u64 j = *i;
-- 
2.35.1


* [RFC PATCH v2 05/12] mm: prevent pageblock size being larger than section size.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (3 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 04/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

Only physical pages within a section are guaranteed to be contiguous,
and by design a pageblock can only group contiguous physical pages.
Cap pageblock_order so that a pageblock never grows beyond the section
size.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/pageblock-flags.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 358b871b07ca..2679b2b4c079 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -47,8 +47,11 @@ extern unsigned int pageblock_order;
 
 #else /* CONFIG_HUGETLB_PAGE */
 
-/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order		MAX_ORDER
+/*
+ * If huge pages are not used, group by MAX_ORDER_NR_PAGES or
+ * PAGES_PER_SECTION when MAX_ORDER_NR_PAGES is larger.
+ */
+#define pageblock_order		(min(PFN_SECTION_SHIFT, MAX_ORDER))
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
-- 
2.35.1


* [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (4 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 05/12] mm: prevent pageblock size being larger than section size Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-23 10:36   ` David Hildenbrand
  2022-08-11 23:16 ` [RFC PATCH v2 07/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

MAX_ORDER_NR_PAGES can be increased when it becomes a boot time parameter
in later commits. To make sure read_kcore() reschedules its work at a
constant period, use pageblock_nr_pages as the reschedule period instead,
since pageblock_nr_pages is a constant and is either the same as or half
of MAX_ORDER_NR_PAGES.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Ying Chen <chenying.kernel@bytedance.com>
Cc: Feng Zhou <zhoufeng.zf@bytedance.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 fs/proc/kcore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index dff921f7ca33..7dc09d211b48 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -491,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			}
 		}
 
-		if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
+		if (page_offline_frozen++ % pageblock_nr_pages == 0) {
 			page_offline_thaw();
 			cond_resched();
 			page_offline_freeze();
-- 
2.35.1


* [RFC PATCH v2 07/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (5 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 08/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

virtio_balloon uses MAX_ORDER as the order of the free page blocks it
reports to the host. As MAX_ORDER becomes modifiable in later commits, the
reported free size might be too big. A pageblock is currently either half
the size of or the same size as a MAX_ORDER block. Use pageblock_order
instead so that virtio_balloon keeps a constant free page block report
size when MAX_ORDER is changed in later commits.
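
To give a sense of the sizes involved, here is a small user-space
sketch of the block size computation. PAGE_SHIFT_4K and
hint_block_bytes() are hypothetical names, and 4KB base pages are
assumed. The sketch shifts a 64-bit value on purpose, since a plain int
1 would overflow once order + PAGE_SHIFT reaches 31 on platforms with
32-bit int.

```c
/* Assumption for this sketch: 4KB base pages. */
#define PAGE_SHIFT_4K 12

/* Size in bytes of one free page block of the given order. */
unsigned long long hint_block_bytes(unsigned int order)
{
	return 1ULL << (order + PAGE_SHIFT_4K);
}
```

Under these assumptions an order-9 block is 2MB, an order-10 block is
4MB, and an order-18 block would be 1GB per hint.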

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/virtio/virtio_balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5b15936a5214..51447737538b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -33,7 +33,7 @@
 #define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
 					     __GFP_NOMEMALLOC)
 /* The order of free page blocks to report to host */
-#define VIRTIO_BALLOON_HINT_BLOCK_ORDER MAX_ORDER
+#define VIRTIO_BALLOON_HINT_BLOCK_ORDER pageblock_order
 /* The size of a free page block in bytes */
 #define VIRTIO_BALLOON_HINT_BLOCK_BYTES \
 	(1 << (VIRTIO_BALLOON_HINT_BLOCK_ORDER + PAGE_SHIFT))
-- 
2.35.1


* [RFC PATCH v2 08/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (6 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 07/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

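page_reporting_order is initialized to a value above the largest order
(MAX_PHYS_CONTIG_ORDER + 1 at this point in the series) to prevent page
reporting from running before its value is overwritten. Use
(unsigned int)-1 instead to remove the dependency on MAX_ORDER.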
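
Why (unsigned int)-1 works as a "do not run" value can be seen from the
shape of the loop in page_reporting_process_zone(). The sketch below is
a user-space stand-in; orders_visited() is a hypothetical helper that
only counts how many orders such a loop would visit.

```c
/* Count the orders visited by a loop of the same shape as the reporting loop. */
unsigned int orders_visited(unsigned int start_order, unsigned int max_order)
{
	unsigned int order, n = 0;

	/* Assumes max_order is well below UINT_MAX, as any real order is. */
	for (order = start_order; order <= max_order; order++)
		n++;
	return n;
}
```

Since (unsigned int)-1 is the largest unsigned int, it compares greater
than any valid order and the loop body never runs, whatever value
MAX_ORDER ends up having.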

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/page_reporting.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index b48d6ad82998..001438f3dbeb 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -11,7 +11,11 @@
 #include "page_reporting.h"
 #include "internal.h"
 
-unsigned int page_reporting_order = MAX_PHYS_CONTIG_ORDER + 1;
+/*
+ * Set page_reporting_order to (unsigned int)-1 to prevent page reporting from
+ * running until the value is overwritten.
+ */
+unsigned int page_reporting_order = (unsigned int)-1;
 module_param(page_reporting_order, uint, 0644);
 MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
 
-- 
2.35.1


* [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (7 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 08/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-13  1:11   ` Randy Dunlap
  2022-08-11 23:16 ` [RFC PATCH v2 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

With SPARSEMEM_VMEMMAP, all struct pages are virtually contiguous,
so the kernel can manipulate arbitrarily large pages. By checking
PFN validity during the buddy page merging process, all free pages in the
buddy allocator's free areas have contiguous PFNs even if the system has
several memory sections that are not physically contiguous. With these
two conditions, it is OK to remove the restriction of
MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS and change MAX_ORDER freely.

Add SET_MAX_ORDER to allow MAX_ORDER adjustment when the arch does not
set its own MAX_ORDER via ARCH_FORCE_MAX_ORDER. Make it depend
on SPARSEMEM_VMEMMAP, where MAX_ORDER is not limited by SECTION_SIZE_BITS.
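
The resulting selection order can be tried out in user space with the
sketch below. The two CONFIG_ values are made-up examples that Kconfig
would normally generate, and max_order_nr_pages() is a hypothetical
helper. The preprocessor block itself mirrors the one this patch adds
to mmzone.h, except that the shift uses 1UL so that large orders do not
overflow an int.

```c
/* Hypothetical config values for illustration only. */
#define CONFIG_ARCH_FORCE_MAX_ORDER 0
#define CONFIG_SET_MAX_ORDER 18

#ifdef CONFIG_SET_MAX_ORDER
#define MAX_ORDER CONFIG_SET_MAX_ORDER
#elif CONFIG_ARCH_FORCE_MAX_ORDER != 0
#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
#else
#define MAX_ORDER 10
#endif

#define MAX_ORDER_NR_PAGES (1UL << MAX_ORDER)

/* Number of base pages in the largest buddy block. */
unsigned long max_order_nr_pages(void)
{
	return MAX_ORDER_NR_PAGES;
}
```

With SET_MAX_ORDER at 18, the largest buddy block covers 262144 base
pages, which is 1GB with 4KB pages.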

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/Kconfig           |  4 ++++
 include/linux/mmzone.h | 17 ++++++++++++++---
 mm/Kconfig             | 14 ++++++++++++++
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index f330410da63a..24baee6c3feb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,6 +11,10 @@ source "arch/$(SRCARCH)/Kconfig"
 
 menu "General architecture-dependent options"
 
+config ARCH_FORCE_MAX_ORDER
+	int
+	default "0"
+
 config CRASH_CORE
 	bool
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e93faa3d7f1d..b83b481e250b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -24,11 +24,14 @@
 #include <asm/page.h>
 
 /* Free memory management - zoned buddy allocator.  */
-#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
-#define MAX_ORDER 10
-#else
+#ifdef CONFIG_SET_MAX_ORDER
+#define MAX_ORDER CONFIG_SET_MAX_ORDER
+#elif CONFIG_ARCH_FORCE_MAX_ORDER != 0
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
+#else
+#define MAX_ORDER 10
 #endif
+
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
@@ -1379,9 +1382,17 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
 #define SECTION_BLOCKFLAGS_BITS \
 	((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
 
+/*
+ * The MAX_ORDER check is not necessary when CONFIG_SET_MAX_ORDER is set, since
+ * it depends on CONFIG_SPARSEMEM_VMEMMAP, where all struct pages are virtually
+ * contiguous, so pages larger than a section can be allocated and manipulated
+ * without worrying about non-contiguous struct pages.
+ */
+#ifndef CONFIG_SET_MAX_ORDER
 #if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
 #error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
+#endif /* !CONFIG_SET_MAX_ORDER */
 
 static inline unsigned long pfn_to_section_nr(unsigned long pfn)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index bbe31e85afee..e558f5679707 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -441,6 +441,20 @@ config SPARSEMEM_VMEMMAP
 	  pfn_to_page and page_to_pfn operations.  This is the most
 	  efficient option when sufficient kernel resources are available.
 
+config SET_MAX_ORDER
+	int "Set maximum order of buddy allocator"
+	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER = 0)
+	range 10 255
+	default "10"
+	help
+	  The kernel memory allocator divides physically contiguous memory
+	  blocks into "zones", where each zone is a power of two number of
+	  pages.  This option selects the largest power of two that the kernel
+	  keeps in the memory allocator.  If you need to allocate very large
+	  blocks of physically contiguous memory, then you may need to
+	  increase this value. A value of 10 means that the largest free memory
+	  block is 2^10 pages.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
-- 
2.35.1


* [RFC PATCH v2 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (8 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

This prepares for the upcoming changes to make MAX_ORDER a boot time
parameter instead of a compile time constant. All static arrays with
MAX_ORDER size are converted to pointers and their memory is allocated
at runtime.

The free_area array in struct zone is allocated using
memblock_alloc_node() at boot time and using kcalloc_node() when memory
is hot-added.
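
The general shape of the conversion is sketched below in user-space C.
struct free_area_sketch, struct zone_sketch and the two helpers are
hypothetical stand-ins, and calloc()/free() take the place of the
kernel allocators named above.

```c
#include <stdlib.h>

/* Stand-in for struct free_area. */
struct free_area_sketch {
	unsigned long nr_free;
};

/* Stand-in for struct zone: the array member becomes a pointer. */
struct zone_sketch {
	struct free_area_sketch *free_area;
};

/* Allocate max_order + 1 zeroed entries; returns 0 on success, -1 on failure. */
int zone_sketch_init(struct zone_sketch *zone, unsigned int max_order)
{
	zone->free_area = calloc((size_t)max_order + 1,
				 sizeof(*zone->free_area));
	return zone->free_area ? 0 : -1;
}

/* Release the array and clear the pointer. */
void zone_sketch_fini(struct zone_sketch *zone)
{
	free(zone->free_area);
	zone->free_area = NULL;
}
```

Indexing stays the same as with the static array (free_area[order]), but
every user now has to cope with allocation failure and with the array
not existing before init has run.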

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 .../admin-guide/kdump/vmcoreinfo.rst          |  2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |  7 ++-
 drivers/gpu/drm/ttm/ttm_pool.c                | 58 +++++++++++++++++--
 include/drm/ttm/ttm_pool.h                    |  4 +-
 include/linux/mmzone.h                        |  2 +-
 mm/page_alloc.c                               | 32 ++++++++--
 6 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index c572b5230fe0..a775462aa7c7 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
+Each zone has a free_area structure array called free_area with a length of MAX_ORDER + 1.
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e7147e304637..442a77bb5b4f 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -92,7 +92,9 @@ static int ttm_global_init(void)
 		>> PAGE_SHIFT;
 	num_dma32 = min(num_dma32, 2UL << (30 - PAGE_SHIFT));
 
-	ttm_pool_mgr_init(num_pages);
+	ret = ttm_pool_mgr_init(num_pages);
+	if (ret)
+		goto out;
 	ttm_tt_mgr_init(num_pages, num_dma32);
 
 	glob->dummy_read_page = alloc_page(__GFP_ZERO | GFP_DMA32);
@@ -218,7 +220,8 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	bdev->funcs = funcs;
 
 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, use_dma_alloc, use_dma32);
+	if (ttm_pool_init(&bdev->pool, dev, use_dma_alloc, use_dma32))
+		return -ENOMEM;
 
 	bdev->vma_manager = vma_manager;
 	INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 85d19f425af6..d76f7d476421 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -64,11 +64,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER + 1];
-static struct ttm_pool_type global_uncached[MAX_ORDER + 1];
+static struct ttm_pool_type *global_write_combined;
+static struct ttm_pool_type *global_uncached;
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER + 1];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
+static struct ttm_pool_type *global_dma32_write_combined;
+static struct ttm_pool_type *global_dma32_uncached;
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -493,8 +493,10 @@ EXPORT_SYMBOL(ttm_pool_free);
  * @use_dma32: true if GFP_DMA32 should be used
  *
  * Initialize the pool and its pool types.
+ *
+ * Returns: 0 on success, negative error code otherwise
  */
-void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
+int ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 		   bool use_dma_alloc, bool use_dma32)
 {
 	unsigned int i, j;
@@ -506,11 +508,30 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 	pool->use_dma32 = use_dma32;
 
 	if (use_dma_alloc) {
-		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
+		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
+			pool->caching[i].orders =
+				kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+					GFP_KERNEL);
+			if (!pool->caching[i].orders) {
+				/* i is unsigned: unwind entries [0, i) only */
+				goto failed;
+			}
 			for (j = 0; j <= MAX_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
+
+		}
+		return 0;
+
+failed:
+		while (i--) {
+			for (j = 0; j <= MAX_ORDER; ++j)
+				ttm_pool_type_fini(&pool->caching[i].orders[j]);
+			kvfree(pool->caching[i].orders);
+		}
+		return -ENOMEM;
 	}
+	return 0;
 }
 
 /**
@@ -701,6 +722,31 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
+	if (!global_write_combined) {
+		global_write_combined = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						GFP_KERNEL);
+		if (!global_write_combined)
+			return -ENOMEM;
+	}
+	if (!global_uncached) {
+		global_uncached = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+					  GFP_KERNEL);
+		if (!global_uncached)
+			return -ENOMEM;
+	}
+	if (!global_dma32_write_combined) {
+		global_dma32_write_combined = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						      GFP_KERNEL);
+		if (!global_dma32_write_combined)
+			return -ENOMEM;
+	}
+	if (!global_dma32_uncached) {
+		global_dma32_uncached = kvcalloc(MAX_ORDER + 1, sizeof(struct ttm_pool_type),
+						GFP_KERNEL);
+		if (!global_dma32_uncached)
+			return -ENOMEM;
+	}
+
 	for (i = 0; i <= MAX_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 8ce14f9d202a..f5ce60f629ae 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -72,7 +72,7 @@ struct ttm_pool {
 	bool use_dma32;
 
 	struct {
-		struct ttm_pool_type orders[MAX_ORDER + 1];
+		struct ttm_pool_type *orders;
 	} caching[TTM_NUM_CACHING_TYPES];
 };
 
@@ -80,7 +80,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		   struct ttm_operation_ctx *ctx);
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt);
 
-void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
+int ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 		   bool use_dma_alloc, bool use_dma32);
 void ttm_pool_fini(struct ttm_pool *pool);
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b83b481e250b..60d8cce2aed8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -635,7 +635,7 @@ struct zone {
 	ZONE_PADDING(_pad1_)
 
 	/* free areas of different sizes */
-	struct free_area	free_area[MAX_ORDER + 1];
+	struct free_area	*free_area;
 
 	/* zone flags, see below */
 	unsigned long		flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f3af7cd5164..941a94bb8cf0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6195,11 +6195,21 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 	for_each_populated_zone(zone) {
 		unsigned int order;
-		unsigned long nr[MAX_ORDER + 1], flags, total = 0;
-		unsigned char types[MAX_ORDER + 1];
+		unsigned long *nr, flags, total = 0;
+		unsigned char *types;
 
 		if (show_mem_node_skip(filter, zone_to_nid(zone), nodemask))
 			continue;
+
+		nr = kmalloc_array(MAX_ORDER + 1, sizeof(unsigned long), GFP_KERNEL);
+		if (!nr)
+			break;
+		types = kmalloc_array(MAX_ORDER + 1, sizeof(unsigned char), GFP_KERNEL);
+		if (!types) {
+			kfree(nr);
+			break;
+		}
+
 		show_node(zone);
 		printk(KERN_CONT "%s: ", zone->name);
 
@@ -7649,8 +7659,8 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	lruvec_init(&pgdat->__lruvec);
 }
 
-static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
-							unsigned long remaining_pages)
+static void __init zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
+					unsigned long remaining_pages, bool hotplug)
 {
 	atomic_long_set(&zone->managed_pages, remaining_pages);
 	zone_set_nid(zone, nid);
@@ -7659,6 +7669,16 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
 	spin_lock_init(&zone->lock);
 	zone_seqlock_init(zone);
 	zone_pcp_init(zone);
+	if (hotplug)
+		zone->free_area =
+			kcalloc_node(MAX_ORDER + 1, sizeof(struct free_area),
+				     GFP_KERNEL, nid);
+	else
+		zone->free_area =
+			memblock_alloc_node(sizeof(struct free_area) * (MAX_ORDER + 1),
+					    sizeof(struct free_area), nid);
+	BUG_ON(!zone->free_area);
+
 }
 
 /*
@@ -7697,7 +7717,7 @@ void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
 	}
 
 	for (z = 0; z < MAX_NR_ZONES; z++)
-		zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
+		zone_init_internals(&pgdat->node_zones[z], z, nid, 0, true);
 }
 #endif
 
@@ -7760,7 +7780,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone_init_internals(zone, j, nid, freesize);
+		zone_init_internals(zone, j, nid, freesize, false);
 
 		if (!size)
 			continue;
-- 
2.35.1


* [RFC PATCH v2 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (9 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-11 23:16 ` [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
  11 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

For the other MAX_ORDER uses described below, converting the static arrays
to dynamic ones is either unnecessary or too much hassle. Add
MIN_MAX_ORDER to serve as a compile time constant in place of MAX_ORDER.

The ARM64 hypervisor maintains its own free page list and does not import
any core kernel symbols, so the soon-to-be runtime variable MAX_ORDER is
not accessible in ARM64 hypervisor code. There is also no need to allocate
very large pages there.

In SLAB/SLOB/SLUB, 2-D array kmalloc_caches uses MAX_ORDER in its second
dimension. It is too much hassle to allocate memory for kmalloc_caches
before any proper memory allocator is set up.
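
As a rough user-space illustration of the point above, the sketch below sizes
a static array with a compile time constant and clamps a requested order
against it, in the same shape as the hyp_pool change in this patch. All names
(DEMO_MIN_MAX_ORDER, demo_runtime_max_order, struct demo_pool,
demo_pool_init) are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MIN_MAX_ORDER: must be a constant expression. */
#define DEMO_MIN_MAX_ORDER 10

/* Hypothetical stand-in for the runtime MAX_ORDER; it may grow at boot. */
int demo_runtime_max_order = DEMO_MIN_MAX_ORDER;

struct demo_pool {
	/* A static array bound has to be known at compile time. */
	int free_count[DEMO_MIN_MAX_ORDER + 1];
	unsigned short max_order;
};

/* Clamp the requested order to what the statically sized array can hold. */
void demo_pool_init(struct demo_pool *pool, int wanted_order)
{
	size_t i;

	assert(pool != NULL);
	if (wanted_order < 0)
		wanted_order = 0;
	pool->max_order = wanted_order < DEMO_MIN_MAX_ORDER ?
			  wanted_order : DEMO_MIN_MAX_ORDER;
	for (i = 0; i <= DEMO_MIN_MAX_ORDER; i++)
		pool->free_count[i] = 0;
}
```

Even if the runtime value is raised later, such a pool never indexes past
the array sized by the constant.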

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Quentin Perret <qperret@google.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h | 2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c  | 2 +-
 include/linux/mmzone.h                | 3 +++
 include/linux/slab.h                  | 8 ++++----
 mm/slab.c                             | 2 +-
 mm/slub.c                             | 6 +++---
 6 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index fe5472a184a3..29b92f68ab69 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -16,7 +16,7 @@ struct hyp_pool {
 	 * API at EL2.
 	 */
 	hyp_spinlock_t lock;
-	struct list_head free_area[MAX_ORDER + 1];
+	struct list_head free_area[MIN_MAX_ORDER + 1];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
 	unsigned short max_order;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b30b534..7ebbac3e2e76 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -241,7 +241,7 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 	int i;
 
 	hyp_spin_lock_init(&pool->lock);
-	pool->max_order = min(MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
+	pool->max_order = min(MIN_MAX_ORDER, get_order((nr_pages + 1) << PAGE_SHIFT));
 	for (i = 0; i < pool->max_order; i++)
 		INIT_LIST_HEAD(&pool->free_area[i]);
 	pool->range_start = phys;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 60d8cce2aed8..b5774e4c2700 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -26,10 +26,13 @@
 /* Free memory management - zoned buddy allocator.  */
 #ifdef CONFIG_SET_MAX_ORDER
 #define MAX_ORDER CONFIG_SET_MAX_ORDER
+#define MIN_MAX_ORDER CONFIG_SET_MAX_ORDER
 #elif CONFIG_ARCH_FORCE_MAX_ORDER != 0
 #define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
+#define MIN_MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
 #else
 #define MAX_ORDER 10
+#define MIN_MAX_ORDER MAX_ORDER
 #endif
 
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 568b5dfb3bd9..e34b2c9bda09 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -251,8 +251,8 @@ static inline unsigned int arch_slab_minalign(void)
  * to do various tricks to work around compiler limitations in order to
  * ensure proper constant folding.
  */
-#define KMALLOC_SHIFT_HIGH	((MAX_ORDER + PAGE_SHIFT) <= 25 ? \
-				(MAX_ORDER + PAGE_SHIFT) : 25)
+#define KMALLOC_SHIFT_HIGH	((MIN_MAX_ORDER + PAGE_SHIFT) <= 25 ? \
+				(MIN_MAX_ORDER + PAGE_SHIFT) : 25)
 #define KMALLOC_SHIFT_MAX	KMALLOC_SHIFT_HIGH
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	5
@@ -265,7 +265,7 @@ static inline unsigned int arch_slab_minalign(void)
  * (PAGE_SIZE*2).  Larger requests are passed to the page allocator.
  */
 #define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX	(MIN_MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
@@ -278,7 +278,7 @@ static inline unsigned int arch_slab_minalign(void)
  * be allocated from the same page.
  */
 #define KMALLOC_SHIFT_HIGH	PAGE_SHIFT
-#define KMALLOC_SHIFT_MAX	(MAX_ORDER + PAGE_SHIFT)
+#define KMALLOC_SHIFT_MAX	(MIN_MAX_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW	3
 #endif
diff --git a/mm/slab.c b/mm/slab.c
index 530f418a4930..23798c32bb38 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -466,7 +466,7 @@ static int __init slab_max_order_setup(char *str)
 {
 	get_option(&str, &slab_max_order);
 	slab_max_order = slab_max_order < 0 ? 0 :
-				min(slab_max_order, MAX_ORDER);
+				min(slab_max_order, MIN_MAX_ORDER);
 	slab_max_order_set = true;
 
 	return 1;
diff --git a/mm/slub.c b/mm/slub.c
index 5acf5407cbc6..940fe48ea298 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3876,8 +3876,8 @@ static inline int calculate_order(unsigned int size)
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */
-	order = calc_slab_order(size, 1, MAX_ORDER, 1);
-	if (order <= MAX_ORDER)
+	order = calc_slab_order(size, 1, MIN_MAX_ORDER, 1);
+	if (order <= MIN_MAX_ORDER)
 		return order;
 	return -ENOSYS;
 }
@@ -4388,7 +4388,7 @@ __setup("slub_min_order=", setup_slub_min_order);
 static int __init setup_slub_max_order(char *str)
 {
 	get_option(&str, (int *)&slub_max_order);
-	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
+	slub_max_order = min_t(unsigned int, slub_max_order, MIN_MAX_ORDER);
 
 	return 1;
 }
-- 
2.35.1



* [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter.
  2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
                   ` (10 preceding siblings ...)
  2022-08-11 23:16 ` [RFC PATCH v2 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
@ 2022-08-11 23:16 ` Zi Yan
  2022-08-13  1:11   ` Randy Dunlap
  11 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2022-08-11 23:16 UTC (permalink / raw)
  To: linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

From: Zi Yan <ziy@nvidia.com>

With the new buddy_alloc_max_order boot parameter, users can specify a larger
MAX_ORDER than the one set in CONFIG_ARCH_FORCE_MAX_ORDER or
CONFIG_SET_MAX_ORDER. It can be set to any value >= CONFIG_ARCH_FORCE_MAX_ORDER
or CONFIG_SET_MAX_ORDER,
but < 256 (limited by vmscan scan_control and per-cpu free page list).
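
For readers who want to experiment with the range check outside the kernel,
here is a minimal user-space sketch of the validation done by
buddy_alloc_set() in this patch. It is not the kernel code: strtoul() stands
in for kstrtoul(), a parse failure falls back to the default instead of
returning -EINVAL, and the DEMO_* names, demo_parse_max_order() and
demo_largest_block_bytes() are hypothetical. The value 8 for the per-cpu
order width is an assumption. Note that the posted code accepts values up to
S8_MAX (127).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MIN_MAX_ORDER	10		/* hypothetical compile time default */
#define DEMO_S8_MAX		INT8_MAX	/* 127, limit of an s8 order field */
#define DEMO_NR_PCP_ORDER_WIDTH	8		/* assumed width of the pcp order field */

/* Parse a decimal order; fall back to the default when out of range. */
int demo_parse_max_order(const char *val)
{
	char *end;
	unsigned long order;

	assert(val != NULL);
	errno = 0;
	order = strtoul(val, &end, 10);
	if (errno != 0 || end == val || *end != '\0')
		return DEMO_MIN_MAX_ORDER;

	/* Same three conditions as in the posted buddy_alloc_set(). */
	if (order > DEMO_MIN_MAX_ORDER && order <= DEMO_S8_MAX &&
	    order <= (1UL << DEMO_NR_PCP_ORDER_WIDTH))
		return (int)order;

	return DEMO_MIN_MAX_ORDER;
}

/* Largest block size in bytes: 2^order * PAGE_SIZE. */
unsigned long long demo_largest_block_bytes(int order, unsigned int page_shift)
{
	return 1ULL << (order + page_shift);
}
```

With 4KB base pages (page shift 12), an order of 18 corresponds to a 1GB
block, which is the size needed for 1GB PUD THPs.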

Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-doc@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 include/linux/mmzone.h                        |  8 +++++
 mm/Kconfig                                    | 13 +++++++
 mm/page_alloc.c                               | 34 ++++++++++++++++++-
 mm/vmscan.c                                   |  1 -
 5 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec519225b671..0f71233ae396 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -494,6 +494,11 @@
 	bttv.pll=	See Documentation/admin-guide/media/bttv.rst
 	bttv.tuner=
 
+	buddy_alloc_max_order=	[KNL] Adjust the size of the largest page
+			that can be allocated from the kernel buddy allocator. The
+			largest page size is 2^buddy_alloc_max_order * PAGE_SIZE.
+			Format: integer
+
 	bulk_remove=off	[PPC]  This parameter disables the use of the pSeries
 			firmware feature for flushing multiple hpte entries
 			at a time.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b5774e4c2700..90121d25d660 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,6 +35,14 @@
 #define MIN_MAX_ORDER MAX_ORDER
 #endif
 
+/* remap MAX_ORDER to buddy_alloc_max_order for boot time adjustment */
+#ifdef CONFIG_BOOT_TIME_MAX_ORDER
+/* Defined in mm/page_alloc.c */
+extern int buddy_alloc_max_order;
+#undef MAX_ORDER
+#define MAX_ORDER buddy_alloc_max_order
+#endif /* CONFIG_BOOT_TIME_MAX_ORDER */
+
 #define MAX_ORDER_NR_PAGES (1 << MAX_ORDER)
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index e558f5679707..acccb919d72d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -455,6 +455,19 @@ config SET_MAX_ORDER
 	  increase this value. A value of 10 means that the largest free memory
 	  block is 2^10 pages.
 
+config BOOT_TIME_MAX_ORDER
+	bool "Set maximum order of buddy allocator at boot time"
+	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER != 0 || SET_MAX_ORDER != 0)
+	help
+	  It enables users to set the maximum order of buddy allocator at system
+      boot time instead of a static MACRO set at compilation time. Systems with
+      a lot of memory might want to allocate large pages whereas it is much
+      less feasible and desirable for systems with less memory. This option
+      allows different systems to control the largest page they want to
+      allocate. By default, MAX_ORDER will be set to ARCH_FORCE_MAX_ORDER or
+      SET_MAX_ORDER, whichever is non-zero, when the boot time parameter is not
+      set. The maximum of MAX_ORDER is currently limited at 256.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 941a94bb8cf0..4c4d68da1922 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1581,7 +1581,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 		order = pindex_to_order(pindex);
 		nr_pages = 1 << order;
-		BUILD_BUG_ON(MAX_ORDER >= (1<<NR_PCP_ORDER_WIDTH));
+		BUILD_BUG_ON(MIN_MAX_ORDER >= (1<<NR_PCP_ORDER_WIDTH));
 		do {
 			int mt;
 
@@ -9679,3 +9679,35 @@ bool has_managed_dma(void)
 	return false;
 }
 #endif /* CONFIG_ZONE_DMA */
+
+#ifdef CONFIG_BOOT_TIME_MAX_ORDER
+int buddy_alloc_max_order = MIN_MAX_ORDER;
+EXPORT_SYMBOL(buddy_alloc_max_order);
+
+static int __init buddy_alloc_set(char *val)
+{
+	int ret;
+	unsigned long max_order;
+
+	ret = kstrtoul(val, 10, &max_order);
+
+	if (ret < 0)
+		return -EINVAL;
+
+	/*
+	 * max_order is also limited at below locations:
+	 * 1. scan_control in mm/vmscan.c uses s8 field for order, max_order cannot
+	 * be bigger than S8_MAX before the field is changed.
+	 * 2. free_pcppages_bulk has max_order upper limit.
+	 */
+	if (max_order > MIN_MAX_ORDER && max_order <= S8_MAX &&
+	    max_order <= (1<<NR_PCP_ORDER_WIDTH))
+		buddy_alloc_max_order = max_order;
+	else
+		buddy_alloc_max_order = MIN_MAX_ORDER;
+
+	return 0;
+}
+
+early_param("buddy_alloc_max_order", buddy_alloc_set);
+#endif /* CONFIG_BOOT_TIME_MAX_ORDER */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 06eeeae038dd..9d4fde8705d9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3816,7 +3816,6 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	 * scan_control uses s8 fields for order, priority, and reclaim_idx.
 	 * Confirm they are large enough for max values.
 	 */
-	BUILD_BUG_ON(MAX_ORDER > S8_MAX);
 	BUILD_BUG_ON(DEF_PRIORITY > S8_MAX);
 	BUILD_BUG_ON(MAX_NR_ZONES > S8_MAX);
 
-- 
2.35.1



* Re: [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  2022-08-11 23:16 ` [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
@ 2022-08-13  1:11   ` Randy Dunlap
  2022-08-13  2:37     ` Zi Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Randy Dunlap @ 2022-08-13  1:11 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

Hi--

On 8/11/22 16:16, Zi Yan wrote:

> diff --git a/mm/Kconfig b/mm/Kconfig
> index bbe31e85afee..e558f5679707 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -441,6 +441,20 @@ config SPARSEMEM_VMEMMAP
>  	  pfn_to_page and page_to_pfn operations.  This is the most
>  	  efficient option when sufficient kernel resources are available.
>  
> +config SET_MAX_ORDER
> +	int "Set maximum order of buddy allocator"
> +    depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER = 0)
> +	range 10 255
> +	default "10"
> +	help
> +	  The kernel memory allocator divides physically contiguous memory
> +	  blocks into "zones", where each zone is a power of two number of
> +	  pages.  This option selects the largest power of two that the kernel
> +	  keeps in the memory allocator.  If you need to allocate very large
> +	  blocks of physically contiguous memory, then you may need to
> +	  increase this value. A value of 10 means that the largest free memory
> +	  block is 2^10 pages.

Please make sure that all lines of help text are indented with one tab + 2 spaces,
as specified in Documentation/process/coding-style.rst.

thanks.
-- 
~Randy


* Re: [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter.
  2022-08-11 23:16 ` [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
@ 2022-08-13  1:11   ` Randy Dunlap
  2022-08-13  2:38     ` Zi Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Randy Dunlap @ 2022-08-13  1:11 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel

Hi--

On 8/11/22 16:16, Zi Yan wrote:
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e558f5679707..acccb919d72d 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -455,6 +455,19 @@ config SET_MAX_ORDER
>  	  increase this value. A value of 10 means that the largest free memory
>  	  block is 2^10 pages.
>  
> +config BOOT_TIME_MAX_ORDER
> +	bool "Set maximum order of buddy allocator at boot time"
> +	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER != 0 || SET_MAX_ORDER != 0)
> +	help
> +	  It enables users to set the maximum order of buddy allocator at system
> +      boot time instead of a static MACRO set at compilation time. Systems with
> +      a lot of memory might want to allocate large pages whereas it is much
> +      less feasible and desirable for systems with less memory. This option
> +      allows different systems to control the largest page they want to
> +      allocate. By default, MAX_ORDER will be set to ARCH_FORCE_MAX_ORDER or
> +      SET_MAX_ORDER, whichever is non-zero, when the boot time parameter is not
> +      set. The maximum of MAX_ORDER is currently limited at 256.

Please make sure that all lines of help text are indented with one tab + 2 spaces,
as specified in Documentation/process/coding-style.rst.

Thanks.
-- 
~Randy


* Re: [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  2022-08-13  1:11   ` Randy Dunlap
@ 2022-08-13  2:37     ` Zi Yan
  2022-08-13  2:40       ` Randy Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2022-08-13  2:37 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-mm, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel



On 12 Aug 2022, at 21:11, Randy Dunlap wrote:

> Hi--
>
> On 8/11/22 16:16, Zi Yan wrote:
>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index bbe31e85afee..e558f5679707 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -441,6 +441,20 @@ config SPARSEMEM_VMEMMAP
>>  	  pfn_to_page and page_to_pfn operations.  This is the most
>>  	  efficient option when sufficient kernel resources are available.
>>
>> +config SET_MAX_ORDER
>> +	int "Set maximum order of buddy allocator"
>> +    depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER = 0)
>> +	range 10 255
>> +	default "10"
>> +	help
>> +	  The kernel memory allocator divides physically contiguous memory
>> +	  blocks into "zones", where each zone is a power of two number of
>> +	  pages.  This option selects the largest power of two that the kernel
>> +	  keeps in the memory allocator.  If you need to allocate very large
>> +	  blocks of physically contiguous memory, then you may need to
>> +	  increase this value. A value of 10 means that the largest free memory
>> +	  block is 2^10 pages.
>
> Please make sure that all lines of help text are indented with one tab + 2 spaces,
> as specified in Documentation/process/coding-style.rst.

I guess you mean the wrong indentation of "depends on" here, since all
the help text is correctly indented. Thanks. I fixed it locally.

--
Best Regards,
Yan, Zi


* Re: [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter.
  2022-08-13  1:11   ` Randy Dunlap
@ 2022-08-13  2:38     ` Zi Yan
  0 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-13  2:38 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-mm, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel




On 12 Aug 2022, at 21:11, Randy Dunlap wrote:

> Hi--
>
> On 8/11/22 16:16, Zi Yan wrote:
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index e558f5679707..acccb919d72d 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -455,6 +455,19 @@ config SET_MAX_ORDER
>>  	  increase this value. A value of 10 means that the largest free memory
>>  	  block is 2^10 pages.
>>
>> +config BOOT_TIME_MAX_ORDER
>> +	bool "Set maximum order of buddy allocator at boot time"
>> +	depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER != 0 || SET_MAX_ORDER != 0)
>> +	help
>> +	  It enables users to set the maximum order of buddy allocator at system
>> +      boot time instead of a static MACRO set at compilation time. Systems with
>> +      a lot of memory might want to allocate large pages whereas it is much
>> +      less feasible and desirable for systems with less memory. This option
>> +      allows different systems to control the largest page they want to
>> +      allocate. By default, MAX_ORDER will be set to ARCH_FORCE_MAX_ORDER or
>> +      SET_MAX_ORDER, whichever is non-zero, when the boot time parameter is not
>> +      set. The maximum of MAX_ORDER is currently limited at 256.
>
> Please make sure that all lines of help text are indented with one tab + 2 spaces,
> as specified in Documentation/process/coding-style.rst.

Thanks. I fixed it locally.

--
Best Regards,
Yan, Zi


* Re: [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  2022-08-13  2:37     ` Zi Yan
@ 2022-08-13  2:40       ` Randy Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: Randy Dunlap @ 2022-08-13  2:40 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, Mike Rapoport, linux-kernel



On 8/12/22 19:37, Zi Yan wrote:
> 
> On 12 Aug 2022, at 21:11, Randy Dunlap wrote:
> 
>> Hi--
>>
>> On 8/11/22 16:16, Zi Yan wrote:
>>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index bbe31e85afee..e558f5679707 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -441,6 +441,20 @@ config SPARSEMEM_VMEMMAP
>>>  	  pfn_to_page and page_to_pfn operations.  This is the most
>>>  	  efficient option when sufficient kernel resources are available.
>>>
>>> +config SET_MAX_ORDER
>>> +	int "Set maximum order of buddy allocator"
>>> +    depends on SPARSEMEM_VMEMMAP && (ARCH_FORCE_MAX_ORDER = 0)
>>> +	range 10 255
>>> +	default "10"
>>> +	help
>>> +	  The kernel memory allocator divides physically contiguous memory
>>> +	  blocks into "zones", where each zone is a power of two number of
>>> +	  pages.  This option selects the largest power of two that the kernel
>>> +	  keeps in the memory allocator.  If you need to allocate very large
>>> +	  blocks of physically contiguous memory, then you may need to
>>> +	  increase this value. A value of 10 means that the largest free memory
>>> +	  block is 2^10 pages.
>>
>> Please make sure that all lines of help text are indented with one tab + 2 spaces,
>> as specified in Documentation/process/coding-style.rst.
> 
> I guess you mean the wrong indentation of "depends on" here, since all
> the help text is correctly indented. Thanks. I fixed it locally.

Oops, yes. Thanks.

-- 
~Randy


* Re: [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER
  2022-08-11 23:16 ` [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
@ 2022-08-13 15:36   ` Mike Rapoport
  2022-08-15 12:53     ` Zi Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Rapoport @ 2022-08-13 15:36 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, linux-kernel

On Thu, Aug 11, 2022 at 07:16:32PM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> This Kconfig option is used by individual arch to set its desired
> MAX_ORDER. Rename it to reflect its actual use.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Shawn Guo <shawnguo@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Ley Foon Tan <ley.foon.tan@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Chris Zankel <chris@zankel.net>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-oxnas@groups.io
> Cc: linux-csky@vger.kernel.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-mips@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-xtensa@linux-xtensa.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arc/Kconfig                             | 2 +-
>  arch/arm/Kconfig                             | 2 +-
>  arch/arm/configs/imx_v6_v7_defconfig         | 2 +-
>  arch/arm/configs/milbeaut_m10v_defconfig     | 2 +-
>  arch/arm/configs/oxnas_v6_defconfig          | 2 +-
>  arch/arm/configs/sama7_defconfig             | 2 +-
>  arch/arm64/Kconfig                           | 2 +-
>  arch/csky/Kconfig                            | 2 +-
>  arch/ia64/Kconfig                            | 2 +-
>  arch/ia64/include/asm/sparsemem.h            | 6 +++---
>  arch/m68k/Kconfig.cpu                        | 2 +-
>  arch/mips/Kconfig                            | 2 +-
>  arch/nios2/Kconfig                           | 2 +-
>  arch/powerpc/Kconfig                         | 2 +-
>  arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +-
>  arch/powerpc/configs/fsl-emb-nonhw.config    | 2 +-
>  arch/sh/configs/ecovec24_defconfig           | 2 +-
>  arch/sh/mm/Kconfig                           | 2 +-
>  arch/sparc/Kconfig                           | 2 +-
>  arch/xtensa/Kconfig                          | 2 +-
>  include/linux/mmzone.h                       | 4 ++--
>  21 files changed, 24 insertions(+), 24 deletions(-)

This misses arch/loongarch.

Other than that I think it's a good cleanup regardless of the rest of the
series.

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> 
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> index 9e3653253ef2..d9a13ccf89a3 100644
> --- a/arch/arc/Kconfig
> +++ b/arch/arc/Kconfig
> @@ -554,7 +554,7 @@ config ARC_BUILTIN_DTB_NAME
>  
>  endmenu	 # "ARC Architecture Configuration"
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	default "12" if ARC_HUGEPAGE_16M
>  	default "11"
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 87badeae3181..e6c8ee56ac52 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1434,7 +1434,7 @@ config ARM_MODULE_PLTS
>  	  Disabling this is usually safe for small single-platform
>  	  configurations. If unsure, say y.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	default "12" if SOC_AM33XX
>  	default "9" if SA1111
> diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
> index 01012537a9b9..fb283059daa0 100644
> --- a/arch/arm/configs/imx_v6_v7_defconfig
> +++ b/arch/arm/configs/imx_v6_v7_defconfig
> @@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
>  CONFIG_SMP=y
>  CONFIG_ARM_PSCI=y
>  CONFIG_HIGHMEM=y
> -CONFIG_FORCE_MAX_ZONEORDER=14
> +CONFIG_ARCH_FORCE_MAX_ORDER=14
>  CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
>  CONFIG_KEXEC=y
>  CONFIG_CPU_FREQ=y
> diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
> index 58810e98de3d..8620061e19a8 100644
> --- a/arch/arm/configs/milbeaut_m10v_defconfig
> +++ b/arch/arm/configs/milbeaut_m10v_defconfig
> @@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
>  # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
>  # CONFIG_ARM_PATCH_IDIV is not set
>  CONFIG_HIGHMEM=y
> -CONFIG_FORCE_MAX_ZONEORDER=12
> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>  CONFIG_SECCOMP=y
>  CONFIG_KEXEC=y
>  CONFIG_EFI=y
> diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig
> index 600f78b363dd..5c163a9d1429 100644
> --- a/arch/arm/configs/oxnas_v6_defconfig
> +++ b/arch/arm/configs/oxnas_v6_defconfig
> @@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
>  CONFIG_MACH_OX820=y
>  CONFIG_SMP=y
>  CONFIG_NR_CPUS=16
> -CONFIG_FORCE_MAX_ZONEORDER=12
> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>  CONFIG_SECCOMP=y
>  CONFIG_ARM_APPENDED_DTB=y
>  CONFIG_ARM_ATAG_DTB_COMPAT=y
> diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
> index 0384030d8b25..8b2cf6ddd568 100644
> --- a/arch/arm/configs/sama7_defconfig
> +++ b/arch/arm/configs/sama7_defconfig
> @@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
>  # CONFIG_CACHE_L2X0 is not set
>  # CONFIG_ARM_PATCH_IDIV is not set
>  # CONFIG_CPU_SW_DOMAIN_PAN is not set
> -CONFIG_FORCE_MAX_ZONEORDER=15
> +CONFIG_ARCH_FORCE_MAX_ORDER=15
>  CONFIG_UACCESS_WITH_MEMCPY=y
>  # CONFIG_ATAGS is not set
>  CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 571cc234d0b3..c6fcd8746f60 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1401,7 +1401,7 @@ config XEN
>  	help
>  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int
>  	default "14" if ARM64_64K_PAGES
>  	default "12" if ARM64_16K_PAGES
> diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
> index 3cbc2dc62baf..adee6ab36862 100644
> --- a/arch/csky/Kconfig
> +++ b/arch/csky/Kconfig
> @@ -332,7 +332,7 @@ config HIGHMEM
>  	select KMAP_LOCAL
>  	default y
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	default "11"
>  
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 26ac8ea15a9e..c6e06cdc738f 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -200,7 +200,7 @@ config IA64_CYCLONE
>  	  Say Y here to enable support for IBM EXA Cyclone time source.
>  	  If you're unsure, answer N.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "MAX_ORDER (11 - 17)"  if !HUGETLB_PAGE
>  	range 11 17  if !HUGETLB_PAGE
>  	default "17" if HUGETLB_PAGE
> diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h
> index 42ed5248fae9..84e8ce387b69 100644
> --- a/arch/ia64/include/asm/sparsemem.h
> +++ b/arch/ia64/include/asm/sparsemem.h
> @@ -11,10 +11,10 @@
>  
>  #define SECTION_SIZE_BITS	(30)
>  #define MAX_PHYSMEM_BITS	(50)
> -#ifdef CONFIG_FORCE_MAX_ZONEORDER
> -#if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
> +#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
> +#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
>  #undef SECTION_SIZE_BITS
> -#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
> +#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
>  #endif
>  #endif
>  
> diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
> index e0e9e31339c1..3b2f39508524 100644
> --- a/arch/m68k/Kconfig.cpu
> +++ b/arch/m68k/Kconfig.cpu
> @@ -399,7 +399,7 @@ config SINGLE_MEMORY_CHUNK
>  	  order" to save memory that could be wasted for unused memory map.
>  	  Say N if not sure.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order" if ADVANCED
>  	depends on !SINGLE_MEMORY_CHUNK
>  	default "11"
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index ec21f8999249..70d28976a40d 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -2140,7 +2140,7 @@ config PAGE_SIZE_64KB
>  
>  endchoice
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
>  	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
> diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
> index 4167f1eb4cd8..a582f72104f3 100644
> --- a/arch/nios2/Kconfig
> +++ b/arch/nios2/Kconfig
> @@ -44,7 +44,7 @@ menu "Kernel features"
>  
>  source "kernel/Kconfig.hz"
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	range 9 20
>  	default "11"
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4c466acdc70d..39d71d7701bd 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -845,7 +845,7 @@ config DATA_SHIFT
>  	  in that case. If PIN_TLB is selected, it must be aligned to 8M as
>  	  8M pages will be pinned.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	range 8 9 if PPC64 && PPC_64K_PAGES
>  	default "9" if PPC64 && PPC_64K_PAGES
> diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
> index f29c166998af..e7672c186325 100644
> --- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
> +++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
> @@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
>  # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
>  CONFIG_BINFMT_MISC=m
>  CONFIG_MATH_EMULATION=y
> -CONFIG_FORCE_MAX_ZONEORDER=17
> +CONFIG_ARCH_FORCE_MAX_ORDER=17
>  CONFIG_PCI=y
>  CONFIG_PCIEPORTBUS=y
>  CONFIG_PCI_MSI=y
> diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
> index f14c6dbd7346..ab8a8c4530d9 100644
> --- a/arch/powerpc/configs/fsl-emb-nonhw.config
> +++ b/arch/powerpc/configs/fsl-emb-nonhw.config
> @@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
>  CONFIG_FONT_8x16=y
>  CONFIG_FONT_8x8=y
>  CONFIG_FONTS=y
> -CONFIG_FORCE_MAX_ZONEORDER=13
> +CONFIG_ARCH_FORCE_MAX_ORDER=13
>  CONFIG_FRAMEBUFFER_CONSOLE=y
>  CONFIG_FRAME_WARN=1024
>  CONFIG_FTL=y
> diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
> index e699e2e04128..b52e14ccb450 100644
> --- a/arch/sh/configs/ecovec24_defconfig
> +++ b/arch/sh/configs/ecovec24_defconfig
> @@ -8,7 +8,7 @@ CONFIG_MODULES=y
>  CONFIG_MODULE_UNLOAD=y
>  # CONFIG_BLK_DEV_BSG is not set
>  CONFIG_CPU_SUBTYPE_SH7724=y
> -CONFIG_FORCE_MAX_ZONEORDER=12
> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>  CONFIG_MEMORY_SIZE=0x10000000
>  CONFIG_FLATMEM_MANUAL=y
>  CONFIG_SH_ECOVEC=y
> diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
> index ba569cfb4368..411fdc0901f7 100644
> --- a/arch/sh/mm/Kconfig
> +++ b/arch/sh/mm/Kconfig
> @@ -18,7 +18,7 @@ config PAGE_OFFSET
>  	default "0x80000000" if MMU
>  	default "0x00000000"
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	range 9 64 if PAGE_SIZE_16KB
>  	default "9" if PAGE_SIZE_16KB
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index 1c852bb530ec..4d3d1af90d52 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -269,7 +269,7 @@ config ARCH_SPARSEMEM_ENABLE
>  config ARCH_SPARSEMEM_DEFAULT
>  	def_bool y if SPARC64
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	default "13"
>  	help
> diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
> index 12ac277282ba..bcb0c5d2abc2 100644
> --- a/arch/xtensa/Kconfig
> +++ b/arch/xtensa/Kconfig
> @@ -771,7 +771,7 @@ config HIGHMEM
>  
>  	  If unsure, say Y.
>  
> -config FORCE_MAX_ZONEORDER
> +config ARCH_FORCE_MAX_ORDER
>  	int "Maximum zone order"
>  	default "11"
>  	help
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 8f571dc7c524..ca285ed3c6e0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -24,10 +24,10 @@
>  #include <asm/page.h>
>  
>  /* Free memory management - zoned buddy allocator.  */
> -#ifndef CONFIG_FORCE_MAX_ZONEORDER
> +#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
>  #define MAX_ORDER 11
>  #else
> -#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
> +#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
>  #endif
>  #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
>  
> -- 
> 2.35.1
> 

-- 
Sincerely yours,
Mike.


* Re: [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER
  2022-08-13 15:36   ` Mike Rapoport
@ 2022-08-15 12:53     ` Zi Yan
  0 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2022-08-15 12:53 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, David Hildenbrand, Matthew Wilcox, Vlastimil Babka,
	Kirill A . Shutemov, Mike Kravetz, John Hubbard, Yang Shi,
	David Rientjes, James Houghton, linux-kernel

On 13 Aug 2022, at 11:36, Mike Rapoport wrote:

> On Thu, Aug 11, 2022 at 07:16:32PM -0400, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> This Kconfig option is used by individual architectures to set their
>> desired MAX_ORDER. Rename it to reflect its actual use.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Cc: Vineet Gupta <vgupta@synopsys.com>
>> Cc: Shawn Guo <shawnguo@kernel.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Guo Ren <guoren@kernel.org>
>> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
>> Cc: Ley Foon Tan <ley.foon.tan@intel.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Chris Zankel <chris@zankel.net>
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-oxnas@groups.io
>> Cc: linux-csky@vger.kernel.org
>> Cc: linux-ia64@vger.kernel.org
>> Cc: linux-m68k@lists.linux-m68k.org
>> Cc: linux-mips@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-sh@vger.kernel.org
>> Cc: sparclinux@vger.kernel.org
>> Cc: linux-xtensa@linux-xtensa.org
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  arch/arc/Kconfig                             | 2 +-
>>  arch/arm/Kconfig                             | 2 +-
>>  arch/arm/configs/imx_v6_v7_defconfig         | 2 +-
>>  arch/arm/configs/milbeaut_m10v_defconfig     | 2 +-
>>  arch/arm/configs/oxnas_v6_defconfig          | 2 +-
>>  arch/arm/configs/sama7_defconfig             | 2 +-
>>  arch/arm64/Kconfig                           | 2 +-
>>  arch/csky/Kconfig                            | 2 +-
>>  arch/ia64/Kconfig                            | 2 +-
>>  arch/ia64/include/asm/sparsemem.h            | 6 +++---
>>  arch/m68k/Kconfig.cpu                        | 2 +-
>>  arch/mips/Kconfig                            | 2 +-
>>  arch/nios2/Kconfig                           | 2 +-
>>  arch/powerpc/Kconfig                         | 2 +-
>>  arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +-
>>  arch/powerpc/configs/fsl-emb-nonhw.config    | 2 +-
>>  arch/sh/configs/ecovec24_defconfig           | 2 +-
>>  arch/sh/mm/Kconfig                           | 2 +-
>>  arch/sparc/Kconfig                           | 2 +-
>>  arch/xtensa/Kconfig                          | 2 +-
>>  include/linux/mmzone.h                       | 4 ++--
>>  21 files changed, 24 insertions(+), 24 deletions(-)
>
> This misses arch/loongarch.

Will add it.

>
> Other than that I think it's a good cleanup regardless of the rest of the
> series.
>
> Acked-by: Mike Rapoport <rppt@linux.ibm.com>

Thanks. I will send this out separately.

>
>>
>> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
>> index 9e3653253ef2..d9a13ccf89a3 100644
>> --- a/arch/arc/Kconfig
>> +++ b/arch/arc/Kconfig
>> @@ -554,7 +554,7 @@ config ARC_BUILTIN_DTB_NAME
>>
>>  endmenu	 # "ARC Architecture Configuration"
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	default "12" if ARC_HUGEPAGE_16M
>>  	default "11"
>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>> index 87badeae3181..e6c8ee56ac52 100644
>> --- a/arch/arm/Kconfig
>> +++ b/arch/arm/Kconfig
>> @@ -1434,7 +1434,7 @@ config ARM_MODULE_PLTS
>>  	  Disabling this is usually safe for small single-platform
>>  	  configurations. If unsure, say y.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	default "12" if SOC_AM33XX
>>  	default "9" if SA1111
>> diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
>> index 01012537a9b9..fb283059daa0 100644
>> --- a/arch/arm/configs/imx_v6_v7_defconfig
>> +++ b/arch/arm/configs/imx_v6_v7_defconfig
>> @@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
>>  CONFIG_SMP=y
>>  CONFIG_ARM_PSCI=y
>>  CONFIG_HIGHMEM=y
>> -CONFIG_FORCE_MAX_ZONEORDER=14
>> +CONFIG_ARCH_FORCE_MAX_ORDER=14
>>  CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
>>  CONFIG_KEXEC=y
>>  CONFIG_CPU_FREQ=y
>> diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
>> index 58810e98de3d..8620061e19a8 100644
>> --- a/arch/arm/configs/milbeaut_m10v_defconfig
>> +++ b/arch/arm/configs/milbeaut_m10v_defconfig
>> @@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
>>  # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
>>  # CONFIG_ARM_PATCH_IDIV is not set
>>  CONFIG_HIGHMEM=y
>> -CONFIG_FORCE_MAX_ZONEORDER=12
>> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>>  CONFIG_SECCOMP=y
>>  CONFIG_KEXEC=y
>>  CONFIG_EFI=y
>> diff --git a/arch/arm/configs/oxnas_v6_defconfig b/arch/arm/configs/oxnas_v6_defconfig
>> index 600f78b363dd..5c163a9d1429 100644
>> --- a/arch/arm/configs/oxnas_v6_defconfig
>> +++ b/arch/arm/configs/oxnas_v6_defconfig
>> @@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
>>  CONFIG_MACH_OX820=y
>>  CONFIG_SMP=y
>>  CONFIG_NR_CPUS=16
>> -CONFIG_FORCE_MAX_ZONEORDER=12
>> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>>  CONFIG_SECCOMP=y
>>  CONFIG_ARM_APPENDED_DTB=y
>>  CONFIG_ARM_ATAG_DTB_COMPAT=y
>> diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
>> index 0384030d8b25..8b2cf6ddd568 100644
>> --- a/arch/arm/configs/sama7_defconfig
>> +++ b/arch/arm/configs/sama7_defconfig
>> @@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
>>  # CONFIG_CACHE_L2X0 is not set
>>  # CONFIG_ARM_PATCH_IDIV is not set
>>  # CONFIG_CPU_SW_DOMAIN_PAN is not set
>> -CONFIG_FORCE_MAX_ZONEORDER=15
>> +CONFIG_ARCH_FORCE_MAX_ORDER=15
>>  CONFIG_UACCESS_WITH_MEMCPY=y
>>  # CONFIG_ATAGS is not set
>>  CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 571cc234d0b3..c6fcd8746f60 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1401,7 +1401,7 @@ config XEN
>>  	help
>>  	  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int
>>  	default "14" if ARM64_64K_PAGES
>>  	default "12" if ARM64_16K_PAGES
>> diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
>> index 3cbc2dc62baf..adee6ab36862 100644
>> --- a/arch/csky/Kconfig
>> +++ b/arch/csky/Kconfig
>> @@ -332,7 +332,7 @@ config HIGHMEM
>>  	select KMAP_LOCAL
>>  	default y
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	default "11"
>>
>> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
>> index 26ac8ea15a9e..c6e06cdc738f 100644
>> --- a/arch/ia64/Kconfig
>> +++ b/arch/ia64/Kconfig
>> @@ -200,7 +200,7 @@ config IA64_CYCLONE
>>  	  Say Y here to enable support for IBM EXA Cyclone time source.
>>  	  If you're unsure, answer N.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "MAX_ORDER (11 - 17)"  if !HUGETLB_PAGE
>>  	range 11 17  if !HUGETLB_PAGE
>>  	default "17" if HUGETLB_PAGE
>> diff --git a/arch/ia64/include/asm/sparsemem.h b/arch/ia64/include/asm/sparsemem.h
>> index 42ed5248fae9..84e8ce387b69 100644
>> --- a/arch/ia64/include/asm/sparsemem.h
>> +++ b/arch/ia64/include/asm/sparsemem.h
>> @@ -11,10 +11,10 @@
>>
>>  #define SECTION_SIZE_BITS	(30)
>>  #define MAX_PHYSMEM_BITS	(50)
>> -#ifdef CONFIG_FORCE_MAX_ZONEORDER
>> -#if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
>> +#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
>> +#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
>>  #undef SECTION_SIZE_BITS
>> -#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
>> +#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
>>  #endif
>>  #endif
>>
>> diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
>> index e0e9e31339c1..3b2f39508524 100644
>> --- a/arch/m68k/Kconfig.cpu
>> +++ b/arch/m68k/Kconfig.cpu
>> @@ -399,7 +399,7 @@ config SINGLE_MEMORY_CHUNK
>>  	  order" to save memory that could be wasted for unused memory map.
>>  	  Say N if not sure.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order" if ADVANCED
>>  	depends on !SINGLE_MEMORY_CHUNK
>>  	default "11"
>> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> index ec21f8999249..70d28976a40d 100644
>> --- a/arch/mips/Kconfig
>> +++ b/arch/mips/Kconfig
>> @@ -2140,7 +2140,7 @@ config PAGE_SIZE_64KB
>>
>>  endchoice
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
>>  	default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
>> diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig
>> index 4167f1eb4cd8..a582f72104f3 100644
>> --- a/arch/nios2/Kconfig
>> +++ b/arch/nios2/Kconfig
>> @@ -44,7 +44,7 @@ menu "Kernel features"
>>
>>  source "kernel/Kconfig.hz"
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	range 9 20
>>  	default "11"
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 4c466acdc70d..39d71d7701bd 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -845,7 +845,7 @@ config DATA_SHIFT
>>  	  in that case. If PIN_TLB is selected, it must be aligned to 8M as
>>  	  8M pages will be pinned.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	range 8 9 if PPC64 && PPC_64K_PAGES
>>  	default "9" if PPC64 && PPC_64K_PAGES
>> diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
>> index f29c166998af..e7672c186325 100644
>> --- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
>> +++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
>> @@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
>>  # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
>>  CONFIG_BINFMT_MISC=m
>>  CONFIG_MATH_EMULATION=y
>> -CONFIG_FORCE_MAX_ZONEORDER=17
>> +CONFIG_ARCH_FORCE_MAX_ORDER=17
>>  CONFIG_PCI=y
>>  CONFIG_PCIEPORTBUS=y
>>  CONFIG_PCI_MSI=y
>> diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
>> index f14c6dbd7346..ab8a8c4530d9 100644
>> --- a/arch/powerpc/configs/fsl-emb-nonhw.config
>> +++ b/arch/powerpc/configs/fsl-emb-nonhw.config
>> @@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
>>  CONFIG_FONT_8x16=y
>>  CONFIG_FONT_8x8=y
>>  CONFIG_FONTS=y
>> -CONFIG_FORCE_MAX_ZONEORDER=13
>> +CONFIG_ARCH_FORCE_MAX_ORDER=13
>>  CONFIG_FRAMEBUFFER_CONSOLE=y
>>  CONFIG_FRAME_WARN=1024
>>  CONFIG_FTL=y
>> diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
>> index e699e2e04128..b52e14ccb450 100644
>> --- a/arch/sh/configs/ecovec24_defconfig
>> +++ b/arch/sh/configs/ecovec24_defconfig
>> @@ -8,7 +8,7 @@ CONFIG_MODULES=y
>>  CONFIG_MODULE_UNLOAD=y
>>  # CONFIG_BLK_DEV_BSG is not set
>>  CONFIG_CPU_SUBTYPE_SH7724=y
>> -CONFIG_FORCE_MAX_ZONEORDER=12
>> +CONFIG_ARCH_FORCE_MAX_ORDER=12
>>  CONFIG_MEMORY_SIZE=0x10000000
>>  CONFIG_FLATMEM_MANUAL=y
>>  CONFIG_SH_ECOVEC=y
>> diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
>> index ba569cfb4368..411fdc0901f7 100644
>> --- a/arch/sh/mm/Kconfig
>> +++ b/arch/sh/mm/Kconfig
>> @@ -18,7 +18,7 @@ config PAGE_OFFSET
>>  	default "0x80000000" if MMU
>>  	default "0x00000000"
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	range 9 64 if PAGE_SIZE_16KB
>>  	default "9" if PAGE_SIZE_16KB
>> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
>> index 1c852bb530ec..4d3d1af90d52 100644
>> --- a/arch/sparc/Kconfig
>> +++ b/arch/sparc/Kconfig
>> @@ -269,7 +269,7 @@ config ARCH_SPARSEMEM_ENABLE
>>  config ARCH_SPARSEMEM_DEFAULT
>>  	def_bool y if SPARC64
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	default "13"
>>  	help
>> diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
>> index 12ac277282ba..bcb0c5d2abc2 100644
>> --- a/arch/xtensa/Kconfig
>> +++ b/arch/xtensa/Kconfig
>> @@ -771,7 +771,7 @@ config HIGHMEM
>>
>>  	  If unsure, say Y.
>>
>> -config FORCE_MAX_ZONEORDER
>> +config ARCH_FORCE_MAX_ORDER
>>  	int "Maximum zone order"
>>  	default "11"
>>  	help
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 8f571dc7c524..ca285ed3c6e0 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -24,10 +24,10 @@
>>  #include <asm/page.h>
>>
>>  /* Free memory management - zoned buddy allocator.  */
>> -#ifndef CONFIG_FORCE_MAX_ZONEORDER
>> +#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
>>  #define MAX_ORDER 11
>>  #else
>> -#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
>> +#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
>>  #endif
>>  #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
>>
>> -- 
>> 2.35.1
>>
>
> -- 
> Sincerely yours,
> Mike.

--
Best Regards,
Yan, Zi


* Re: [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  2022-08-11 23:16 ` [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
@ 2022-08-23 10:36   ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-08-23 10:36 UTC (permalink / raw)
  To: Zi Yan, linux-mm
  Cc: Matthew Wilcox, Vlastimil Babka, Kirill A . Shutemov,
	Mike Kravetz, John Hubbard, Yang Shi, David Rientjes,
	James Houghton, Mike Rapoport, linux-kernel

On 12.08.22 01:16, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> MAX_ORDER_NR_PAGES can be increased when it becomes a boot time parameter
> in later commits. To make sure read_kcore() reschedules its work at a
> constant period, use pageblock_nr_pages as the reschedule period instead,
> since pageblock_nr_pages is a constant and is either equal to or half of
> MAX_ORDER_NR_PAGES.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Ying Chen <chenying.kernel@bytedance.com>
> Cc: Feng Zhou <zhoufeng.zf@bytedance.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/proc/kcore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index dff921f7ca33..7dc09d211b48 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -491,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  			}
>  		}
>  
> -		if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
> +		if (page_offline_frozen++ % pageblock_nr_pages == 0) {
>  			page_offline_thaw();
>  			cond_resched();
>  			page_offline_freeze();

Yeah, the exact number doesn't actually matter here.
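
To illustrate the point, here is a minimal userspace sketch of the
"yield every N pages" pattern from the hunk above. It is not the kernel
code: sketch_walk_pages(), sketch_resched_point() and the SKETCH_* constants
are hypothetical names, and 512 and 1024 are only assumed example values for
pageblock_nr_pages and MAX_ORDER_NR_PAGES (roughly what a 4KB-page x86_64
configuration would give). The counter stands in for the
page_offline_thaw() / cond_resched() / page_offline_freeze() sequence.

```c
#include <assert.h>

/* Assumed example values, not taken from the patch. */
#define SKETCH_PAGEBLOCK_NR_PAGES  512UL   /* stand-in for pageblock_nr_pages */
#define SKETCH_MAX_ORDER_NR_PAGES  1024UL  /* stand-in for MAX_ORDER_NR_PAGES */

/* Counts how often the walker reached a reschedule point. */
static unsigned long sketch_resched_count;

/* Hypothetical stand-in for thaw + cond_resched() + freeze. */
static void sketch_resched_point(void)
{
	sketch_resched_count++;
}

/*
 * Walk nr_pages pages and hit a reschedule point once every `period`
 * pages, mirroring "if (page_offline_frozen++ % period == 0)".
 * Returns the number of reschedule points that were hit.
 */
static unsigned long sketch_walk_pages(unsigned long nr_pages,
				       unsigned long period)
{
	unsigned long visited = 0;
	unsigned long i;

	assert(period != 0);
	sketch_resched_count = 0;

	for (i = 0; i < nr_pages; i++) {
		/* ... one page would be copied to userspace here ... */
		if (visited++ % period == 0)
			sketch_resched_point();
	}
	return sketch_resched_count;
}
```

With a fixed period the number of reschedule points depends only on how
many pages are read, not on whatever MAX_ORDER was chosen at boot. Halving
the period just doubles the number of (cheap) reschedule points.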

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



Thread overview: 21+ messages
2022-08-11 23:16 [RFC PATCH v2 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 01/12] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER Zi Yan
2022-08-13 15:36   ` Mike Rapoport
2022-08-15 12:53     ` Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 02/12] mm: rectify MAX_ORDER semantics to be the largest page order from buddy allocator Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 03/12] mm: replace MAX_ORDER when it is used to indicate max physical contiguity Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 04/12] mm: adapt deferred struct page init to new MAX_ORDER Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 05/12] mm: prevent pageblock size being larger than section size Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 06/12] fs: proc: use pageblock_nr_pages for reschedule period in read_kcore() Zi Yan
2022-08-23 10:36   ` David Hildenbrand
2022-08-11 23:16 ` [RFC PATCH v2 07/12] virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 08/12] mm/page_reporting: set page_reporting_order to -1 to prevent it running Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 09/12] mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER Zi Yan
2022-08-13  1:11   ` Randy Dunlap
2022-08-13  2:37     ` Zi Yan
2022-08-13  2:40       ` Randy Dunlap
2022-08-11 23:16 ` [RFC PATCH v2 10/12] mm: convert MAX_ORDER sized static arrays to dynamic ones Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 11/12] mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant Zi Yan
2022-08-11 23:16 ` [RFC PATCH v2 12/12] mm: make MAX_ORDER a kernel boot time parameter Zi Yan
2022-08-13  1:11   ` Randy Dunlap
2022-08-13  2:38     ` Zi Yan
