linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/10] mm, treewide: Redefine MAX_ORDER sanely
Date: Fri, 17 Mar 2023 09:17:59 -0400	[thread overview]
Message-ID: <A37CA8BD-F6BD-4EDF-8F47-7FC81037B867@nvidia.com> (raw)
In-Reply-To: <20230316232144.b7ic4cif4kjiabws@box.shutemov.name>

[-- Attachment #1: Type: text/plain, Size: 11422 bytes --]

On 16 Mar 2023, at 19:21, Kirill A. Shutemov wrote:

> On Thu, Mar 16, 2023 at 01:09:30PM -0400, Zi Yan wrote:
>>> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> index 86fd88492870..c267b8c61e97 100644
>>> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> @@ -172,7 +172,7 @@ variables.
>>>  Offset of the free_list's member. This value is used to compute the number
>>>  of free pages.
>>>
>>> -Each zone has a free_area structure array called free_area[MAX_ORDER].
>>> +Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
>>>  The free_list represents a linked list of free page blocks.
>>>
>>>  (list_head, next|prev)
>>
>> In vmcoreinfo.rst, line 192:
>>
>> - (zone.free_area, MAX_ORDER)
>> + (zone.free_area, MAX_ORDER + 1)
>
> Okay.
>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 6221a1d057dd..50da4f26fad5 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -3969,7 +3969,7 @@
>>>  			[KNL] Minimal page reporting order
>>>  			Format: <integer>
>>>  			Adjust the minimal page reporting order. The page
>>> -			reporting is disabled when it exceeds (MAX_ORDER-1).
>>> +			reporting is disabled when it exceeds MAX_ORDER.
>>>
>>>  	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
>>>  			timeout > 0: seconds before rebooting
>>
>> line 942:
>> -		    possible value is MAX_ORDER/2.  Setting this parameter
>> +			possible value is (MAX_ORDER + 1)/2.  Setting this parameter
>>
>
> I don't think it worth it. See below, on the relevant code change.
>
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index d6bbdb7830b2..273a0fe7910a 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -609,8 +609,8 @@ static struct page *rb_alloc_aux_page(int node, int order)
>>>  {
>>>  	struct page *page;
>>>
>>> -	if (order >= MAX_ORDER)
>>> -		order = MAX_ORDER - 1;
>>> +	if (order > MAX_ORDER)
>>> +		order = MAX_ORDER;
>>>
>>>  	do {
>>>  		page = alloc_pages_node(node, PERF_AUX_GFP, order);
>>
>> line 817:
>>
>> -	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
>> +	if (order_base_2(size) > PAGE_SHIFT+MAX_ORDER)
>
> Right.
>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 4751031f3f05..fc059969d7ba 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -346,9 +346,9 @@ config SHUFFLE_PAGE_ALLOCATOR
>>>  	  the presence of a memory-side-cache. There are also incidental
>>>  	  security benefits as it reduces the predictability of page
>>>  	  allocations to compliment SLAB_FREELIST_RANDOM, but the
>>> -	  default granularity of shuffling on the "MAX_ORDER - 1" i.e,
>>> -	  10th order of pages is selected based on cache utilization
>>> -	  benefits on x86.
>>> +	  default granularity of shuffling on the MAX_ORDER i.e, 10th
>>> +	  order of pages is selected based on cache utilization benefits
>>> +	  on x86.
>>>
>>>  	  While the randomization improves cache utilization it may
>>>  	  negatively impact workloads on platforms without a cache. For
>>
>> line 669:
>>
>> -	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
>> -	  clamped down to MAX_ORDER - 1.
>> +	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
>> +	  clamped down to MAX_ORDER.
>>
>
> Okay. Missed that.
>
>>> diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
>>> index 7fb794242fad..ffedf4dbc49d 100644
>>> --- a/mm/kmsan/init.c
>>> +++ b/mm/kmsan/init.c
>>> @@ -96,7 +96,7 @@ void __init kmsan_init_shadow(void)
>>>  struct metadata_page_pair {
>>>  	struct page *shadow, *origin;
>>>  };
>>> -static struct metadata_page_pair held_back[MAX_ORDER] __initdata;
>>> +static struct metadata_page_pair held_back[MAX_ORDER + 1] __initdata;
>>>
>>>  /*
>>>   * Eager metadata allocation. When the memblock allocator is freeing pages to
>>
>> line 144: this one I am not sure if the original code is wrong or not.
>>
>> -	.order = MAX_ORDER,
>> +	.order = MAX_ORDER + 1,
>
> I think the original code is wrong, but the initialization seems unused:
> it got overridden in kmsan_memblock_discard() before the first use.
>
>>> @@ -211,8 +211,8 @@ static void kmsan_memblock_discard(void)
>>>  	 *    order=N-1,
>>>  	 *  - repeat.
>>>  	 */
>>> -	collect.order = MAX_ORDER - 1;
>>> -	for (int i = MAX_ORDER - 1; i >= 0; i--) {
>>> +	collect.order = MAX_ORDER;
>>> +	for (int i = MAX_ORDER; i >= 0; i--) {
>>>  		if (held_back[i].shadow)
>>>  			smallstack_push(&collect, held_back[i].shadow);
>>>  		if (held_back[i].origin)
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index 25fd0626a9e7..338b8cb0793e 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -2043,7 +2043,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>>>  	int order;
>>>
>>>  	while (start < end) {
>>> -		order = min(MAX_ORDER - 1UL, __ffs(start));
>>> +		order = min(MAX_ORDER, __ffs(start));
>>
>> while you are here, maybe using min_t is better.
>>
>> order = min_t(unsigned long, MAX_ORDER, __ffs(start));
>
> Already addressed by fixup.
>
>>>
>>>  		while (start + (1UL << order) > end)
>>>  			order--;
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index db3b270254f1..86291c79a764 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -596,7 +596,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>>>  	unsigned long pfn;
>>>
>>>  	/*
>>> -	 * Online the pages in MAX_ORDER - 1 aligned chunks. The callback might
>>> +	 * Online the pages in MAX_ORDER aligned chunks. The callback might
>>>  	 * decide to not expose all pages to the buddy (e.g., expose them
>>>  	 * later). We account all pages as being online and belonging to this
>>>  	 * zone ("present").
>>> @@ -605,7 +605,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>>>  	 * this and the first chunk to online will be pageblock_nr_pages.
>>>  	 */
>>>  	for (pfn = start_pfn; pfn < end_pfn;) {
>>> -		int order = min(MAX_ORDER - 1UL, __ffs(pfn));
>>> +		int order = min(MAX_ORDER, __ffs(pfn));
>>
>> ditto
>>
>> int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
>
> Ditto.
>
>>>
>>>  		(*online_page_callback)(pfn_to_page(pfn), order);
>>>  		pfn += (1UL << order);
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ac1fc986af44..66700f27b4c6 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>
>> line 842: it might make a difference when MAX_ORDER is odd.
>>
>> -	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
>> +	if (kstrtoul(buf, 10, &res) < 0 ||  res > (MAX_ORDER + 1) / 2) {
>
> I don't think it worth the complication: the upper limit here is pretty
> arbitrary and +1 doesn't really make a difference. I would rather keep it
> simple.
>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 32eb6b50fe18..0e19c0d647e6 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4171,8 +4171,8 @@ static inline int calculate_order(unsigned int size)
>>>  	/*
>>>  	 * Doh this slab cannot be placed using slub_max_order.
>>>  	 */
>>> -	order = calc_slab_order(size, 1, MAX_ORDER - 1, 1);
>>> -	if (order < MAX_ORDER)
>>> +	order = calc_slab_order(size, 1, MAX_ORDER, 1);
>>> +	if (order <= MAX_ORDER)
>>>  		return order;
>>>  	return -ENOSYS;
>>>  }
>>> @@ -4697,7 +4697,7 @@ __setup("slub_min_order=", setup_slub_min_order);
>>>  static int __init setup_slub_max_order(char *str)
>>>  {
>>>  	get_option(&str, (int *)&slub_max_order);
>>> -	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER - 1);
>>> +	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER);
>>
>> maybe min_t is better?
>>
>> slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
>
> Fair enough.
>
> ...
>
>> The changes look good to me. I added some missing changes inline, although the line
>> number might not be exact. Feel free to add Reviewed-by: Zi Yan <ziy@nvidia.com>.
>>
>> Do you think it is worth adding a MAX_ORDER check in checkpatch.pl to warn people
>> the meaning of MAX_ORDER has changed? Something like:
>>
>> # check for MAX_ORDER uses as its semantics has changed.
>> # MAX_ORDER now really means the max order of a page that can come out of
>> # kernel buddy allocator
>>         if ($line =~ /MAX_ORDER/) {
>>             WARN("MAX_ORDER",
>>                  "MAX_ORDER has changed its semantics. The max order of a page that can be allocated from buddy allocator is MAX_ORDER instead of MAX_ORDER - 1.")
>>         }
>>
>
> We can add, if you think it is helpful. I don't feel strongly about this.
>
> Below is fixup I made based on your feedback:
>
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index c267b8c61e97..e488bb4e13c4 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> @@ -189,7 +189,7 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
>  information. Makedumpfile gets the start address of the vmalloc region
>  from this.
>
> -(zone.free_area, MAX_ORDER)
> +(zone.free_area, MAX_ORDER + 1)
>  ---------------------------
>
>  Free areas descriptor. User-space tools use this value to iterate the
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 273a0fe7910a..a0433f37b024 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -814,7 +814,7 @@ struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  	size = sizeof(struct perf_buffer);
>  	size += nr_pages * sizeof(void *);
>
> -	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
> +	if (order_base_2(size) > PAGE_SHIFT+MAX_ORDER)
>  		goto fail;
>
>  	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 467844de48e5..6ee3b48ed298 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -666,8 +666,8 @@ config HUGETLB_PAGE_SIZE_VARIABLE
>  	  HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
>  	  on a platform.
>
> -	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
> -	  clamped down to MAX_ORDER - 1.
> +	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
> +	  clamped down to MAX_ORDER.
>
>  config CONTIG_ALLOC
>  	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
> diff --git a/mm/slub.c b/mm/slub.c
> index 0e19c0d647e6..f49d669ff604 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4697,7 +4697,7 @@ __setup("slub_min_order=", setup_slub_min_order);
>  static int __init setup_slub_max_order(char *str)
>  {
>  	get_option(&str, (int *)&slub_max_order);
> -	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER);
> +	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
>
>  	return 1;
>  }
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov

LGTM. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

  reply	other threads:[~2023-03-17 13:18 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 11:31 [PATCH 00/10] Fix confusion around MAX_ORDER Kirill A. Shutemov
2023-03-15 11:31 ` [PATCH 01/10] sparc/mm: Fix MAX_ORDER usage in tsb_grow() Kirill A. Shutemov
2023-03-16  3:04   ` Mike Kravetz
2023-03-17  8:46     ` Mike Rapoport
2023-03-17  8:35   ` Mike Rapoport
2023-03-21  7:48   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 02/10] um: Fix MAX_ORDER usage in linux_main() Kirill A. Shutemov
2023-03-17  8:49   ` Mike Rapoport
2023-03-21  7:49   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 03/10] floppy: Fix MAX_ORDER usage Kirill A. Shutemov
2023-03-17  8:53   ` Mike Rapoport
2023-03-21  7:53   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 04/10] drm/i915: Fix MAX_ORDER usage in i915_gem_object_get_pages_internal() Kirill A. Shutemov
2023-03-15 14:18   ` Tvrtko Ursulin
2023-03-15 15:28     ` Kirill A. Shutemov
2023-03-15 15:35       ` Tvrtko Ursulin
2023-03-15 15:38         ` Kirill A. Shutemov
2023-03-16  8:55           ` Tvrtko Ursulin
2023-03-21  7:57             ` Vlastimil Babka
2023-03-21  7:55   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 05/10] genwqe: Fix MAX_ORDER usage Kirill A. Shutemov
2023-03-21  7:59   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 06/10] perf/core: Fix MAX_ORDER usage in rb_alloc_aux_page() Kirill A. Shutemov
2023-03-21  8:00   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 07/10] mm/page_reporting: Fix MAX_ORDER usage in page_reporting_register() Kirill A. Shutemov
2023-03-21  8:01   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 08/10] mm/slub: Fix MAX_ORDER usage in calculate_order() Kirill A. Shutemov
2023-03-16 11:18   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 09/10] iommu: Fix MAX_ORDER usage in __iommu_dma_alloc_pages() Kirill A. Shutemov
2023-03-15 12:18   ` Robin Murphy
2023-03-22 12:20     ` Joerg Roedel
2023-03-15 16:07   ` Jacob Pan
2023-03-21  8:05   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 10/10] mm, treewide: Redefine MAX_ORDER sanely Kirill A. Shutemov
2023-03-15 15:06   ` kernel test robot
2023-03-15 15:26     ` Kirill A. Shutemov
2023-03-15 15:26   ` kernel test robot
2023-03-15 15:38     ` Kirill A. Shutemov
2023-03-16 17:09   ` Zi Yan
2023-03-16 23:21     ` Kirill A. Shutemov
2023-03-17 13:17       ` Zi Yan [this message]
2023-03-16 18:15   ` Mike Kravetz
2023-03-16 23:00     ` Mike Kravetz
2023-03-16 23:30     ` Kirill A. Shutemov
2023-03-17  1:57       ` Vineet Gupta
2023-03-21 11:22   ` Vlastimil Babka
2023-03-22  3:26   ` Michael Ellerman
2023-03-21 16:38 ` [PATCH 00/10] Fix confusion around MAX_ORDER Mel Gorman
2023-03-23 15:03   ` David Laight
2023-09-27 17:11 ` Paolo Bonzini
2023-09-28  7:50   ` Mikulas Patocka
2023-09-28 16:57     ` Paolo Bonzini
2023-10-17 10:46       ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A37CA8BD-F6BD-4EDF-8F47-7FC81037B867@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).