All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/10] mm, treewide: Redefine MAX_ORDER sanely
Date: Fri, 17 Mar 2023 09:17:59 -0400	[thread overview]
Message-ID: <A37CA8BD-F6BD-4EDF-8F47-7FC81037B867@nvidia.com> (raw)
In-Reply-To: <20230316232144.b7ic4cif4kjiabws@box.shutemov.name>

[-- Attachment #1: Type: text/plain, Size: 11422 bytes --]

On 16 Mar 2023, at 19:21, Kirill A. Shutemov wrote:

> On Thu, Mar 16, 2023 at 01:09:30PM -0400, Zi Yan wrote:
>>> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> index 86fd88492870..c267b8c61e97 100644
>>> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
>>> @@ -172,7 +172,7 @@ variables.
>>>  Offset of the free_list's member. This value is used to compute the number
>>>  of free pages.
>>>
>>> -Each zone has a free_area structure array called free_area[MAX_ORDER].
>>> +Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
>>>  The free_list represents a linked list of free page blocks.
>>>
>>>  (list_head, next|prev)
>>
>> In vmcoreinfo.rst, line 192:
>>
>> - (zone.free_area, MAX_ORDER)
>> + (zone.free_area, MAX_ORDER + 1)
>
> Okay.
>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 6221a1d057dd..50da4f26fad5 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -3969,7 +3969,7 @@
>>>  			[KNL] Minimal page reporting order
>>>  			Format: <integer>
>>>  			Adjust the minimal page reporting order. The page
>>> -			reporting is disabled when it exceeds (MAX_ORDER-1).
>>> +			reporting is disabled when it exceeds MAX_ORDER.
>>>
>>>  	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
>>>  			timeout > 0: seconds before rebooting
>>
>> line 942:
>> -		    possible value is MAX_ORDER/2.  Setting this parameter
>> +			possible value is (MAX_ORDER + 1)/2.  Setting this parameter
>>
>
> I don't think it worth it. See below, on the relevant code change.
>
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index d6bbdb7830b2..273a0fe7910a 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -609,8 +609,8 @@ static struct page *rb_alloc_aux_page(int node, int order)
>>>  {
>>>  	struct page *page;
>>>
>>> -	if (order >= MAX_ORDER)
>>> -		order = MAX_ORDER - 1;
>>> +	if (order > MAX_ORDER)
>>> +		order = MAX_ORDER;
>>>
>>>  	do {
>>>  		page = alloc_pages_node(node, PERF_AUX_GFP, order);
>>
>> line 817:
>>
>> -	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
>> +	if (order_base_2(size) > PAGE_SHIFT+MAX_ORDER)
>
> Right.
>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 4751031f3f05..fc059969d7ba 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -346,9 +346,9 @@ config SHUFFLE_PAGE_ALLOCATOR
>>>  	  the presence of a memory-side-cache. There are also incidental
>>>  	  security benefits as it reduces the predictability of page
>>>  	  allocations to compliment SLAB_FREELIST_RANDOM, but the
>>> -	  default granularity of shuffling on the "MAX_ORDER - 1" i.e,
>>> -	  10th order of pages is selected based on cache utilization
>>> -	  benefits on x86.
>>> +	  default granularity of shuffling on the MAX_ORDER i.e, 10th
>>> +	  order of pages is selected based on cache utilization benefits
>>> +	  on x86.
>>>
>>>  	  While the randomization improves cache utilization it may
>>>  	  negatively impact workloads on platforms without a cache. For
>>
>> line 669:
>>
>> -	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
>> -	  clamped down to MAX_ORDER - 1.
>> +	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
>> +	  clamped down to MAX_ORDER.
>>
>
> Okay. Missed that.
>
>>> diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
>>> index 7fb794242fad..ffedf4dbc49d 100644
>>> --- a/mm/kmsan/init.c
>>> +++ b/mm/kmsan/init.c
>>> @@ -96,7 +96,7 @@ void __init kmsan_init_shadow(void)
>>>  struct metadata_page_pair {
>>>  	struct page *shadow, *origin;
>>>  };
>>> -static struct metadata_page_pair held_back[MAX_ORDER] __initdata;
>>> +static struct metadata_page_pair held_back[MAX_ORDER + 1] __initdata;
>>>
>>>  /*
>>>   * Eager metadata allocation. When the memblock allocator is freeing pages to
>>
>> line 144: this one I am not sure if the original code is wrong or not.
>>
>> -	.order = MAX_ORDER,
>> +	.order = MAX_ORDER + 1,
>
> I think the original code is wrong, but the initialization seems unused:
> it got overridden in kmsan_memblock_discard() before the first use.
>
>>> @@ -211,8 +211,8 @@ static void kmsan_memblock_discard(void)
>>>  	 *    order=N-1,
>>>  	 *  - repeat.
>>>  	 */
>>> -	collect.order = MAX_ORDER - 1;
>>> -	for (int i = MAX_ORDER - 1; i >= 0; i--) {
>>> +	collect.order = MAX_ORDER;
>>> +	for (int i = MAX_ORDER; i >= 0; i--) {
>>>  		if (held_back[i].shadow)
>>>  			smallstack_push(&collect, held_back[i].shadow);
>>>  		if (held_back[i].origin)
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index 25fd0626a9e7..338b8cb0793e 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -2043,7 +2043,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>>>  	int order;
>>>
>>>  	while (start < end) {
>>> -		order = min(MAX_ORDER - 1UL, __ffs(start));
>>> +		order = min(MAX_ORDER, __ffs(start));
>>
>> while you are here, maybe using min_t is better.
>>
>> order = min_t(unsigned long, MAX_ORDER, __ffs(start));
>
> Already addressed by fixup.
>
>>>
>>>  		while (start + (1UL << order) > end)
>>>  			order--;
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index db3b270254f1..86291c79a764 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -596,7 +596,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>>>  	unsigned long pfn;
>>>
>>>  	/*
>>> -	 * Online the pages in MAX_ORDER - 1 aligned chunks. The callback might
>>> +	 * Online the pages in MAX_ORDER aligned chunks. The callback might
>>>  	 * decide to not expose all pages to the buddy (e.g., expose them
>>>  	 * later). We account all pages as being online and belonging to this
>>>  	 * zone ("present").
>>> @@ -605,7 +605,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>>>  	 * this and the first chunk to online will be pageblock_nr_pages.
>>>  	 */
>>>  	for (pfn = start_pfn; pfn < end_pfn;) {
>>> -		int order = min(MAX_ORDER - 1UL, __ffs(pfn));
>>> +		int order = min(MAX_ORDER, __ffs(pfn));
>>
>> ditto
>>
>> int order = min_t(unsigned long, MAX_ORDER, __ffs(pfn));
>
> Ditto.
>
>>>
>>>  		(*online_page_callback)(pfn_to_page(pfn), order);
>>>  		pfn += (1UL << order);
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ac1fc986af44..66700f27b4c6 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>
>> line 842: it might make a difference when MAX_ORDER is odd.
>>
>> -	if (kstrtoul(buf, 10, &res) < 0 ||  res > MAX_ORDER / 2) {
>> +	if (kstrtoul(buf, 10, &res) < 0 ||  res > (MAX_ORDER + 1) / 2) {
>
> I don't think it worth the complication: the upper limit here is pretty
> arbitrary and +1 doesn't really make a difference. I would rather keep it
> simple.
>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 32eb6b50fe18..0e19c0d647e6 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4171,8 +4171,8 @@ static inline int calculate_order(unsigned int size)
>>>  	/*
>>>  	 * Doh this slab cannot be placed using slub_max_order.
>>>  	 */
>>> -	order = calc_slab_order(size, 1, MAX_ORDER - 1, 1);
>>> -	if (order < MAX_ORDER)
>>> +	order = calc_slab_order(size, 1, MAX_ORDER, 1);
>>> +	if (order <= MAX_ORDER)
>>>  		return order;
>>>  	return -ENOSYS;
>>>  }
>>> @@ -4697,7 +4697,7 @@ __setup("slub_min_order=", setup_slub_min_order);
>>>  static int __init setup_slub_max_order(char *str)
>>>  {
>>>  	get_option(&str, (int *)&slub_max_order);
>>> -	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER - 1);
>>> +	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER);
>>
>> maybe min_t is better?
>>
>> slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
>
> Fair enough.
>
> ...
>
>> The changes look good to me. I added some missing changes inline, although the line
>> number might not be exact. Feel free to add Reviewed-by: Zi Yan <ziy@nvidia.com>.
>>
>> Do you think it is worth adding a MAX_ORDER check in checkpatch.pl to warn people
>> the meaning of MAX_ORDER has changed? Something like:
>>
>> # check for MAX_ORDER uses as its semantics has changed.
>> # MAX_ORDER now really means the max order of a page that can come out of
>> # kernel buddy allocator
>>         if ($line =~ /MAX_ORDER/) {
>>             WARN("MAX_ORDER",
>>                  "MAX_ORDER has changed its semantics. The max order of a page that can be allocated from buddy allocator is MAX_ORDER instead of MAX_ORDER - 1.")
>>         }
>>
>
> We can add, if you think it is helpful. I don't feel strongly about this.
>
> Below is fixup I made based on your feedback:
>
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index c267b8c61e97..e488bb4e13c4 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> @@ -189,7 +189,7 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
>  information. Makedumpfile gets the start address of the vmalloc region
>  from this.
>
> -(zone.free_area, MAX_ORDER)
> +(zone.free_area, MAX_ORDER + 1)
>  ---------------------------
>
>  Free areas descriptor. User-space tools use this value to iterate the
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 273a0fe7910a..a0433f37b024 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -814,7 +814,7 @@ struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  	size = sizeof(struct perf_buffer);
>  	size += nr_pages * sizeof(void *);
>
> -	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
> +	if (order_base_2(size) > PAGE_SHIFT+MAX_ORDER)
>  		goto fail;
>
>  	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 467844de48e5..6ee3b48ed298 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -666,8 +666,8 @@ config HUGETLB_PAGE_SIZE_VARIABLE
>  	  HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
>  	  on a platform.
>
> -	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
> -	  clamped down to MAX_ORDER - 1.
> +	  Note that the pageblock_order cannot exceed MAX_ORDER and will be
> +	  clamped down to MAX_ORDER.
>
>  config CONTIG_ALLOC
>  	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
> diff --git a/mm/slub.c b/mm/slub.c
> index 0e19c0d647e6..f49d669ff604 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4697,7 +4697,7 @@ __setup("slub_min_order=", setup_slub_min_order);
>  static int __init setup_slub_max_order(char *str)
>  {
>  	get_option(&str, (int *)&slub_max_order);
> -	slub_max_order = min(slub_max_order, (unsigned int)MAX_ORDER);
> +	slub_max_order = min_t(unsigned int, slub_max_order, MAX_ORDER);
>
>  	return 1;
>  }
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov

LGTM. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

  reply	other threads:[~2023-03-17 13:18 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 11:31 [PATCH 00/10] Fix confusion around MAX_ORDER Kirill A. Shutemov
2023-03-15 11:31 ` [PATCH 01/10] sparc/mm: Fix MAX_ORDER usage in tsb_grow() Kirill A. Shutemov
2023-03-16  3:04   ` Mike Kravetz
2023-03-17  8:46     ` Mike Rapoport
2023-03-17  8:35   ` Mike Rapoport
2023-03-21  7:48   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 02/10] um: Fix MAX_ORDER usage in linux_main() Kirill A. Shutemov
2023-03-17  8:49   ` Mike Rapoport
2023-03-21  7:49   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 03/10] floppy: Fix MAX_ORDER usage Kirill A. Shutemov
2023-03-17  8:53   ` Mike Rapoport
2023-03-21  7:53   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 04/10] drm/i915: Fix MAX_ORDER usage in i915_gem_object_get_pages_internal() Kirill A. Shutemov
2023-03-15 14:18   ` Tvrtko Ursulin
2023-03-15 15:28     ` Kirill A. Shutemov
2023-03-15 15:35       ` Tvrtko Ursulin
2023-03-15 15:38         ` Kirill A. Shutemov
2023-03-16  8:55           ` Tvrtko Ursulin
2023-03-21  7:57             ` Vlastimil Babka
2023-03-21  7:55   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 05/10] genwqe: Fix MAX_ORDER usage Kirill A. Shutemov
2023-03-21  7:59   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 06/10] perf/core: Fix MAX_ORDER usage in rb_alloc_aux_page() Kirill A. Shutemov
2023-03-21  8:00   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 07/10] mm/page_reporting: Fix MAX_ORDER usage in page_reporting_register() Kirill A. Shutemov
2023-03-21  8:01   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 08/10] mm/slub: Fix MAX_ORDER usage in calculate_order() Kirill A. Shutemov
2023-03-16 11:18   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 09/10] iommu: Fix MAX_ORDER usage in __iommu_dma_alloc_pages() Kirill A. Shutemov
2023-03-15 12:18   ` Robin Murphy
2023-03-22 12:20     ` Joerg Roedel
2023-03-15 16:07   ` Jacob Pan
2023-03-21  8:05   ` Vlastimil Babka
2023-03-15 11:31 ` [PATCH 10/10] mm, treewide: Redefine MAX_ORDER sanely Kirill A. Shutemov
2023-03-15 15:06   ` kernel test robot
2023-03-15 15:26     ` Kirill A. Shutemov
2023-03-15 15:26   ` kernel test robot
2023-03-15 15:38     ` Kirill A. Shutemov
2023-03-16 17:09   ` Zi Yan
2023-03-16 23:21     ` Kirill A. Shutemov
2023-03-17 13:17       ` Zi Yan [this message]
2023-03-16 18:15   ` Mike Kravetz
2023-03-16 23:00     ` Mike Kravetz
2023-03-16 23:30     ` Kirill A. Shutemov
2023-03-17  1:57       ` Vineet Gupta
2023-03-21 11:22   ` Vlastimil Babka
2023-03-22  3:26   ` Michael Ellerman
2023-03-21 16:38 ` [PATCH 00/10] Fix confusion around MAX_ORDER Mel Gorman
2023-03-23 15:03   ` David Laight
2023-09-27 17:11 ` Paolo Bonzini
2023-09-27 17:11   ` [dm-devel] " Paolo Bonzini
2023-09-28  7:50   ` Mikulas Patocka
2023-09-28  7:50     ` [dm-devel] " Mikulas Patocka
2023-09-28 16:57     ` Paolo Bonzini
2023-09-28 16:57       ` [dm-devel] " Paolo Bonzini
2023-10-17 10:46       ` Pavel Machek
2023-10-17 10:46         ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A37CA8BD-F6BD-4EDF-8F47-7FC81037B867@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.