Re: [RFC PATCH 13/15] mm: convert MAX_ORDER sized static arrays to dynamic ones.

From: Zi Yan <ziy@nvidia.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, Dave Young <dyoung@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>, David Airlie <airlied@linux.ie>,
	kexec@lists.infradead.org, linux-doc@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [RFC PATCH 13/15] mm: convert MAX_ORDER sized static arrays to dynamic ones.
Date: Fri, 06 Aug 2021 10:00:29 -0400	[thread overview]
Message-ID: <81C3245C-6A47-4C1B-B392-CC08FBF15712@nvidia.com> (raw)
In-Reply-To: <bdec12bd-9188-9f3e-c442-aa33e25303a6@amd.com>

[-- Attachment #1: Type: text/plain, Size: 3309 bytes --]

On 6 Aug 2021, at 5:37, Christian König wrote:

> Am 05.08.21 um 21:58 schrieb Zi Yan:
>> On 5 Aug 2021, at 15:16, Christian König wrote:
>>
>>> Am 05.08.21 um 21:02 schrieb Zi Yan:
>>>> From: Zi Yan <ziy@nvidia.com>
>>>>
>>>> This prepares for the upcoming changes to make MAX_ORDER a boot time
>>>> parameter instead of compilation time constant. All static arrays with
>>>> MAX_ORDER size are converted to pointers and their memory is allocated
>>>> at runtime.
>>> Well in general I strongly suggest to not use the patter kmalloc(sizeof(some struct) * MAX_ORDER,...) instead use kmalloc_array, kcalloc etc..
>>>
>>> Then when a array is embedded at the end of a structure you can use a trailing array and the struct_size() macro to determine the allocation size.
>> Sure. Will fix it.
>>
>>> Additional to that separating the patch into changes for TTM to make the maximum allocation order independent from MAX_ORDER would be rather good to have I think.
>> Can you elaborate a little bit more on “make the maximum allocation order independent from MAX_ORDER”?
>
> My idea was that you have a single patch to give the maximum order as parameter to the TTM pool.

Got it. No problem.
>
>>  From what I understand of ttm_pool_alloc(), it tries to get num_pages pages by allocating as large pages as possible, starting from MAX_ORDER. What is the rationale behind this algorithm? Why not just call alloc_page(order=0) num_pages times?
>
> What we do here is essentially transparent huge pages for GPUs.
>
> In opposite to CPU which can usually only use fixed sizes like 4KiB, 2MiB, 1GiB at least AMD GPUs can use any power of two.

FYI, Matthew Wilcox’s memory folio patchset adds support for arbitrary THP sizes[1]. You might want to check it out. :)

>
> So it makes sense to allocate big pages as much as possible and only fallback to 4KiB pages when necessary.
>
> Down side is that we potentially exhaust larger orders for other use cases.
>
>> Is it mean to reduce the number of calls to alloc_page?
>
> That is a nice to have side effect, but the performance improvement for the TLB is the main reason here.
>
>> The allocated pages do not need to get as high as MAX_ORDER, is that the case?
>
> Actually we would really like to have pages as large as 1GiB for best TLB utilization :)
>
>> If yes, I probably can keep ttm pool as static arrays with length of MIN_MAX_ORDER, which I introduce in Patch 14 as the lower bound of boot time parameter MAX_ORDER. Let me know your thoughts.
>
> Well you could do something like MAX_MAX_ORDER with a value of at least 19 (1GiB with 4KiB pages IIRC). And then limit to the real available max_order when you make that a run time option.
>
> Since the array elements consists only of a linked list and a few extra parameters / pointers we probably won't need more than a page or two for those.

Thanks for you explanation. Now I understand that ttm_pool does want pages as large as possible for performance reasons. I will keep the existing code, so that ttm_pool can get the largest pages from buddy allocator. I will separate the changes to TTM to a single patch like you suggested.

[1] https://lore.kernel.org/linux-mm/20210715033704.692967-1-willy@infradead.org

—
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]