On 1 Mar 2021, at 20:59, Roman Gushchin wrote:

> On Wed, Feb 24, 2021 at 05:35:36PM -0500, Zi Yan wrote:
>> From: Zi Yan
>>
>> Hi all,
>>
>> I have rebased my 1GB PUD THP support patches on v5.11-mmotm-2021-02-18-18-29
>> and the code is available at
>> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.11-mmotm-2021-02-18-18-29
>> if you want to give it a try. The actual 49 patches are not sent out with this
>> cover letter. :)
>>
>> Instead of asking for code review, I would like to discuss the concerns raised
>> on previous RFCs. I think there are two major ones:
>>
>> 1. 1GB page allocation. The current implementation allocates 1GB pages from
>>    CMA regions that are reserved at boot time, like hugetlbfs does. The
>>    concern with using CMA is that an educated guess is needed to avoid
>>    depleting kernel memory in case the CMA regions are set too large.
>>    Recently David Rientjes proposed using process_madvise() for hugepage
>>    collapse, which is an alternative [1] but might not work for 1GB pages,
>>    since there is no way of _allocating_ a 1GB page into which to collapse
>>    pages. I proposed a similar approach at LSF/MM 2019, generating physically
>>    contiguous memory after pages are allocated [2], which is usable for 1GB
>>    THPs. That approach does in-place huge page promotion and thus does not
>>    require page allocation.
>
> Well, I don't think there is an alternative to CMA right now. Once memory has
> been almost filled at least once, any subsequent activity leading to
> substantial slab allocations (e.g. running git gc) will fragment the memory,
> so that there is virtually no chance of finding a contiguous GB.
>
> It's possible in theory to reduce fragmentation at the 1GB scale by grouping
> non-movable pageblocks, but that seems like a separate project.

My experiments showed that finding contiguous GBs is possible, but I agree that
CMA is more reliable and that 1GB-scale defragmentation should be a separate
project.

--
Best Regards,
Yan Zi
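
For readers unfamiliar with the process_madvise() interface referenced above, a
minimal userspace sketch of its calling convention follows. The collapse advice
value is hypothetical: as of v5.11 process_madvise() only accepts MADV_COLD and
MADV_PAGEOUT, so this illustrates only how such a collapse request could be
issued, not an interface this patchset or David's proposal actually provides.

/*
 * Minimal sketch (not part of the patchset): issuing a hugepage-collapse
 * hint on another process's mapping via process_madvise().
 *
 * MADV_COLLAPSE_RANGE is a hypothetical advice value used purely for
 * illustration; as of v5.11 process_madvise() only accepts MADV_COLD
 * and MADV_PAGEOUT.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise 440
#endif

#define MADV_COLLAPSE_RANGE 25	/* hypothetical, for illustration only */

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <hex addr>\n", argv[0]);
		return 1;
	}

	pid_t pid = (pid_t)atol(argv[1]);
	struct iovec iov = {
		/* One 1GB-aligned, 1GB-sized range to collapse. */
		.iov_base = (void *)strtoull(argv[2], NULL, 16),
		.iov_len  = 1ULL << 30,
	};

	int pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0) {
		perror("pidfd_open");
		return 1;
	}

	ssize_t ret = syscall(__NR_process_madvise, pidfd, &iov, 1,
			      MADV_COLLAPSE_RANGE, 0);
	if (ret < 0)
		perror("process_madvise");

	close(pidfd);
	return ret < 0;
}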