Re: [PATCH] mm: hugetlb_vmemmap: provide stronger vmemmap allocaction gurantees

From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Rientjes <rientjes@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, muchun.song@linux.dev,
	souravpanda@google.com, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm: hugetlb_vmemmap: provide stronger vmemmap allocaction gurantees
Date: Wed, 12 Apr 2023 12:57:23 -0700
Message-ID: <20230412195723.GA4759@monkey>
In-Reply-To: <63736432-5cef-f67c-c809-cc19b236a7f4@google.com>

On 04/12/23 10:54, David Rientjes wrote:
> On Wed, 12 Apr 2023, Pasha Tatashin wrote:
> 
> > HugeTLB pages have a struct page optimization where struct pages for tail
> > pages are freed. However, when HugeTLB pages are destroyed, the memory for
> > struct pages (vmemmap) needs to be allocated again.
> > 
> > Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> > but given that this flag makes very little effort to actually reclaim
> > memory, returning huge pages back to the system can be a problem. Let's
> > use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> > reclaim without causing OOMs, but at least it may perform a few retries,
> > and will fail only when there is genuinely little unused memory
> > in the system.
> > 
> 
> Thanks Pasha, this definitely makes sense.  We want to free the hugetlb 
> page back to the system so it would be a shame to have to strand it in the 
> hugetlb pool because we can't allocate the tail pages (we want to free 
> more memory than we're allocating).

Agree.
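
To put rough numbers on "we want to free more memory than we're
allocating", here is a small userspace sketch of the arithmetic. The
constants are assumptions, not values taken from this thread: 4 KiB base
pages, a 64-byte struct page (typical for x86-64), and one vmemmap page
kept per huge page by the optimization. The helper names are
hypothetical.

```c
/* Assumed values, typical for x86-64; none of them are stated in the thread. */
#define BASE_PAGE_SIZE         4096UL /* assumption: 4 KiB base pages */
#define STRUCT_PAGE_SIZE       64UL   /* assumption: sizeof(struct page) == 64 */
#define RESERVED_VMEMMAP_PAGES 1UL    /* assumption: one vmemmap page is kept */

/* Number of base pages holding the struct pages that describe one huge page. */
static unsigned long vmemmap_pages(unsigned long huge_page_size)
{
	unsigned long nr_struct_pages = huge_page_size / BASE_PAGE_SIZE;

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Base pages that must be allocated again before the huge page can be freed. */
static unsigned long vmemmap_pages_to_restore(unsigned long huge_page_size)
{
	return vmemmap_pages(huge_page_size) - RESERVED_VMEMMAP_PAGES;
}
```

Under these assumptions a 2 MiB huge page needs 7 base pages allocated
in order to return 512 base pages to buddy, and a 1 GiB huge page needs
4095 in order to return 262144.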

The hugetlb vmemmap freeing series went through more than 20 revisions
before being merged.  One issue with much discussion was the need to
allocate vmemmap pages when hugetlb pages were returned to buddy.

It looks like the current set of GFP flags was suggested here:
https://lore.kernel.org/linux-mm/YC4ji+pMhtOs+KVM@dhcp22.suse.cz/

However, it was also mentioned that __GFP_RETRY_MAYFAIL could be used
instead of __GFP_NORETRY here:
https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@dhcp22.suse.cz/

Adding Michal on Cc: since these were his suggestions.

> 
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Suggested-by: David Rientjes <rientjes@google.com>
> > ---
> >  mm/hugetlb_vmemmap.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index a559037cce00..c4226d2af7cc 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -475,9 +475,12 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> >  	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
> >  	 * When a HugeTLB page is freed to the buddy allocator, previously
> >  	 * discarded vmemmap pages must be allocated and remapping.
> > +	 *
> > +	 * Use __GFP_RETRY_MAYFAIL to fail only when there is genuinely little
> > +	 * unused memory in the system.
> >  	 */
> >  	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse,
> > -				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
> > +				  GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE);
> >  	if (!ret) {
> >  		ClearHPageVmemmapOptimized(head);
> >  		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> 
> The behavior of __GFP_RETRY_MAYFAIL is different for high-order memory (at 
> least larger than PAGE_ALLOC_COSTLY_ORDER).  The order that we're 
> allocating would depend on the implementation of alloc_vmemmap_page_list() 
> so likely best to move the gfp mask to that function.

Good point.
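
For reference, the order dependence can be sketched as a simplified
userspace model of the page allocator's slow-path retry policy. This is
not kernel code: the MODEL_* flag bits and the function names are
hypothetical, and the model ignores everything else the slow path does
(watermarks, compaction results, retry limits). It only assumes that
costly means an order above PAGE_ALLOC_COSTLY_ORDER, taken to be 3 here.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; the kernel's real values differ. */
#define MODEL_GFP_NORETRY       (1u << 0)
#define MODEL_GFP_RETRY_MAYFAIL (1u << 1)

/* Assumed to match the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define MODEL_COSTLY_ORDER 3u

/* May the allocator loop back after a failed reclaim/compaction pass? */
static bool model_may_retry(unsigned int flags, unsigned int order)
{
	/* __GFP_NORETRY: give up after the first attempt, whatever the order. */
	if (flags & MODEL_GFP_NORETRY)
		return false;

	/* Costly orders are retried only when __GFP_RETRY_MAYFAIL is set. */
	if (order > MODEL_COSTLY_ORDER && !(flags & MODEL_GFP_RETRY_MAYFAIL))
		return false;

	return true;
}

/* May the allocation fall back to the OOM killer as a last resort? */
static bool model_may_oom_kill(unsigned int flags, unsigned int order)
{
	/* Neither flag ever leads to an OOM kill; the allocation fails instead. */
	if (flags & (MODEL_GFP_NORETRY | MODEL_GFP_RETRY_MAYFAIL))
		return false;

	/* Costly orders fail rather than trigger the OOM killer. */
	return order <= MODEL_COSTLY_ORDER;
}
```

In this model, __GFP_RETRY_MAYFAIL at order 0 mainly removes the OOM
kill fallback, while above the costly order it is what enables retries
at all, so the effect of the patch depends on the order that
alloc_vmemmap_page_list() ends up requesting.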

-- 
Mike Kravetz