From: David Hildenbrand <david@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, paulmck@kernel.org,
	mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
	rdunlap@infradead.org, oneukum@suse.com,
	anshuman.khandual@arm.com, jroedel@suse.de,
	almasrymina@google.com, rientjes@google.com, willy@infradead.org,
	osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com,
	naoya.horiguchi@nec.com, joao.m.martins@oracle.com
Cc: duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Fri, 12 Feb 2021 15:15:08 +0100	[thread overview]
Message-ID: <2afd12e0-60cd-0f9e-99a8-8ded09644504@redhat.com> (raw)
In-Reply-To: <72e772bc-7103-62da-d834-059eb5a3ce5b@oracle.com>

On 11.02.21 19:05, Mike Kravetz wrote:
> On 2/8/21 12:50 AM, Muchun Song wrote:
>> When we free a HugeTLB page to the buddy allocator, we should allocate the
>> vmemmap pages associated with it. But we may not be able to allocate
>> vmemmap pages when the system is under memory pressure; in that case, we
>> simply refuse to free the HugeTLB page instead of looping forever trying
>> to allocate the pages.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>>   include/linux/mm.h   |  2 ++
>>   mm/hugetlb.c         | 19 ++++++++++++-
>>   mm/hugetlb_vmemmap.c | 30 +++++++++++++++++++++
>>   mm/hugetlb_vmemmap.h |  6 +++++
>>   mm/sparse-vmemmap.c  | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   5 files changed, 130 insertions(+), 2 deletions(-)
> 
> Muchun has done a great job simplifying this patch series and addressing
> issues as they are brought up.  This patch addresses the issue that seems
> to be the biggest stumbling block for this series: the need to allocate
> vmemmap pages in order to dissolve a hugetlb page to the buddy allocator.
> The way it is addressed in this patch is to simply fail to dissolve the
> hugetlb page if the vmemmap pages can not be allocated.  IMO, this is an
> 'acceptable' strategy.  If we find ourselves in this situation, then we
> are likely hitting other corner cases in the system.  I wish there was a
> perfect way to address this issue, but we have been unable to come up
> with one.
> 
> There was a decent discussion about this in a previous version of the
> series starting here:
> https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
> In that thread, various other options were suggested and discussed.
> 
> I would like to come to some agreement on an acceptable way to handle this
> specific issue.  IMO, it makes little sense to continue refining other
> parts of this series if we cannot figure out how to move forward on this
> issue.
> 
> It would be great if David H, David R and Michal could share their opinions
> on this.  No need to review the code in detail yet (unless you want to), but
> let's start a discussion on how to move past this issue if we can.

So a summary from my side:

We might fail to free a huge page at any point in time iff we are low on 
kernel (!CMA, !ZONE_MOVABLE) memory. While we could play games with 
allocating the vmemmap from the huge page itself in some cases (e.g., 
!CMA, !ZONE_MOVABLE), simply retrying is way easier, and we don't render 
the huge page unusable forever.
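
A minimal sketch of that "fail instead of looping" flow, to make sure we 
mean the same thing (function names are illustrative, not necessarily 
the exact ones from the patch):

static int dissolve_one_huge_page(struct hstate *h, struct page *page)
{
	/*
	 * Restoring the vmemmap requires allocating kernel pages; under
	 * memory pressure this allocation can fail.
	 */
	if (alloc_huge_page_vmemmap(h, page))
		/* Keep the huge page intact; the caller can retry later. */
		return -ENOMEM;

	/* The vmemmap is populated again; safe to free to the buddy. */
	update_and_free_page(h, page);
	return 0;
}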

A corner case might be having many huge pages in ZONE_MOVABLE and freeing 
them all at once, eating up a lot of kernel memory. But then, the same 
setup would already be problematic today, where we simply always consume 
that kernel memory for the vmemmap.
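
Just to put rough numbers on it (assuming 4 KiB base pages and a 64-byte 
struct page): a 2 MiB huge page has 512 struct pages, i.e., 8 base pages 
(32 KiB) of vmemmap, and a 1 GiB huge page has 262144 struct pages, 
i.e., 4096 base pages (16 MiB) of vmemmap -- almost all of which this 
series frees. Dissolving, say, 100 free 1 GiB pages at once would then 
have to allocate roughly 1.6 GiB of kernel memory to restore the 
vmemmap.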

I think this problem only really becomes visible in corner cases, and 
someone actively has to enable the new behavior.


1. Failing to free a huge page triggered by the user (decreasing nr_hugepages):

Bad luck. Try again later.

2. Failing to free a surplus huge page when freed by the application:

Bad luck. But who will try again later?

3. Failing to dissolve a free huge page on ZONE_MOVABLE via offline_pages()

This is a bit unfortunate if we have plenty of ZONE_MOVABLE memory but 
are low on kernel memory. For example, migration of huge pages would 
still work, but dissolving the free page would not. I'd say this is a 
corner case. When the system is under that much memory pressure, 
offlining/unplug can be expected to fail.
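
The path I have in mind is roughly the following (a simplified sketch; 
the function name is made up, and error handling, locking and the 
compound-page stride are omitted):

static int dissolve_free_huge_pages_sketch(unsigned long start_pfn,
					   unsigned long end_pfn)
{
	unsigned long pfn;
	int rc;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		if (!PageHuge(page))
			continue;

		/*
		 * With this series, dissolving has to allocate vmemmap
		 * pages and can fail with -ENOMEM, which then makes
		 * offline_pages()/unplug fail instead of looping.
		 */
		rc = dissolve_free_huge_page(page);
		if (rc)
			return rc;
	}
	return 0;
}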

4. Failing to dissolve a huge page on CMA/ZONE_MOVABLE via 
alloc_contig_range() - once we have that handling in place. Mainly 
affects CMA and virtio-mem.

Similar to 3. However, we currently don't take care of huge pages *at 
all* there (we neither migrate nor dissolve them), so this doesn't make 
the current state any worse. virtio-mem will handle migration errors 
gracefully, and CMA might be able to fall back to other free areas 
within the CMA region.


I'd say: document the changed behavior properly, so people are aware 
that there might be corner-case issues with huge pages on CMA / 
ZONE_MOVABLE.

-- 
Thanks,

David / dhildenb



Thread overview: 37+ messages
2021-02-08  8:50 [PATCH v15 0/8] Free some vmemmap pages of HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 1/8] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-02-08  8:50 ` [PATCH v15 2/8] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-02-08  8:50 ` [PATCH v15 3/8] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 4/8] mm: hugetlb: alloc " Muchun Song
2021-02-11 18:05   ` Mike Kravetz
2021-02-12 14:15     ` David Hildenbrand [this message]
2021-02-12 15:32   ` Michal Hocko
2021-02-15 10:05     ` [External] " Muchun Song
2021-02-15 10:33       ` Michal Hocko
2021-02-15 11:51         ` Muchun Song
2021-02-15 12:00           ` Muchun Song
2021-02-15 12:18             ` Michal Hocko
2021-02-15 12:44               ` Muchun Song
2021-02-15 13:19                 ` Michal Hocko
2021-02-15 15:36                   ` Muchun Song
2021-02-15 16:27                     ` Michal Hocko
2021-02-15 17:48                       ` Muchun Song
2021-02-15 18:19                         ` Muchun Song
2021-02-15 19:39                           ` Michal Hocko
2021-02-16  4:34                             ` Muchun Song
2021-02-16  8:15                               ` Michal Hocko
2021-02-16  8:20                                 ` David Hildenbrand
2021-02-16  9:03                                 ` Muchun Song
2021-02-15 19:02                         ` Michal Hocko
2021-02-16  8:13                           ` David Hildenbrand
2021-02-16  8:21                             ` Michal Hocko
2021-02-16 19:44                       ` Mike Kravetz
2021-02-17  8:13                         ` Michal Hocko
2021-02-18  1:00                           ` Mike Kravetz
2021-02-18  3:20                             ` Muchun Song
2021-02-18  8:21                               ` Michal Hocko
2021-02-15 12:24           ` Michal Hocko
2021-02-08  8:50 ` [PATCH v15 5/8] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-02-08  8:50 ` [PATCH v15 6/8] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-02-08  8:50 ` [PATCH v15 7/8] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-02-08  8:50 ` [PATCH v15 8/8] mm: hugetlb: optimize the code with the help of the compiler Muchun Song
