All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muchun Song <songmuchun@bytedance.com>
To: Michal Hocko <mhocko@suse.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	viro@zeniv.linux.org.uk,
	"Andrew Morton" <akpm@linux-foundation.org>,
	paulmck@kernel.org, mchehab+huawei@kernel.org,
	pawan.kumar.gupta@linux.intel.com,
	"Randy Dunlap" <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	"Mina Almasry" <almasrymina@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"David Hildenbrand" <david@redhat.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Xiongchun duan" <duanxiongchun@bytedance.com>,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	"Linux Memory Management List" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Mon, 15 Feb 2021 18:05:06 +0800	[thread overview]
Message-ID: <CAMZfGtXgVUvCejpxu1o5WDvmQ7S88rWqGi3DAGM6j5NHJgtdcg@mail.gmail.com> (raw)
In-Reply-To: <YCafit5ruRJ+SL8I@dhcp22.suse.cz>

On Fri, Feb 12, 2021 at 11:32 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 08-02-21 16:50:09, Muchun Song wrote:
> > When we free a HugeTLB page to the buddy allocator, we should allocate the
> > vmemmap pages associated with it. But we may cannot allocate vmemmap pages
> > when the system is under memory pressure, in this case, we just refuse to
> > free the HugeTLB page instead of looping forever trying to allocate the
> > pages.
>
> Thanks for simplifying the implementation from your early proposal!
>
> This will not be looping for ever. The allocation will usually trigger
> the OOM killer and sooner or later there will be a memory to allocate
> from or the system panics when there are no eligible tasks to kill. This
> is just a side note.
>
> I think the changelog could benefit from a more explicit documentation
> of those error failures. There are different cases when the hugetlb page
> is freed. It can be due to an admin intervention (decrease the pool),
> overcommit, migration, dissolving and likely some others. Most of them
> should be fine to stay in the pool which would just increase the surplus
> pages in the pool. I am not so sure about dissolving path.

Thanks. I will update the changelog.

> [...]
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 0209b736e0b4..3d85e3ab7caa 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -169,6 +169,8 @@
> >   * (last) level. So this type of HugeTLB page can be optimized only when its
> >   * size of the struct page structs is greater than 2 pages.
> >   */
> > +#define pr_fmt(fmt)  "HugeTLB: " fmt
> > +
> >  #include "hugetlb_vmemmap.h"
> >
> >  /*
> > @@ -198,6 +200,34 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
> >       return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
> >  }
> >
> > +int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
> > +{
> > +     int ret;
> > +     unsigned long vmemmap_addr = (unsigned long)head;
> > +     unsigned long vmemmap_end, vmemmap_reuse;
> > +
> > +     if (!free_vmemmap_pages_per_hpage(h))
> > +             return 0;
> > +
> > +     vmemmap_addr += RESERVE_VMEMMAP_SIZE;
> > +     vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
> > +     vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
> > +
> > +     /*
> > +      * The pages which the vmemmap virtual address range [@vmemmap_addr,
> > +      * @vmemmap_end) are mapped to are freed to the buddy allocator, and
> > +      * the range is mapped to the page which @vmemmap_reuse is mapped to.
> > +      * When a HugeTLB page is freed to the buddy allocator, previously
> > +      * discarded vmemmap pages must be allocated and remapping.
> > +      */
> > +     ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
> > +                               GFP_ATOMIC | __GFP_NOWARN | __GFP_THISNODE);
>
> I do not think that this is a good allocation mode. GFP_ATOMIC is a non
> sleeping allocation and a medium memory pressure might cause it to
> fail prematurely. I do not think this is really an atomic context which
> couldn't afford memory reclaim. I also do not think we want to grant

Because alloc_huge_page_vmemmap is called under hugetlb_lock
now. So using GFP_ATOMIC indeed makes the code more simpler.
From the document of the kernel, I learned that __GFP_NOMEMALLOC
can be used to explicitly forbid access to emergency reserves. So if
we do not want to use the reserve memory. How about replacing it to

GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE

Thanks.

> access to memory reserve is reasonable. Just think of a huge number of
> hugetlb pages being freed which can deplete the memory reserve for
> atomic allocations. I think that you want
>         GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN | __GFP_THISNODE
>
> for an initial implementation. The justification being that the
> allocation should at least try to reclaim but it shouldn't cause any
> major disruption because the failure is not fatal. If the failure rate
> would be impractically high then just drop NORETRY part. You can replace
> it by __GFP_RETRY_MAYFAIL but that shouldn't be strictly necessary
> because __GFP_THISNODE on its own implies on OOM killer, but that is
> kinda ugly to rely on.
> --
> Michal Hocko
> SUSE Labs

  reply	other threads:[~2021-02-15 10:06 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08  8:50 [PATCH v15 0/8] Free some vmemmap pages of HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 1/8] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-02-08  8:50 ` [PATCH v15 2/8] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-02-08  8:50 ` [PATCH v15 3/8] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-02-08  8:50 ` [PATCH v15 4/8] mm: hugetlb: alloc " Muchun Song
2021-02-11 18:05   ` Mike Kravetz
2021-02-12 14:15     ` David Hildenbrand
2021-02-12 15:32   ` Michal Hocko
2021-02-15 10:05     ` Muchun Song [this message]
2021-02-15 10:33       ` [External] " Michal Hocko
2021-02-15 11:51         ` Muchun Song
2021-02-15 12:00           ` Muchun Song
2021-02-15 12:18             ` Michal Hocko
2021-02-15 12:44               ` Muchun Song
2021-02-15 13:19                 ` Michal Hocko
2021-02-15 15:36                   ` Muchun Song
2021-02-15 16:27                     ` Michal Hocko
2021-02-15 17:48                       ` Muchun Song
2021-02-15 18:19                         ` Muchun Song
2021-02-15 19:39                           ` Michal Hocko
2021-02-16  4:34                             ` Muchun Song
2021-02-16  8:15                               ` Michal Hocko
2021-02-16  8:20                                 ` David Hildenbrand
2021-02-16  9:03                                 ` Muchun Song
2021-02-15 19:02                         ` Michal Hocko
2021-02-16  8:13                           ` David Hildenbrand
2021-02-16  8:21                             ` Michal Hocko
2021-02-16 19:44                       ` Mike Kravetz
2021-02-17  8:13                         ` Michal Hocko
2021-02-18  1:00                           ` Mike Kravetz
2021-02-18  3:20                             ` Muchun Song
2021-02-18  8:21                               ` Michal Hocko
2021-02-15 12:24           ` Michal Hocko
2021-02-08  8:50 ` [PATCH v15 5/8] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-02-08  8:50 ` [PATCH v15 6/8] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-02-08  8:50 ` [PATCH v15 7/8] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-02-08  8:50 ` [PATCH v15 8/8] mm: hugetlb: optimize the code with the help of the compiler Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZfGtXgVUvCejpxu1o5WDvmQ7S88rWqGi3DAGM6j5NHJgtdcg@mail.gmail.com \
    --to=songmuchun@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.