From: Muchun Song <songmuchun@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	David Hildenbrand <david@redhat.com>,
	Chen Huang <chenhuang5@huawei.com>,
	"Bodeddula, Balasubramaniam" <bodeddub@amazon.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Xiongchun duan <duanxiongchun@bytedance.com>,
	fam.zheng@bytedance.com, zhengqi.arch@bytedance.com,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [External] Re: [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
Date: Fri, 11 Jun 2021 15:52:52 +0800
Message-ID: <CAMZfGtU6D28AzoGsVdddrf54P_O-134j2dEMu6gn+uiBJkdi9Q@mail.gmail.com>
In-Reply-To: <1c910c9a-d5fd-8eb8-526d-bb1f71833c30@oracle.com>

On Fri, Jun 11, 2021 at 6:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 6/9/21 5:13 AM, Muchun Song wrote:
> > If the vmemmap is huge PMD mapped, we should split the huge PMD firstly
> > and then we can change the PTE page table entry. In this patch, we add
> > the ability of splitting the huge PMD mapping of vmemmap pages.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/mm.h   |  2 +-
> >  mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
> >  mm/hugetlb_vmemmap.c |  3 ++-
> >  mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
> >  4 files changed, 96 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index cadc8cc2c715..b97e1486c5c1 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
> >  #endif
> >
> >  void vmemmap_remap_free(unsigned long start, unsigned long end,
> > -                     unsigned long reuse);
> > +                     unsigned long reuse, struct list_head *pgtables);
> >  int vmemmap_remap_alloc(unsigned long start, unsigned long end,
> >                       unsigned long reuse, gfp_t gfp_mask);
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c3b2a8a494d6..3137c72d9cc7 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
> >  static void __prep_new_huge_page(struct hstate *h, struct page *page)
> >  {
> >       free_huge_page_vmemmap(h, page);
> > +     /*
> > +      * Because we store preallocated pages on @page->lru,
> > +      * vmemmap_pgtable_free() must be called before the
> > +      * initialization of @page->lru in INIT_LIST_HEAD().
> > +      */
> > +     vmemmap_pgtable_free(&page->lru);
> > +
> >       INIT_LIST_HEAD(&page->lru);
> >       set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> >       hugetlb_set_page_subpool(page, NULL);
> > @@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
> >               nodemask_t *node_alloc_noretry)
> >  {
> >       struct page *page;
> > +     LIST_HEAD(pgtables);
> > +
> > +     if (vmemmap_pgtable_prealloc(h, &pgtables))
> > +             return NULL;
>
> In the previous two patches I asked:
> - Can we wait until later to prealloc vmemmap pages for gigantic pages
>   allocated from bootmem?
> - Should we fail to add a hugetlb page to the pool if we can not do
>   vmemmap optimization?
>
>
> Depending on the answers to those questions, we may be able to eliminate
> these vmemmap_pgtable_prealloc/vmemmap_pgtable_free calls in hugetlb.c.
> What about adding the calls to free_huge_page_vmemmap?
> At the beginning of free_huge_page_vmemmap, allocate any vmemmap pgtable
> pages.  If it fails, skip optimization.  We can free any pages before
> returning to the caller.

You are right. Since we've introduced the HPageVmemmapOptimized flag,
it can be useful here. If failing to optimize vmemmap is allowed, we can
eliminate the helpers for allocating/freeing page tables. Thanks for the
reminder.
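Below is a minimal user-space sketch of the flow suggested above. The page tables are allocated at the start of the optimization routine. If that allocation fails, the optimization is skipped and the page is not rejected. Every name in it (struct hpage_model, alloc_pgtables(), free_pgtables(), optimize_vmemmap()) is hypothetical and is not the hugetlb API. The bool only stands in for the HPageVmemmapOptimized flag, and the actual PMD split and remap is elided.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a HugeTLB page; only the flag matters here. */
struct hpage_model {
	bool vmemmap_optimized;	/* stands in for HPageVmemmapOptimized */
};

/* Free the first nr buffers of a table array and the array itself. */
static void free_pgtables(void **tables, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		free(tables[i]);
	free(tables);
}

/*
 * Hypothetical allocator: returns nr zeroed 4 KiB buffers, or NULL if any
 * allocation fails (partial allocations are released first).
 * simulate_failure forces the failure path for demonstration.
 */
static void **alloc_pgtables(size_t nr, bool simulate_failure)
{
	void **tables;
	size_t i;

	if (simulate_failure)
		return NULL;
	tables = calloc(nr, sizeof(*tables));
	if (!tables)
		return NULL;
	for (i = 0; i < nr; i++) {
		tables[i] = calloc(1, 4096);
		if (!tables[i]) {
			free_pgtables(tables, i);
			return NULL;
		}
	}
	return tables;
}

/*
 * Allocate the page tables at the start of the optimization routine and,
 * if that fails, skip the optimization instead of refusing to add the
 * page to the pool.
 */
static void optimize_vmemmap(struct hpage_model *page, size_t nr_tables,
			     bool simulate_failure)
{
	void **tables = alloc_pgtables(nr_tables, simulate_failure);

	if (!tables)
		return;	/* page is still usable, just not optimized */

	/* ... split the PMDs and remap the vmemmap, consuming tables ... */
	page->vmemmap_optimized = true;

	/* In this model nothing is consumed, so free everything left over. */
	free_pgtables(tables, nr_tables);
}
```

In this sketch a failed allocation leaves the flag clear. Later code can then test the flag to decide whether the vmemmap must be restored when the page is freed.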

>
> Since we also know the page/address in the page table can we check to see
> if it is already PTE mapped.  If so, can we then skip allocation?

Good point. We need to allocate 512 page tables when splitting
a 1 GB huge page. If we fail to allocate page tables in the middle
of remapping, we should restore the previous mapping. I just want
to clarify this for myself.
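Here is a minimal user-space sketch that combines the two points above. PMDs that are already PTE mapped are skipped, so they need no allocation. If an allocation fails midway, only the splits made by the current call are undone, so the previous mapping is restored. The names struct vmemmap_model, alloc_pte_table() and split_vmemmap_pmds() are hypothetical and do not come from mm/sparse-vmemmap.c. A real split would also have to fill each table with PTEs that mirror the huge mapping, and flush the TLB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PTRS_PER_PTE 512

/* Hypothetical model: one slot per vmemmap PMD covering a HugeTLB page. */
struct vmemmap_model {
	size_t nr_pmds;
	void **pte_table;	/* NULL means the PMD is still huge-mapped */
};

/*
 * Hypothetical allocator. fail_at simulates a failure on the fail_at-th
 * call (counting from 0); a negative fail_at never fails on purpose.
 */
static void *alloc_pte_table(int *calls, int fail_at)
{
	if ((*calls)++ == fail_at)
		return NULL;
	return calloc(PTRS_PER_PTE, sizeof(unsigned long));
}

/*
 * Split every huge-mapped PMD. Already PTE-mapped PMDs are skipped. On an
 * allocation failure, undo only the splits made by this call so the
 * previous mapping is restored. Returns 0 on success, -1 on failure.
 */
static int split_vmemmap_pmds(struct vmemmap_model *m, int fail_at)
{
	bool *split_here = calloc(m->nr_pmds, sizeof(*split_here));
	int calls = 0;
	size_t i;

	if (!split_here)
		return -1;
	for (i = 0; i < m->nr_pmds; i++) {
		if (m->pte_table[i])
			continue;	/* already PTE-mapped: nothing to allocate */
		m->pte_table[i] = alloc_pte_table(&calls, fail_at);
		if (!m->pte_table[i])
			goto rollback;
		split_here[i] = true;
	}
	free(split_here);
	return 0;

rollback:
	for (i = 0; i < m->nr_pmds; i++) {
		if (split_here[i]) {
			free(m->pte_table[i]);
			m->pte_table[i] = NULL;	/* back to huge-mapped */
		}
	}
	free(split_here);
	return -1;
}
```

The sketch splits as it goes and then rolls back. A simpler alternative is to count the huge-mapped PMDs first and allocate all tables up front. That way a failure never touches the mapping at all, at the cost of a second walk.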

Thanks, Mike. I'll try in the next version.


> --
> Mike Kravetz


Thread overview: 14+ messages
2021-06-09 12:13 [PATCH 0/5] Split huge PMD mapping of vmemmap pages Muchun Song
2021-06-09 12:13 ` [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables Muchun Song
2021-06-10 21:49   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator Muchun Song
2021-06-10 22:13   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages Muchun Song
2021-06-10 22:35   ` Mike Kravetz
2021-06-11  7:52     ` Muchun Song [this message]
2021-06-11 12:35       ` [External] " Muchun Song
2021-06-09 12:13 ` [PATCH 4/5] mm: sparsemem: use huge PMD mapping for " Muchun Song
2021-06-10 22:49   ` Mike Kravetz
2021-06-09 12:13 ` [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON Muchun Song
2021-06-10 21:32 ` [PATCH 0/5] Split huge PMD mapping of vmemmap pages Mike Kravetz
2021-06-11  3:23   ` [External] " Muchun Song
