All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muchun Song <songmuchun@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	bp@alien8.de, "X86 ML" <x86@kernel.org>,
	hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	"Randy Dunlap" <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	"Mina Almasry" <almasrymina@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Michal Hocko" <mhocko@suse.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"David Hildenbrand" <david@redhat.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Xiongchun duan" <duanxiongchun@bytedance.com>,
	fam.zheng@bytedance.com, linux-doc@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux Memory Management List" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Wed, 21 Apr 2021 11:42:31 +0800	[thread overview]
Message-ID: <CAMZfGtWMSjYS_Xqb8qXfvzsQCZG7Vn2hUxpxiOqLrPXgy80Suw@mail.gmail.com> (raw)
In-Reply-To: <8de3d7a0-f100-5d50-fe54-b83af07570f4@oracle.com>

On Wed, Apr 21, 2021 at 1:48 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 4/20/21 1:46 AM, Muchun Song wrote:
> > On Tue, Apr 20, 2021 at 7:20 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>
> >> On 4/15/21 1:40 AM, Muchun Song wrote:
> >>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> >>> index 0abed7e766b8..6e970a7d3480 100644
> >>> --- a/include/linux/hugetlb.h
> >>> +++ b/include/linux/hugetlb.h
> >>> @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >>>   *   code knows it has only reference.  All other examinations and
> >>>   *   modifications require hugetlb_lock.
> >>>   * HPG_freed - Set when page is on the free lists.
> >>> + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> >>>   *   Synchronization: hugetlb_lock held for examination and modification.
> >>
> >> I like the per-page flag.  In previous versions of the series, you just
> >> checked the free_vmemmap_pages_per_hpage() to determine if vmemmmap
> >> should be allocated.  Is there any change in functionality that makes is
> >> necessary to set the flag in each page, or is it mostly for flexibility
> >> going forward?
> >
> > Actually, only the routine of dissolving the page cares whether
> > the page is on the buddy free list when update_and_free_page
> > returns. But we cannot change the return type of the
> > update_and_free_page (e.g. change return type from 'void' to 'int').
> > Why? If the hugepage is freed through a kworker, we cannot
> > know the return value when update_and_free_page returns.
> > So adding a return value seems odd.
> >
> > In the dissolving routine, We can allocate vmemmap pages first,
> > if it is successful, then we can make sure that
> > update_and_free_page can successfully free page. So I need
> > some stuff to mark the page which does not need to allocate
> > vmemmap pages.
> >
> > On the surface, we seem to have a straightforward method
> > to do this.
> >
> > Add a new parameter 'alloc_vmemmap' to update_and_free_page() to
> > indicate that the caller is already allocated the vmemmap pages.
> > update_and_free_page() do not need to allocate. Just like below.
> >
> >    void update_and_free_page(struct hstate *h, struct page *page, bool atomic,
> >            bool alloc_vmemmap)
> >    {
> >        if (alloc_vmemmap)
> >            // allocate vmemmap pages
> >    }
> >
> > But if the page is freed through a kworker. How to pass
> > 'alloc_vmemmap' to the kworker? We can embed this
> > information into the per-page flag. So if we introduce
> > HPG_vmemmap_optimized, the parameter of
> > alloc_vmemmap is also necessary.
> >
> > So it seems that introducing HPG_vmemmap_optimized is
> > a good choice.
>
> Thanks for the explanation!
>
> Agree that the flag is a good choice.  How about adding a comment like
> this above the alloc_huge_page_vmemmap call in dissolve_free_huge_page?
>
> /*
>  * Normally update_and_free_page will allocate required vmemmmap before
>  * freeing the page.  update_and_free_page will fail to free the page
>  * if it can not allocate required vmemmap.  We need to adjust
>  * max_huge_pages if the page is not freed.  Attempt to allocate
>  * vmemmmap here so that we can take appropriate action on failure.
>  */

Thanks. I will add this comment.

>
> ...
> >>> +static void add_hugetlb_page(struct hstate *h, struct page *page,
> >>> +                          bool adjust_surplus)
> >>> +{
> >>
> >> We need to be a bit careful with hugepage specific flags that may be
> >> set.  The routine remove_hugetlb_page which is called for 'page' before
> >> this routine will not clear any of the hugepage specific flags.  If the
> >> calling path goes through free_huge_page, most but not all flags are
> >> cleared.
> >>
> >> We had a discussion about clearing the page->private field in Oscar's
> >> series.  In the case of 'new' pages we can assume page->private is
> >> cleared, but perhaps we should not make that assumption here.  Since we
> >> hope to rarely call this routine, it might be safer to do something
> >> like:
> >>
> >>         set_page_private(page, 0);
> >>         SetHPageVmemmapOptimized(page);
> >
> > Agree. Thanks for your reminder. I will fix this.
> >
> >>
> >>> +     int nid = page_to_nid(page);
> >>> +
> >>> +     lockdep_assert_held(&hugetlb_lock);
> >>> +
> >>> +     INIT_LIST_HEAD(&page->lru);
> >>> +     h->nr_huge_pages++;
> >>> +     h->nr_huge_pages_node[nid]++;
> >>> +
> >>> +     if (adjust_surplus) {
> >>> +             h->surplus_huge_pages++;
> >>> +             h->surplus_huge_pages_node[nid]++;
> >>> +     }
> >>> +
> >>> +     set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> >>> +
> >>> +     /*
> >>> +      * The refcount can possibly be increased by memory-failure or
> >>> +      * soft_offline handlers.
> >>> +      */
> >>> +     if (likely(put_page_testzero(page))) {
> >>
> >> In the existing code there is no such test.  Is the need for the test
> >> because of something introduced in the new code?
> >
> > No.
> >
> >> Or, should this test be in the existing code?
> >
> > Yes. gather_surplus_pages should be fixed. I can fix it
> > in a separate patch.
> >
> > The possible bad scenario:
> >
> > CPU0:                           CPU1:
> >                                 set_compound_page_dtor(HUGETLB_PAGE_DTOR);
> > memory_failure_hugetlb
> >   get_hwpoison_page
> >     __get_hwpoison_page
> >       get_page_unless_zero
> >                                 put_page_testzero()
> >
> >   put_page(page)
> >
> >
> > More details and discussion can refer to:
> >
> > https://lore.kernel.org/linux-doc/CAMZfGtVRSBkKe=tKAKLY8dp_hywotq3xL+EJZNjXuSKt3HK3bQ@mail.gmail.com/
> >
>
> Thanks you!  I did not remember that discussion.
>
> It would be helpful to add a separate patch for gather_surplus_pages.
> Otherwise, we have the VM_BUG_ON there and not in add_hugetlb_page.
>

Agree. Will do.

> --
> Mike Kravetz

  reply	other threads:[~2021-04-21  3:43 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-15  8:39 [PATCH v20 0/9] Free some vmemmap pages of HugeTLB page Muchun Song
2021-04-15  8:39 ` [PATCH v20 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-04-15  8:39 ` [PATCH v20 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-04-15  8:39 ` [PATCH v20 3/9] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-04-16 18:54   ` Mike Kravetz
2021-04-15  8:40 ` [PATCH v20 4/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-16 21:10   ` Mike Kravetz
2021-04-17  2:55     ` [External] " Muchun Song
2021-04-15  8:40 ` [PATCH v20 5/9] mm: hugetlb: defer freeing of HugeTLB pages Muchun Song
2021-04-16 23:55   ` Mike Kravetz
2021-04-17  4:13     ` [External] " Muchun Song
2021-04-15  8:40 ` [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-19 23:19   ` Mike Kravetz
2021-04-20  8:46     ` [External] " Muchun Song
2021-04-20 17:48       ` Mike Kravetz
2021-04-21  3:42         ` Muchun Song [this message]
2021-04-15  8:40 ` [PATCH v20 7/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-04-19 23:41   ` Mike Kravetz
2021-04-15  8:40 ` [PATCH v20 8/9] mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled Muchun Song
2021-04-20 10:35   ` Oscar Salvador
2021-04-21  3:41     ` [External] " Muchun Song
2021-04-21  7:33       ` Oscar Salvador
2021-04-21  9:06         ` Muchun Song
2021-04-15  8:40 ` [PATCH v20 9/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZfGtWMSjYS_Xqb8qXfvzsQCZG7Vn2hUxpxiOqLrPXgy80Suw@mail.gmail.com \
    --to=songmuchun@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.