Linux-Doc Archive on lore.kernel.org
 help / color / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	bp@alien8.de, "X86 ML" <x86@kernel.org>,
	hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	"Randy Dunlap" <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	"Mina Almasry" <almasrymina@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Michal Hocko" <mhocko@suse.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"David Hildenbrand" <david@redhat.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Xiongchun duan" <duanxiongchun@bytedance.com>,
	fam.zheng@bytedance.com, linux-doc@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux Memory Management List" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Tue, 20 Apr 2021 10:48:34 -0700
Message-ID: <8de3d7a0-f100-5d50-fe54-b83af07570f4@oracle.com> (raw)
In-Reply-To: <CAMZfGtV+_mNRumR1RBWiu6OOqhUsTZyBvp--39CJHEEFKMX5Eg@mail.gmail.com>

On 4/20/21 1:46 AM, Muchun Song wrote:
> On Tue, Apr 20, 2021 at 7:20 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 4/15/21 1:40 AM, Muchun Song wrote:
>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>> index 0abed7e766b8..6e970a7d3480 100644
>>> --- a/include/linux/hugetlb.h
>>> +++ b/include/linux/hugetlb.h
>>> @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>>>   *   code knows it has only reference.  All other examinations and
>>>   *   modifications require hugetlb_lock.
>>>   * HPG_freed - Set when page is on the free lists.
>>> + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
>>>   *   Synchronization: hugetlb_lock held for examination and modification.
>>
>> I like the per-page flag.  In previous versions of the series, you just
>> checked the free_vmemmap_pages_per_hpage() to determine if vmemmmap
>> should be allocated.  Is there any change in functionality that makes is
>> necessary to set the flag in each page, or is it mostly for flexibility
>> going forward?
> 
> Actually, only the routine of dissolving the page cares whether
> the page is on the buddy free list when update_and_free_page
> returns. But we cannot change the return type of the
> update_and_free_page (e.g. change return type from 'void' to 'int').
> Why? If the hugepage is freed through a kworker, we cannot
> know the return value when update_and_free_page returns.
> So adding a return value seems odd.
> 
> In the dissolving routine, We can allocate vmemmap pages first,
> if it is successful, then we can make sure that
> update_and_free_page can successfully free page. So I need
> some stuff to mark the page which does not need to allocate
> vmemmap pages.
> 
> On the surface, we seem to have a straightforward method
> to do this.
> 
> Add a new parameter 'alloc_vmemmap' to update_and_free_page() to
> indicate that the caller is already allocated the vmemmap pages.
> update_and_free_page() do not need to allocate. Just like below.
> 
>    void update_and_free_page(struct hstate *h, struct page *page, bool atomic,
>            bool alloc_vmemmap)
>    {
>        if (alloc_vmemmap)
>            // allocate vmemmap pages
>    }
> 
> But if the page is freed through a kworker. How to pass
> 'alloc_vmemmap' to the kworker? We can embed this
> information into the per-page flag. So if we introduce
> HPG_vmemmap_optimized, the parameter of
> alloc_vmemmap is also necessary.
> 
> So it seems that introducing HPG_vmemmap_optimized is
> a good choice.

Thanks for the explanation!

Agree that the flag is a good choice.  How about adding a comment like
this above the alloc_huge_page_vmemmap call in dissolve_free_huge_page?

/*
 * Normally update_and_free_page will allocate required vmemmmap before
 * freeing the page.  update_and_free_page will fail to free the page
 * if it can not allocate required vmemmap.  We need to adjust
 * max_huge_pages if the page is not freed.  Attempt to allocate
 * vmemmmap here so that we can take appropriate action on failure.
 */

...
>>> +static void add_hugetlb_page(struct hstate *h, struct page *page,
>>> +                          bool adjust_surplus)
>>> +{
>>
>> We need to be a bit careful with hugepage specific flags that may be
>> set.  The routine remove_hugetlb_page which is called for 'page' before
>> this routine will not clear any of the hugepage specific flags.  If the
>> calling path goes through free_huge_page, most but not all flags are
>> cleared.
>>
>> We had a discussion about clearing the page->private field in Oscar's
>> series.  In the case of 'new' pages we can assume page->private is
>> cleared, but perhaps we should not make that assumption here.  Since we
>> hope to rarely call this routine, it might be safer to do something
>> like:
>>
>>         set_page_private(page, 0);
>>         SetHPageVmemmapOptimized(page);
> 
> Agree. Thanks for your reminder. I will fix this.
> 
>>
>>> +     int nid = page_to_nid(page);
>>> +
>>> +     lockdep_assert_held(&hugetlb_lock);
>>> +
>>> +     INIT_LIST_HEAD(&page->lru);
>>> +     h->nr_huge_pages++;
>>> +     h->nr_huge_pages_node[nid]++;
>>> +
>>> +     if (adjust_surplus) {
>>> +             h->surplus_huge_pages++;
>>> +             h->surplus_huge_pages_node[nid]++;
>>> +     }
>>> +
>>> +     set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>>> +
>>> +     /*
>>> +      * The refcount can possibly be increased by memory-failure or
>>> +      * soft_offline handlers.
>>> +      */
>>> +     if (likely(put_page_testzero(page))) {
>>
>> In the existing code there is no such test.  Is the need for the test
>> because of something introduced in the new code?
> 
> No.
> 
>> Or, should this test be in the existing code?
> 
> Yes. gather_surplus_pages should be fixed. I can fix it
> in a separate patch.
> 
> The possible bad scenario:
> 
> CPU0:                           CPU1:
>                                 set_compound_page_dtor(HUGETLB_PAGE_DTOR);
> memory_failure_hugetlb
>   get_hwpoison_page
>     __get_hwpoison_page
>       get_page_unless_zero
>                                 put_page_testzero()
> 
>   put_page(page)
> 
> 
> More details and discussion can refer to:
> 
> https://lore.kernel.org/linux-doc/CAMZfGtVRSBkKe=tKAKLY8dp_hywotq3xL+EJZNjXuSKt3HK3bQ@mail.gmail.com/
> 

Thanks you!  I did not remember that discussion.

It would be helpful to add a separate patch for gather_surplus_pages.
Otherwise, we have the VM_BUG_ON there and not in add_hugetlb_page.

-- 
Mike Kravetz

  reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-15  8:39 [PATCH v20 0/9] Free some vmemmap pages of " Muchun Song
2021-04-15  8:39 ` [PATCH v20 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-04-15  8:39 ` [PATCH v20 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-04-15  8:39 ` [PATCH v20 3/9] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-04-16 18:54   ` Mike Kravetz
2021-04-15  8:40 ` [PATCH v20 4/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-16 21:10   ` Mike Kravetz
2021-04-17  2:55     ` [External] " Muchun Song
2021-04-15  8:40 ` [PATCH v20 5/9] mm: hugetlb: defer freeing of HugeTLB pages Muchun Song
2021-04-16 23:55   ` Mike Kravetz
2021-04-17  4:13     ` [External] " Muchun Song
2021-04-15  8:40 ` [PATCH v20 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page Muchun Song
2021-04-19 23:19   ` Mike Kravetz
2021-04-20  8:46     ` [External] " Muchun Song
2021-04-20 17:48       ` Mike Kravetz [this message]
2021-04-21  3:42         ` Muchun Song
2021-04-15  8:40 ` [PATCH v20 7/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-04-19 23:41   ` Mike Kravetz
2021-04-15  8:40 ` [PATCH v20 8/9] mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled Muchun Song
2021-04-20 10:35   ` Oscar Salvador
2021-04-21  3:41     ` [External] " Muchun Song
2021-04-21  7:33       ` Oscar Salvador
2021-04-21  9:06         ` Muchun Song
2021-04-15  8:40 ` [PATCH v20 9/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8de3d7a0-f100-5d50-fe54-b83af07570f4@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Doc Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-doc/0 linux-doc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-doc linux-doc/ https://lore.kernel.org/linux-doc \
		linux-doc@vger.kernel.org
	public-inbox-index linux-doc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-doc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git