All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Liu Shixin <liushixin2@huawei.com>,
	Yang Shi <shy828301@gmail.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <songmuchun@bytedance.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
Date: Tue, 5 Jul 2022 06:39:20 +0000	[thread overview]
Message-ID: <20220705063918.GA2508809@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <865207df-b272-c7c9-e90c-5748262d3d87@huawei.com>

On Tue, Jul 05, 2022 at 10:16:39AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > I found a weird state of 1GB hugepage pool, caused by the following
> > procedure:
> > 
> >   - run a process reserving all free 1GB hugepages,
> >   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >   - kill the reserving process.
> > 
> > , then all the hugepages are free *and* surplus at the same time.
> > 
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >   0
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >   3
> > 
> > This state is resolved by reserving and allocating the pages then
> > freeing them again, so this seems not to result in serious problem.
> > But it's a little surprising (shrinking pool suddenly fails).
> > 
> > This behavior is caused by hstate_is_gigantic() check in
> > return_unused_surplus_pages(). This was introduced so long ago in 2008
> > by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > at that time the gigantic pages were not supposed to be allocated/freed
> > at run-time.  Now kernel can support runtime allocation/free, so let's
> > check gigantic_page_runtime_supported() together.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> This patch looks good to me with a few question below.

Thank you for reviewing.

> 
> > ---
> > v2 -> v3:
> > - Fixed typo in patch description,
> > - add !gigantic_page_runtime_supported() check instead of removing
> >   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> > - add a few more !gigantic_page_runtime_supported() check in
> >   set_max_huge_pages() (by Mike).
> > ---
> >  mm/hugetlb.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 2a554f006255..bdc4499f324b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
> >  	/* Uncommit the reservation */
> >  	h->resv_huge_pages -= unused_resv_pages;
> >  
> > -	/* Cannot return gigantic pages currently */
> > -	if (hstate_is_gigantic(h))
> > +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >  		goto out;
> >  
> >  	/*
> > @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >  	 * the user tries to allocate gigantic pages but let the user free the
> >  	 * boottime allocated gigantic pages.
> >  	 */
> > -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> > +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> > +				      !gigantic_page_runtime_supported())) {
> >  		if (count > persistent_huge_pages(h)) {
> >  			spin_unlock_irq(&hugetlb_lock);
> >  			mutex_unlock(&h->resize_lock);
> > @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >  			goto out;
> >  	}
> >  
> > +	/*
> > +	 * We can not decrease gigantic pool size if runtime modification
> > +	 * is not supported.
> > +	 */
> > +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> > +		if (count < persistent_huge_pages(h)) {
> > +			spin_unlock_irq(&hugetlb_lock);
> > +			mutex_unlock(&h->resize_lock);
> > +			NODEMASK_FREE(node_alloc_noretry);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> With above change, we're not allowed to decrease the pool size now. But it was allowed previously
> even if !gigantic_page_runtime_supported. Does this will break user?

Yes, it does. I might get the wrong idea about the definition of
gigantic_page_runtime_supported(), which shows that runtime pool *extension*
is supported or not (implying that pool shrinking is always possible).
If this is right, this new if-block is not necessary.

> 
> And it seems it's not allowed to adjust the max_huge_pages now if !gigantic_page_runtime_supported
> for gigantic huge page. Should we just return for such case as there should be nothing to do now?
> Or am I miss something?

If pool shrinking is always allowed, we need uptdate max_huge_pages so,
the above if-block should have "goto out;", but it will be removed anyway
so we don't have to care for it.

Thank you for the valuable comment.

- Naoya Horiguchi

  reply	other threads:[~2022-07-05  6:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
2022-07-05  2:16   ` Miaohe Lin
2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也) [this message]
2022-07-06  3:04       ` Miaohe Lin
2022-07-06  3:22         ` Mike Kravetz
2022-07-07  2:59           ` Miaohe Lin
2022-07-06 21:51   ` Mike Kravetz
2022-07-07  0:56     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
2022-07-04  1:42   ` Andrew Morton
2022-07-04  2:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
2022-07-05  2:46   ` Miaohe Lin
2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-06  3:07       ` Miaohe Lin
2022-07-06 22:21   ` Mike Kravetz
2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
2022-07-06  2:37   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  3:22       ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
2022-07-06  2:58   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  3:08         ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on " Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220705063918.GA2508809@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.