All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
	corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, paulmck@kernel.org,
	mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
	rdunlap@infradead.org, oneukum@suse.com,
	anshuman.khandual@arm.com, jroedel@suse.de,
	almasrymina@google.com, rientjes@google.com, willy@infradead.org,
	mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
	naoya.horiguchi@nec.com, joao.m.martins@oracle.com,
	duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Tue, 23 Feb 2021 10:27:55 +0100	[thread overview]
Message-ID: <20210223092740.GA1998@linux> (raw)
In-Reply-To: <13a5363c-6af4-1e1f-9a18-972ca18278b5@oracle.com>

On Mon, Feb 22, 2021 at 04:00:27PM -0800, Mike Kravetz wrote:
> > -static void update_and_free_page(struct hstate *h, struct page *page)
> > +static int update_and_free_page(struct hstate *h, struct page *page)
> > +	__releases(&hugetlb_lock) __acquires(&hugetlb_lock)
> >  {
> >  	int i;
> > +	int nid = page_to_nid(page);
> >  
> >  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > -		return;
> > +		return 0;
> >  
> >  	h->nr_huge_pages--;
> > -	h->nr_huge_pages_node[page_to_nid(page)]--;
> > +	h->nr_huge_pages_node[nid]--;
> > +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> > +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
> > +	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> > +	set_page_refcounted(page);
> 
> I think you added the set_page_refcounted() because the huge page will
> appear as just a compound page without a reference after dropping the
> hugetlb lock?  It might be better to set the reference before modifying
> the destructor.  Otherwise, page scanning code could find the non-hugetlb
> compound page with no reference.  I could not find any code where this
> would be a problem, but I think it would be safer to set the reference
> first.

But we already had set_page_refcounted() before this patchset there.
Are the worries only because we drop the lock? AFAICS, the "page-scanning"
problem could have happened before as well?
Although, what does page scanning mean in this context?

I am not opposed to move it above, but I would like to understand the concern
here.

> 
> > +	spin_unlock(&hugetlb_lock);
> 
> I really like the way this code is structured.  It is much simpler than
> previous versions with retries or workqueue.  There is nothing wrong with
> always dropping the lock here.  However, I wonder if we should think about
> optimizing for the case where this feature is not enabled and we are not
> freeing a 1G huge page.  I suspect this will be the most common case for
> some time, and there is no need to drop the lock in this case.
> 
> Please do not change the code based on my comment.  I just wanted to bring
> this up for thought.
> 
> Is it as simple as checking?
>         if (free_vmemmap_pages_per_hpage(h) || hstate_is_gigantic(h))
>                 spin_unlock(&hugetlb_lock);
> 
>         /* before return */
>         if (free_vmemmap_pages_per_hpage(h) || hstate_is_gigantic(h))
>                 spin_lock(&hugetlb_lock);

AFAIK, we at least need the hstate_is_gigantic? Comment below says that
free_gigantic_page might block, so we need to drop the lock.
And I am fine with the change overall.

Unless I am missing something, we should not need to drop the lock unless
we need to allocate vmemmap pages (apart from gigantic pages).

> 
> > +
> > +	if (alloc_huge_page_vmemmap(h, page)) {
> > +		int zeroed;
> > +
> > +		spin_lock(&hugetlb_lock);
> > +		INIT_LIST_HEAD(&page->lru);
> > +		set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> > +		h->nr_huge_pages++;
> > +		h->nr_huge_pages_node[nid]++;

I think prep_new_huge_page() does this for us?

> > +
> > +		/*
> > +		 * If we cannot allocate vmemmap pages, just refuse to free the
> > +		 * page and put the page back on the hugetlb free list and treat
> > +		 * as a surplus page.
> > +		 */
> > +		h->surplus_huge_pages++;
> > +		h->surplus_huge_pages_node[nid]++;
> > +
> > +		/*
> > +		 * This page is now managed by the hugetlb allocator and has
> > +		 * no users -- drop the last reference.
> > +		 */
> > +		zeroed = put_page_testzero(page);
> > +		VM_BUG_ON_PAGE(!zeroed, page);

Can this actually happen? AFAIK, page landed in update_and_free_page should be
zero refcounted, then we increase the reference, and I cannot see how the
reference might have changed in the meantime.

I am all for catching corner cases, but not sure how realistic this is.
Moreover, if we __ever__ get there, things can get nasty.

We basically will have an in-use page in the free hugetlb pool, so corruption
will happen. At that point, a plain BUG_ON might be better.

But as I said, I do not think we need that.

I yet need to look further, but what I have seen so far looks good.

-- 
Oscar Salvador
SUSE L3

  parent reply	other threads:[~2021-02-23  9:31 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-19 10:49 [PATCH v16 0/9] Free some vmemmap pages of HugeTLB page Muchun Song
2021-02-19 10:49 ` [PATCH v16 1/9] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-02-19 10:49 ` [PATCH v16 2/9] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-02-19 10:49 ` [PATCH v16 3/9] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-02-19 10:49 ` [PATCH v16 4/9] mm: hugetlb: alloc " Muchun Song
2021-02-19 14:12   ` Michal Hocko
2021-02-20  4:20     ` [External] " Muchun Song
2021-02-22  9:25       ` Michal Hocko
2021-02-22 10:31         ` Muchun Song
2021-02-22 10:50           ` Oscar Salvador
2021-02-23  0:00   ` Mike Kravetz
2021-02-23  5:35     ` [External] " Muchun Song
2021-02-23  5:35       ` Muchun Song
2021-02-23  9:27     ` Oscar Salvador [this message]
2021-02-23 10:27       ` Muchun Song
2021-02-23 10:50         ` Oscar Salvador
2021-02-23 15:41           ` Oscar Salvador
2021-02-23 22:31             ` Oscar Salvador
2021-02-24  3:47               ` Muchun Song
2021-02-24  8:31                 ` Oscar Salvador
2021-02-19 10:49 ` [PATCH v16 5/9] mm: hugetlb: set the PageHWPoison to the raw error page Muchun Song
2021-02-19 10:49 ` [PATCH v16 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-02-19 10:49 ` [PATCH v16 7/9] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-02-19 10:49 ` [PATCH v16 8/9] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-02-19 10:49 ` [PATCH v16 9/9] mm: hugetlb: optimize the code with the help of the compiler Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210223092740.GA1998@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=joao.m.martins@oracle.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.