All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Michal Hocko <mhocko@suse.com>,
	Muchun Song <songmuchun@bytedance.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] mm,hwpoison: fix race with compound page allocation
Date: Fri, 7 May 2021 04:17:52 +0000	[thread overview]
Message-ID: <20210507041751.GA2158342@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <YJOuFeteYQEPY9WF@localhost.localdomain>

On Thu, May 06, 2021 at 10:51:33AM +0200, Oscar Salvador wrote:
> On Thu, May 06, 2021 at 10:31:22AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Date: Thu, 6 May 2021 09:54:39 +0900
> > Subject: [PATCH] mm,hwpoison: fix race with compound page allocation
> > 
> > When hugetlb page fault (under overcommiting situation) and memory_failure()
> > race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's time
> > windows where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check more page status for a few types
> > of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> > 
> > Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: Muchun Song <songmuchun@bytedance.com>
> > Cc: stable@vger.kernel.org # 5.12+
> 
> Hi Naoya, 
> 
> thanks for the patch.
> I have some concerns though, more below:
> 
> > ---
> >  mm/memory-failure.c | 53 +++++++++++++++++++++++++++------------------
> >  1 file changed, 32 insertions(+), 21 deletions(-)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index a3659619d293..966a1d6b0bc8 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1095,30 +1095,41 @@ static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> >  
> > -	if (!PageHuge(head) && PageTransHuge(head)) {
> > -		/*
> > -		 * Non anonymous thp exists only in allocation/free time. We
> > -		 * can't handle such a case correctly, so let's give it up.
> > -		 * This should be better than triggering BUG_ON when kernel
> > -		 * tries to touch the "partially handled" page.
> > -		 */
> > -		if (!PageAnon(head)) {
> > -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> > -				page_to_pfn(page));
> > -			return 0;
> > +	if (PageCompound(page)) {
> > +		if (PageSlab(page)) {
> > +			return get_page_unless_zero(page);
> > +		} else if (PageHuge(head)) {
> > +			int ret = 0;
> > +
> > +			spin_lock(&hugetlb_lock);
> > +			if (HPageFreed(head) || HPageMigratable(head))
> > +				ret = get_page_unless_zero(head);
> > +			spin_unlock(&hugetlb_lock);
> > +			return ret;
> 
> Ok, I am probably overthinking this but should we re-check under the
> lock wehther the page is a hugetlb page?
> My concern is, what would happen if:
> 
> CPU0                                          CPU1
>  __get_hwpoison_page                          
>   PageHuge(head) == T                         
>                                               dissolve hugetlb page
>    hugetlb_lock                               
> 
> 
> In that case, by the time we get to check hugetlb flags, those checks
> might return false, and we do not get a refcount.

Thanks, we had better add rechecking as we do in dissolve_free_huge_page().

> So, I guess my question is: Should we re-check under the lock, and if it
> is not, do a "goto try_to_get_ref" that starts right at the beginning,
> or goes directly to the get_page_unless_zero at the end (the former
> probably better)?

Yes, retry could work in this case.  Looking at existing code,
get_any_page() provides "retry" layer, but it's not called now by
get_hwpoison_page() when called from memory_failure().  So I think of trying
to adjust code and make get_hwpoison_page call get_any_page() instead of
calling __get_hwpoison_page(() directly.

Thanks,
Naoya Horiguchi

      reply	other threads:[~2021-05-07  4:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  6:02 [PATCH] mm: hugetlb: fix a race between memory-failure/soft_offline and gather_surplus_pages Muchun Song
2021-04-21  8:03 ` Michal Hocko
2021-04-21  8:15   ` [External] " Muchun Song
2021-04-21  8:15     ` Muchun Song
2021-04-21  8:21     ` Oscar Salvador
2021-04-21  8:41       ` Muchun Song
2021-04-21  8:41         ` Muchun Song
2021-04-21  8:49         ` Oscar Salvador
2021-04-21  8:58           ` Muchun Song
2021-04-21  8:58             ` Muchun Song
2021-04-21  8:43       ` Michal Hocko
2021-04-21  8:25     ` Michal Hocko
2021-04-21  8:33   ` HORIGUCHI NAOYA(堀口 直也)
2021-04-21  9:02     ` [External] " Muchun Song
2021-04-21  9:02       ` Muchun Song
2021-04-21 18:03     ` Mike Kravetz
2021-04-22  8:27       ` HORIGUCHI NAOYA(堀口 直也)
2021-04-23  8:01         ` HORIGUCHI NAOYA(堀口 直也)
2021-04-28  7:46           ` [PATCH] mm,hwpoison: fix race with compound page allocation Naoya Horiguchi
2021-04-28  8:23             ` Oscar Salvador
2021-04-28  9:18               ` HORIGUCHI NAOYA(堀口 直也)
2021-05-06  1:31                 ` [PATCH v2] " Naoya Horiguchi
2021-05-06  8:51                   ` Oscar Salvador
2021-05-07  4:17                     ` HORIGUCHI NAOYA(堀口 直也) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210507041751.GA2158342@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=osalvador@suse.de \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.