All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <songmuchun@bytedance.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Tony Luck <tony.luck@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 1/2] mm,hwpoison: fix race with hugetlb page allocation
Date: Thu, 20 May 2021 07:17:17 +0000	[thread overview]
Message-ID: <20210520071717.GA2641190@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <d78f430c-2390-2a5f-564a-e20e0ba6b26a@oracle.com>

On Wed, May 19, 2021 at 03:32:17PM -0700, Mike Kravetz wrote:
> On 5/18/21 4:12 PM, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks the page refcount before taking an
> > additional one for memory error handling, which is wrong because there's
> > a time window where compound pages have non-zero refcount during
> > initialization.  So make __get_hwpoison_page() check page status a bit
> > more for hugetlb pages.
> > 
> > Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: Muchun Song <songmuchun@bytedance.com>
> > Cc: stable@vger.kernel.org # 5.12+
> > ---
> >  include/linux/hugetlb.h |  6 ++++++
> >  mm/hugetlb.c            | 15 +++++++++++++++
> >  mm/memory-failure.c     |  8 +++++++-
> >  3 files changed, 28 insertions(+), 1 deletion(-)
> > 
> > diff --git v5.13-rc2/include/linux/hugetlb.h v5.13-rc2_patched/include/linux/hugetlb.h
> > index b92f25ccef58..790ae618548d 100644
> > --- v5.13-rc2/include/linux/hugetlb.h
> > +++ v5.13-rc2_patched/include/linux/hugetlb.h
> > @@ -149,6 +149,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
> >  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
> >  						long freed);
> >  bool isolate_huge_page(struct page *page, struct list_head *list);
> > +int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
> >  void putback_active_hugepage(struct page *page);
> >  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
> >  void free_huge_page(struct page *page);
> > @@ -339,6 +340,11 @@ static inline bool isolate_huge_page(struct page *page, struct list_head *list)
> >  	return false;
> >  }
> >  
> > +static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> > +{
> > +	return 0;
> > +}
> > +
> >  static inline void putback_active_hugepage(struct page *page)
> >  {
> >  }
> > diff --git v5.13-rc2/mm/hugetlb.c v5.13-rc2_patched/mm/hugetlb.c
> > index 95918f410c0f..f138bae3e302 100644
> > --- v5.13-rc2/mm/hugetlb.c
> > +++ v5.13-rc2_patched/mm/hugetlb.c
> > @@ -5847,6 +5847,21 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
> >  	return ret;
> >  }
> >  
> > +int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
> > +{
> > +	int ret = 0;
> > +
> > +	*hugetlb = false;
> > +	spin_lock_irq(&hugetlb_lock);
> > +	if (PageHeadHuge(page)) {
> > +		*hugetlb = true;
> > +		if (HPageFreed(page) || HPageMigratable(page))
> > +			ret = get_page_unless_zero(page);
> > +	}
> > +	spin_unlock_irq(&hugetlb_lock);
> > +	return ret;
> > +}
> > +
> >  void putback_active_hugepage(struct page *page)
> >  {
> >  	spin_lock_irq(&hugetlb_lock);
> > diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
> > index 85ad98c00fd9..353c6177e489 100644
> > --- v5.13-rc2/mm/memory-failure.c
> > +++ v5.13-rc2_patched/mm/memory-failure.c
> > @@ -959,8 +959,14 @@ static int page_action(struct page_state *ps, struct page *p,
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > +	int ret = 0;
> > +	bool hugetlb = false;
> > +
> > +	ret = get_hwpoison_huge_page(head, &hugetlb);
> > +	if (hugetlb)
> > +		return ret;
> >  
> 
> Hello Naoya,
> 
> Thanks for your continued efforts.  However, I believe the race still
> exists.  Unless I am mistaken, it is possible that page is in the hugetlb
> allocation patch and racing with __get_hwpoison_page() as follows:

Hi Mike, thank you for the investigation.

> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
> 				      	page = alloc_fresh_huge_page()
> 				      	  page = alloc_buddy_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
> 	  get_hwpoison_huge_page()
> 	    /* Note that PageHuge()
> 	       is false, so hugetlb
> 	       not set */
> 	  PageTransHuge(head) false

It seems that PageTransHuge returns true in this race condition because it
simply checks PG_head flag.  But anyway, get_page_unless_zero() is called for
a hugetlb in this race, which is problematic.

> 					  prep_new_huge_page(page)
> 					  /* Now PageHuge() becomes true */
>           get_page_unless_zero(page)
> 
> I am not sure if it is possible to handle this race in the memory error
> code. 

I think that __get_hwpoison_page() might not properly call get_page_unless_zero().
Looking at other callers of this function, most(*) of them are calling it in
the context where a given page is pinned and in-use, and get_page_unless_zero()
is used to detect the race with page freeing.  In the current version,
__get_hwpoison_page() calls get_page_unless_zero() without caring for such an
assumption, which might be the root cause of the race with hugetlb page allocation.

# (*) It seems to me that do_migrate_range() might have the same issue
# around get_page_unless_zero().

So I think of inserting the check to comply with the assumption of
get_hwpoison_huge_page() like below:

        ret = get_hwpoison_huge_page(head, &hugetlb);
        if (hugetlb)
                return ret;

        if (!PageLRU(head) && !__PageMovable(head))
                return 0;

        if (PageTransHuge(head)) {
                ...
        }

        if (get_page_unless_zero(head)) {
                ...
        }

        return 0;

The newly added checks should work to prevent the above race, then get_any_page()
should retry and grab the page properly as a stable hugetlb page.

> I can not think of a way to avoid potentially incrementing the
> ref count on a hugetlb page as it is being created.  There is nothing
> synchronizing this in the hugetlb code.
> 
> When Muchun first proposed a fix to the race, the idea was to catch the
> race in the hugetlb code.  Michal suggested that the memory error code
> be more careful in modifying ref counts.  I would wait a bit to see if
> someone has a good idea how this can be done.  We 'may' need to revisit
> the approach suggested by Muchun.

If the above approach is still broken, let's revisit Muchun's approach.

Thanks,
Naoya Horiguchi

  reply	other threads:[~2021-05-20  7:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-18 23:12 [PATCH v5 0/2] hwpoison: fix race with hugetlb page allocation Naoya Horiguchi
2021-05-18 23:12 ` [PATCH v5 1/2] mm,hwpoison: " Naoya Horiguchi
2021-05-19 22:32   ` Mike Kravetz
2021-05-20  7:17     ` HORIGUCHI NAOYA(堀口 直也) [this message]
2021-05-25  7:36       ` Oscar Salvador
2021-05-25  8:07         ` HORIGUCHI NAOYA(堀口 直也)
2021-05-25  9:09           ` Oscar Salvador
2021-05-25 13:08             ` HORIGUCHI NAOYA(堀口 直也)
2021-05-18 23:12 ` [PATCH v5 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210520071717.GA2641190@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=osalvador@suse.de \
    --cc=songmuchun@bytedance.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.