linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] hwpoison: fix race with compound page allocation
@ 2021-05-17  4:53 Naoya Horiguchi
  2021-05-17  4:54 ` [PATCH v4 1/2] mm,hwpoison: " Naoya Horiguchi
  2021-05-17  4:54 ` [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
  0 siblings, 2 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-17  4:53 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

Hi,

I updated the patchset based on previous discussion [1],
I fixed wrong patch separation, remaining race issue, and build failure.

If you still find some issue in this series, please let me know.

Thanks,
Naoya Horiguchi

[1] https://lore.kernel.org/linux-mm/20210511151016.2310627-1-nao.horiguchi@gmail.com/T/#u
---
Summary:

Naoya Horiguchi (2):
      mm,hwpoison: fix race with compound page allocation
      mm,hwpoison: make get_hwpoison_page call get_any_page()

 mm/memory-failure.c | 122 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 50 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-17  4:53 [PATCH v4 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
@ 2021-05-17  4:54 ` Naoya Horiguchi
  2021-05-17 10:12   ` Oscar Salvador
  2021-05-17 20:11   ` Mike Kravetz
  2021-05-17  4:54 ` [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
  1 sibling, 2 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-17  4:54 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When hugetlb page fault (under overcommitting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:

    CPU0:                           CPU1:

                                    gather_surplus_pages()
                                      page = alloc_surplus_huge_page()
    memory_failure_hugetlb()
      get_hwpoison_page(page)
        __get_hwpoison_page(page)
          get_page_unless_zero(page)
                                      zero = put_page_testzero(page)
                                      VM_BUG_ON_PAGE(!zero, page)
                                      enqueue_huge_page(h, page)
      put_page(page)

__get_hwpoison_page() only checks page refcount before taking additional
one for memory error handling, which is wrong because there's a time
window where compound pages have non-zero refcount during initialization.

So makes __get_hwpoison_page() check page status a bit more for a few
types of compound pages. PageSlab() check is added because otherwise
"non anonymous thp" path is wrongly chosen.

Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Muchun Song <songmuchun@bytedance.com>
Cc: stable@vger.kernel.org # 5.12+
---
ChangeLog v4:
- all hugetlb related check in hugetlb_lock,
- fix build error with #ifdef CONFIG_HUGETLB_PAGE
---
 mm/memory-failure.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index a3659619d293..761f982b6d7b 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1094,6 +1094,16 @@ static int page_action(struct page_state *ps, struct page *p,
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
+	int ret = 0;
+
+#ifdef CONFIG_HUGETLB_PAGE
+	spin_lock(&hugetlb_lock);
+	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
+		ret = get_page_unless_zero(head);
+	spin_unlock(&hugetlb_lock);
+	if (ret > 0)
+		return ret;
+#endif
 
 	if (!PageHuge(head) && PageTransHuge(head)) {
 		/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-17  4:53 [PATCH v4 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
  2021-05-17  4:54 ` [PATCH v4 1/2] mm,hwpoison: " Naoya Horiguchi
@ 2021-05-17  4:54 ` Naoya Horiguchi
  2021-05-17 12:59   ` Oscar Salvador
  1 sibling, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-17  4:54 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

__get_hwpoison_page() could fail to grab refcount by some race
condition, so it's helpful if we can handle it by retrying.
We already have retry logic, so make get_hwpoison_page() call
get_any_page() when called from memory_failure().

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v4:
- __get_hwpoison_page() returns -EBUSY (not in the previous patch),
- moved "if (ret == -EBUSY)" block,
- improved comment.
---
 mm/memory-failure.c | 114 ++++++++++++++++++++++++--------------------
 1 file changed, 63 insertions(+), 51 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index 761f982b6d7b..85275972cc6f 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1084,13 +1084,6 @@ static int page_action(struct page_state *ps, struct page *p,
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
-/**
- * __get_hwpoison_page() - Get refcount for memory error handling:
- * @page:	raw error page (hit by memory error)
- *
- * Return: return 0 if failed to grab the refcount, otherwise true (some
- * non-zero value.)
- */
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
@@ -1098,7 +1091,9 @@ static int __get_hwpoison_page(struct page *page)
 
 #ifdef CONFIG_HUGETLB_PAGE
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
+	if (!PageHuge(head))
+		ret = -EBUSY;
+	else if (HPageFreed(head) || HPageMigratable(head))
 		ret = get_page_unless_zero(head);
 	spin_unlock(&hugetlb_lock);
 	if (ret > 0)
@@ -1131,15 +1126,6 @@ static int __get_hwpoison_page(struct page *page)
 	return 0;
 }
 
-/*
- * Safely get reference count of an arbitrary page.
- *
- * Returns 0 for a free page, 1 for an in-use page,
- * -EIO for a page-type we cannot handle and -EBUSY if we raced with an
- * allocation.
- * We only incremented refcount in case the page was already in-use and it
- * is a known type we can handle.
- */
 static int get_any_page(struct page *p, unsigned long flags)
 {
 	int ret = 0, pass = 0;
@@ -1149,50 +1135,76 @@ static int get_any_page(struct page *p, unsigned long flags)
 		count_increased = true;
 
 try_again:
-	if (!count_increased && !__get_hwpoison_page(p)) {
-		if (page_count(p)) {
-			/* We raced with an allocation, retry. */
-			if (pass++ < 3)
-				goto try_again;
-			ret = -EBUSY;
-		} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
-			/* We raced with put_page, retry. */
+	if (!count_increased) {
+		ret = __get_hwpoison_page(p);
+		if (!ret) {
+			if (page_count(p)) {
+				/* We raced with an allocation, retry. */
+				if (pass++ < 3)
+					goto try_again;
+				ret = -EBUSY;
+			} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
+				/* We raced with put_page, retry. */
+				if (pass++ < 3)
+					goto try_again;
+				ret = -EIO;
+			}
+			goto out;
+		} else if (ret == -EBUSY) {
+			/* We raced with freeing huge page to buddy, retry. */
 			if (pass++ < 3)
 				goto try_again;
-			ret = -EIO;
 		}
+	}
+
+	if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+		ret = 1;
 	} else {
-		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
-			ret = 1;
-		} else {
-			/*
-			 * A page we cannot handle. Check whether we can turn
-			 * it into something we can handle.
-			 */
-			if (pass++ < 3) {
-				put_page(p);
-				shake_page(p, 1);
-				count_increased = false;
-				goto try_again;
-			}
+		/*
+		 * A page we cannot handle. Check whether we can turn
+		 * it into something we can handle.
+		 */
+		if (pass++ < 3) {
 			put_page(p);
-			ret = -EIO;
+			shake_page(p, 1);
+			count_increased = false;
+			goto try_again;
 		}
+		put_page(p);
+		ret = -EIO;
 	}
-
+out:
 	return ret;
 }
 
-static int get_hwpoison_page(struct page *p, unsigned long flags,
-			     enum mf_flags ctxt)
+/**
+ * get_hwpoison_page() - Get refcount for memory error handling
+ * @p:		Raw error page (hit by memory error)
+ * @flags:	Flags controlling behavior of error handling
+ *
+ * get_hwpoison_page() takes a page refcount of an error page to handle memory
+ * error on it, after checking that the error page is in a well-defined state
+ * (defined as a page-type we can successfully handle the memor error on it,
+ * such as LRU page and hugetlb page).
+ *
+ * Memory error handling could be triggered at any time on any type of page,
+ * so it's prone to race with typical memory management lifecycle (like
+ * allocation and free).  So to avoid such races, get_hwpoison_page() takes
+ * extra care for the error page's state (as done in __get_hwpoison_page()),
+ * and has some retry logic in get_any_page().
+ *
+ * Return: 0 on failure,
+ *         1 on success for in-use pages in a well-defined state,
+ *         -EIO for pages on which we can not handle memory errors,
+ *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
+ *         operations like allocation and free.
+ */
+static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
 	int ret;
 
 	zone_pcp_disable(page_zone(p));
-	if (ctxt == MF_SOFT_OFFLINE)
-		ret = get_any_page(p, flags);
-	else
-		ret = __get_hwpoison_page(p);
+	ret = get_any_page(p, flags);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -1381,7 +1393,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 
 	num_poisoned_pages_inc();
 
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
 		/*
 		 * Check "filter hit" and "race with other subpage."
 		 */
@@ -1605,7 +1617,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -1897,7 +1909,7 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
-	if (!get_hwpoison_page(p, flags, 0)) {
+	if (!get_hwpoison_page(p, flags)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
@@ -2113,7 +2125,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 retry:
 	get_online_mems();
-	ret = get_hwpoison_page(page, flags, MF_SOFT_OFFLINE);
+	ret = get_hwpoison_page(page, flags);
 	put_online_mems();
 
 	if (ret > 0) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-17  4:54 ` [PATCH v4 1/2] mm,hwpoison: " Naoya Horiguchi
@ 2021-05-17 10:12   ` Oscar Salvador
  2021-05-17 11:34     ` Naoya Horiguchi
  2021-05-17 20:11   ` Mike Kravetz
  1 sibling, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2021-05-17 10:12 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Muchun Song, linux-mm, Andrew Morton, Mike Kravetz, Michal Hocko,
	Tony Luck, Naoya Horiguchi, linux-kernel

On Mon, May 17, 2021 at 01:54:00PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks page refcount before taking additional
                                  ^^ the?                      ^^ an
> one for memory error handling, which is wrong because there's a time
> window where compound pages have non-zero refcount during initialization.
> 
> So makes __get_hwpoison_page() check page status a bit more for a few
     ^^ make
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.

This is no longer true with this patch, is it? What happened here?

>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +		ret = get_page_unless_zero(head);
> +	spin_unlock(&hugetlb_lock);
> +	if (ret > 0)
> +		return ret;
> +#endif

I am kind of fine with this, but I wonder whether it makes sense to hide this
details into helper (with an empty stub for non-hugetlb pages)?

>  	if (!PageHuge(head) && PageTransHuge(head)) {
This !PageHuge could go?


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-17 10:12   ` Oscar Salvador
@ 2021-05-17 11:34     ` Naoya Horiguchi
  0 siblings, 0 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-17 11:34 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Muchun Song, linux-mm, Andrew Morton, Mike Kravetz, Michal Hocko,
	Tony Luck, Naoya Horiguchi, linux-kernel

On Mon, May 17, 2021 at 12:12:50PM +0200, Oscar Salvador wrote:
> On Mon, May 17, 2021 at 01:54:00PM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
>                                   ^^ the?                      ^^ an
> > one for memory error handling, which is wrong because there's a time
> > window where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check page status a bit more for a few
>      ^^ make
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> 
> This is no longer true with this patch, is it? What happened here?

Sorry, we need remove this. I dropped the changes for PageSlab
because I'm still not sure about what to do (I'd like to do it
in a separate work).

> 
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > +	int ret = 0;
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	spin_lock(&hugetlb_lock);
> > +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> > +		ret = get_page_unless_zero(head);
> > +	spin_unlock(&hugetlb_lock);
> > +	if (ret > 0)
> > +		return ret;
> > +#endif
> 
> I am kind of fine with this, but I wonder whether it makes sense to hide this
> details into helper (with an empty stub for non-hugetlb pages)?

OK, I will do this.

> 
> >  	if (!PageHuge(head) && PageTransHuge(head)) {
> This !PageHuge could go?

I'll update this, too.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-17  4:54 ` [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
@ 2021-05-17 12:59   ` Oscar Salvador
  2021-05-17 15:13     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2021-05-17 12:59 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Muchun Song, linux-mm, Andrew Morton, Mike Kravetz, Michal Hocko,
	Tony Luck, Naoya Horiguchi, linux-kernel

On Mon, May 17, 2021 at 01:54:01PM +0900, Naoya Horiguchi wrote:
>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> @@ -1098,7 +1091,9 @@ static int __get_hwpoison_page(struct page *page)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +	if (!PageHuge(head))
> +		ret = -EBUSY;

Unless I'm missing something, we will be returning -EBUSY for any non-hugetlb
page, which is not right.
I think what you want is to return -EBUSY in case the page is a hugetlb page
but we cannot grab a refcount because the page is in a floating state, meaning
!HPageFreed && !HPageMigratable.

Right?

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-17 12:59   ` Oscar Salvador
@ 2021-05-17 15:13     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-17 15:13 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Naoya Horiguchi, Muchun Song, linux-mm, Andrew Morton,
	Mike Kravetz, Michal Hocko, Tony Luck, linux-kernel

On Mon, May 17, 2021 at 02:59:03PM +0200, Oscar Salvador wrote:
> On Mon, May 17, 2021 at 01:54:01PM +0900, Naoya Horiguchi wrote:
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > @@ -1098,7 +1091,9 @@ static int __get_hwpoison_page(struct page *page)
> >  
> >  #ifdef CONFIG_HUGETLB_PAGE
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> > +	if (!PageHuge(head))
> > +		ret = -EBUSY;
> 
> Unless I'm missing something, we will be returning -EBUSY for any non-hugetlb
> page, which is not right.

(sigh, hard to say why I wrote this, really sorry)

> I think what you want is to return -EBUSY in case the page is a hugetlb page
> but we cannot grab a refcount because the page is in a floating state, meaning
> !HPageFreed && !HPageMigratable.
> 
> Right?

Totally right, so the below should what's wanted.

        spin_lock(&hugetlb_lock);
        if (PageHuge(head))
                if (!HPageFreed(head) && !HPageMigratable(head))
                        ret = -EBUSY;
                else
                        ret = get_page_unless_zero(head);
        spin_unlock(&hugetlb_lock);

- Naoya

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-17  4:54 ` [PATCH v4 1/2] mm,hwpoison: " Naoya Horiguchi
  2021-05-17 10:12   ` Oscar Salvador
@ 2021-05-17 20:11   ` Mike Kravetz
  2021-05-18  5:24     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Kravetz @ 2021-05-17 20:11 UTC (permalink / raw)
  To: Naoya Horiguchi, Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Michal Hocko, Tony Luck, Naoya Horiguchi, linux-kernel

On 5/16/21 9:54 PM, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks page refcount before taking additional
> one for memory error handling, which is wrong because there's a time
> window where compound pages have non-zero refcount during initialization.
> 
> So makes __get_hwpoison_page() check page status a bit more for a few
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.
> 
> Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Reported-by: Muchun Song <songmuchun@bytedance.com>
> Cc: stable@vger.kernel.org # 5.12+
> ---
> ChangeLog v4:
> - all hugetlb related check in hugetlb_lock,
> - fix build error with #ifdef CONFIG_HUGETLB_PAGE
> ---
>  mm/memory-failure.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
> index a3659619d293..761f982b6d7b 100644
> --- v5.12/mm/memory-failure.c
> +++ v5.12_patched/mm/memory-failure.c
> @@ -1094,6 +1094,16 @@ static int page_action(struct page_state *ps, struct page *p,
>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +		ret = get_page_unless_zero(head);
> +	spin_unlock(&hugetlb_lock);
> +	if (ret > 0)
> +		return ret;
> +#endif

I must be missing something.

The above code makes sure the page is not in one of these transitive
hugetlb states as mentioned in the commit message.  It only attempts
to take a reference on the page if it is not in one of these states.

However, if it is in such a transitive state (!HPageFreed(head) &&
!HPageMigratable(head)) we will fall through and execute the code:

	if (get_page_unless_zero(head)) {
		if (head == compound_head(page))
			return 1;

So, it seems like we will always do a get_page_unless_zero() for
PageHuge() pages?

Also, due to commit db71ef79b59b ("hugetlb: make free_huge_page irq
safe") you need to make sure interrupts are disabled when taking
hugetlb_lock.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-17 20:11   ` Mike Kravetz
@ 2021-05-18  5:24     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-18  5:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Naoya Horiguchi, Oscar Salvador, Muchun Song, linux-mm,
	Andrew Morton, Michal Hocko, Tony Luck, linux-kernel

On Mon, May 17, 2021 at 01:11:25PM -0700, Mike Kravetz wrote:
> On 5/16/21 9:54 PM, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommitting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's a time
> > window where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check page status a bit more for a few
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> > 
> > Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Reported-by: Muchun Song <songmuchun@bytedance.com>
> > Cc: stable@vger.kernel.org # 5.12+
> > ---
> > ChangeLog v4:
> > - all hugetlb related check in hugetlb_lock,
> > - fix build error with #ifdef CONFIG_HUGETLB_PAGE
> > ---
> >  mm/memory-failure.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
> > index a3659619d293..761f982b6d7b 100644
> > --- v5.12/mm/memory-failure.c
> > +++ v5.12_patched/mm/memory-failure.c
> > @@ -1094,6 +1094,16 @@ static int page_action(struct page_state *ps, struct page *p,
> >  static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> > +	int ret = 0;
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	spin_lock(&hugetlb_lock);
> > +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> > +		ret = get_page_unless_zero(head);
> > +	spin_unlock(&hugetlb_lock);
> > +	if (ret > 0)
> > +		return ret;
> > +#endif
> 
> I must be missing something.
> 
> The above code makes sure the page is not in one of these transitive
> hugetlb states as mentioned in the commit message.  It only attempts
> to take a reference on the page if it is not in one of these states.
> 
> However, if it is in such a transitive state (!HPageFreed(head) &&
> !HPageMigratable(head)) we will fall through and execute the code:
> 
> 	if (get_page_unless_zero(head)) {
> 		if (head == compound_head(page))
> 			return 1;
> 
> So, it seems like we will always do a get_page_unless_zero() for
> PageHuge() pages?

Right, no need to fall through in such a case.

> 
> Also, due to commit db71ef79b59b ("hugetlb: make free_huge_page irq
> safe") you need to make sure interrupts are disabled when taking
> hugetlb_lock.

Thanks, I'll rebase to latest mainline and comply with new semantics.

- Naoya

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-05-18  5:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-17  4:53 [PATCH v4 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
2021-05-17  4:54 ` [PATCH v4 1/2] mm,hwpoison: " Naoya Horiguchi
2021-05-17 10:12   ` Oscar Salvador
2021-05-17 11:34     ` Naoya Horiguchi
2021-05-17 20:11   ` Mike Kravetz
2021-05-18  5:24     ` HORIGUCHI NAOYA(堀口 直也)
2021-05-17  4:54 ` [PATCH v4 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
2021-05-17 12:59   ` Oscar Salvador
2021-05-17 15:13     ` HORIGUCHI NAOYA(堀口 直也)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).