linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] hwpoison: fix race with compound page allocation
@ 2021-05-11 15:10 Naoya Horiguchi
  2021-05-11 15:10 ` [PATCH v3 1/2] mm,hwpoison: " Naoya Horiguchi
  2021-05-11 15:10 ` [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
  0 siblings, 2 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-11 15:10 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

Hi everyone,

I updated the patch for the issue reported by Muchun Song in [1].  I
separated into two patches (to reduce the size of core fix for stable):
patch 1/2 focuses on preventing VM_BUG_ON_PAGE() in the reported race.
Patch 2/2 includes some code improvement like calling retry code, revising
the function comment, and some refactoring.

Any comment and testing would be appreciated.

Thanks,
Naoya Horiguchi

[1] https://lore.kernel.org/linux-mm/20210421060259.67554-1-songmuchun@bytedance.com/T/
---
Summary:

Naoya Horiguchi (2):
      mm,hwpoison: fix race with compound page allocation
      mm,hwpoison: make get_hwpoison_page call get_any_page()

 mm/memory-failure.c | 169 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 96 insertions(+), 73 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-11 15:10 [PATCH v3 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
@ 2021-05-11 15:10 ` Naoya Horiguchi
  2021-05-12  8:33   ` Oscar Salvador
  2021-05-12 12:19   ` Michal Hocko
  2021-05-11 15:10 ` [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
  1 sibling, 2 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-11 15:10 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When hugetlb page fault (under overcommiting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:

    CPU0:                           CPU1:

                                    gather_surplus_pages()
                                      page = alloc_surplus_huge_page()
    memory_failure_hugetlb()
      get_hwpoison_page(page)
        __get_hwpoison_page(page)
          get_page_unless_zero(page)
                                      zero = put_page_testzero(page)
                                      VM_BUG_ON_PAGE(!zero, page)
                                      enqueue_huge_page(h, page)
      put_page(page)

__get_hwpoison_page() only checks page refcount before taking additional
one for memory error handling, which is wrong because there's time
windows where compound pages have non-zero refcount during initialization.

So makes __get_hwpoison_page() check page status a bit more for a few
types of compound pages. PageSlab() check is added because otherwise
"non anonymous thp" path is wrongly chosen.

Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Muchun Song <songmuchun@bytedance.com>
Cc: stable@vger.kernel.org # 5.12+
---
changelog v3:
- recheck PageHuge after holding hugetlb_lock,
---
 mm/memory-failure.c | 55 ++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index a3659619d293..02668b24e512 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
-	if (!PageHuge(head) && PageTransHuge(head)) {
-		/*
-		 * Non anonymous thp exists only in allocation/free time. We
-		 * can't handle such a case correctly, so let's give it up.
-		 * This should be better than triggering BUG_ON when kernel
-		 * tries to touch the "partially handled" page.
-		 */
-		if (!PageAnon(head)) {
-			pr_err("Memory failure: %#lx: non anonymous thp\n",
-				page_to_pfn(page));
-			return 0;
+	if (PageCompound(page)) {
+		if (PageSlab(page)) {
+			return get_page_unless_zero(page);
+		} else if (PageHuge(head)) {
+			int ret = 0;
+
+			spin_lock(&hugetlb_lock);
+			if (!PageHuge(head))
+				ret = -EBUSY;
+			else if (HPageFreed(head) || HPageMigratable(head))
+				ret = get_page_unless_zero(head);
+			spin_unlock(&hugetlb_lock);
+			return ret;
+		} else if (PageTransHuge(head)) {
+			/*
+			 * Non anonymous thp exists only in allocation/free time. We
+			 * can't handle such a case correctly, so let's give it up.
+			 * This should be better than triggering BUG_ON when kernel
+			 * tries to touch the "partially handled" page.
+			 */
+			if (!PageAnon(head)) {
+				pr_err("Memory failure: %#lx: non anonymous thp\n",
+				       page_to_pfn(page));
+				return 0;
+			}
+			if (get_page_unless_zero(head)) {
+				if (head == compound_head(page))
+					return 1;
+				pr_info("Memory failure: %#lx cannot catch tail\n",
+					page_to_pfn(page));
+				put_page(head);
+			}
 		}
+		return 0;
 	}
 
-	if (get_page_unless_zero(head)) {
-		if (head == compound_head(page))
-			return 1;
-
-		pr_info("Memory failure: %#lx cannot catch tail\n",
-			page_to_pfn(page));
-		put_page(head);
-	}
-
-	return 0;
+	return get_page_unless_zero(page);
 }
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-11 15:10 [PATCH v3 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
  2021-05-11 15:10 ` [PATCH v3 1/2] mm,hwpoison: " Naoya Horiguchi
@ 2021-05-11 15:10 ` Naoya Horiguchi
  2021-05-12  8:55   ` Oscar Salvador
  1 sibling, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2021-05-11 15:10 UTC (permalink / raw)
  To: Oscar Salvador, Muchun Song, linux-mm
  Cc: Andrew Morton, Mike Kravetz, Michal Hocko, Tony Luck,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Now __get_hwpoison_page() could return -EBUSY in a race condition,
so it's helpful to handle it by retrying.  We already have retry
logic, so make get_hwpoison_page() call get_any_page() when called
from memory_failure().

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 114 ++++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 52 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index 02668b24e512..d6f470e9ee05 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1084,13 +1084,6 @@ static int page_action(struct page_state *ps, struct page *p,
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
-/**
- * __get_hwpoison_page() - Get refcount for memory error handling:
- * @page:	raw error page (hit by memory error)
- *
- * Return: return 0 if failed to grab the refcount, otherwise true (some
- * non-zero value.)
- */
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
@@ -1134,15 +1127,6 @@ static int __get_hwpoison_page(struct page *page)
 	return get_page_unless_zero(page);
 }
 
-/*
- * Safely get reference count of an arbitrary page.
- *
- * Returns 0 for a free page, 1 for an in-use page,
- * -EIO for a page-type we cannot handle and -EBUSY if we raced with an
- * allocation.
- * We only incremented refcount in case the page was already in-use and it
- * is a known type we can handle.
- */
 static int get_any_page(struct page *p, unsigned long flags)
 {
 	int ret = 0, pass = 0;
@@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
 		count_increased = true;
 
 try_again:
-	if (!count_increased && !__get_hwpoison_page(p)) {
-		if (page_count(p)) {
-			/* We raced with an allocation, retry. */
-			if (pass++ < 3)
-				goto try_again;
-			ret = -EBUSY;
-		} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
-			/* We raced with put_page, retry. */
-			if (pass++ < 3)
-				goto try_again;
-			ret = -EIO;
+	if (!count_increased) {
+		ret = __get_hwpoison_page(p);
+		if (!ret) {
+			if (page_count(p)) {
+				/* We raced with an allocation, retry. */
+				if (pass++ < 3)
+					goto try_again;
+				ret = -EBUSY;
+			} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
+				/* We raced with put_page, retry. */
+				if (pass++ < 3)
+					goto try_again;
+				ret = -EIO;
+			}
+			goto out;
 		}
+	}
+
+	if (ret == -EBUSY) {
+		/* We raced with freeing huge page to buddy, retry. */
+		if (pass++ < 3)
+			goto try_again;
+	} else if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+		ret = 1;
 	} else {
-		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
-			ret = 1;
-		} else {
-			/*
-			 * A page we cannot handle. Check whether we can turn
-			 * it into something we can handle.
-			 */
-			if (pass++ < 3) {
-				put_page(p);
-				shake_page(p, 1);
-				count_increased = false;
-				goto try_again;
-			}
+		/*
+		 * A page we cannot handle. Check whether we can turn
+		 * it into something we can handle.
+		 */
+		if (pass++ < 3) {
 			put_page(p);
-			ret = -EIO;
+			shake_page(p, 1);
+			count_increased = false;
+			goto try_again;
 		}
+		put_page(p);
+		ret = -EIO;
 	}
-
+out:
 	return ret;
 }
 
-static int get_hwpoison_page(struct page *p, unsigned long flags,
-			     enum mf_flags ctxt)
+/**
+ * get_hwpoison_page() - Get refcount for memory error handling
+ * @p:		Raw error page (hit by memory error)
+ * @flags:	Flags controlling behavior of error handling
+ *
+ * get_hwpoison_page() takes a page refcount of an error page to handle memory
+ * error on it, after checking that the error page is in a well-defined state
+ * (defined as a page-type we can successfully handle the memor error on it,
+ * such as LRU page and hugetlb page).
+ *
+ * Memory error handling could be triggered at any time on any type of page,
+ * so it's prone to race with typical memory management lifecycle (like
+ * allocation and free).  So to avoid such races, get_hwpoison_page() takes
+ * extra care for the error page's state (as done in __get_hwpoison_page()),
+ * and has some retry logic in get_any_page().
+ *
+ * Return: 0 for buddy free pages,
+ *         1 on success for in-use pages in a well-defined state,
+ *         -EIO for pages on which we can not handle memory errors,
+ *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
+ *         operations like allocation and free.
+ */
+static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
 	int ret;
 
 	zone_pcp_disable(page_zone(p));
-	if (ctxt == MF_SOFT_OFFLINE)
-		ret = get_any_page(p, flags);
-	else
-		ret = __get_hwpoison_page(p);
+	ret = get_any_page(p, flags);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -1384,7 +1394,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 
 	num_poisoned_pages_inc();
 
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
 		/*
 		 * Check "filter hit" and "race with other subpage."
 		 */
@@ -1608,7 +1618,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -1900,7 +1910,7 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
-	if (!get_hwpoison_page(p, flags, 0)) {
+	if (!get_hwpoison_page(p, flags)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
@@ -2116,7 +2126,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 retry:
 	get_online_mems();
-	ret = get_hwpoison_page(page, flags, MF_SOFT_OFFLINE);
+	ret = get_hwpoison_page(page, flags);
 	put_online_mems();
 
 	if (ret > 0) {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-11 15:10 ` [PATCH v3 1/2] mm,hwpoison: " Naoya Horiguchi
@ 2021-05-12  8:33   ` Oscar Salvador
  2021-05-13  0:00     ` HORIGUCHI NAOYA(堀口 直也)
  2021-05-12 12:19   ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2021-05-12  8:33 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Muchun Song, linux-mm, Andrew Morton, Mike Kravetz, Michal Hocko,
	Tony Luck, Naoya Horiguchi, linux-kernel

On Wed, May 12, 2021 at 12:10:15AM +0900, Naoya Horiguchi wrote:
> @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
>  
> -	if (!PageHuge(head) && PageTransHuge(head)) {
> -		/*
> -		 * Non anonymous thp exists only in allocation/free time. We
> -		 * can't handle such a case correctly, so let's give it up.
> -		 * This should be better than triggering BUG_ON when kernel
> -		 * tries to touch the "partially handled" page.
> -		 */
> -		if (!PageAnon(head)) {
> -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> -				page_to_pfn(page));
> -			return 0;
> +	if (PageCompound(page)) {
> +		if (PageSlab(page)) {
> +			return get_page_unless_zero(page);
> +		} else if (PageHuge(head)) {
> +			int ret = 0;
> +
> +			spin_lock(&hugetlb_lock);
> +			if (!PageHuge(head))
> +				ret = -EBUSY;
> +			else if (HPageFreed(head) || HPageMigratable(head))
> +				ret = get_page_unless_zero(head);
> +			spin_unlock(&hugetlb_lock);
> +			return ret;

Uhm, I am having a hard time with that -EBUSY.
At this stage, we expect __get_hwpoison_page() to either return true or false,
depending on whether it could grab a page's refcount or not. Returning -EBUSY
here seems wrong (plus it is inconsistent with the comment above the function).
It might be useful for the latter patch, I do not know as I yet have to check
that one, but if anything, let us stay consistent here in this one.
So, if hugetlb vanished under us, let us return "we could not grab the
refcount". Does it make sense?

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-11 15:10 ` [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
@ 2021-05-12  8:55   ` Oscar Salvador
  2021-05-13  0:03     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2021-05-12  8:55 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Muchun Song, linux-mm, Andrew Morton, Mike Kravetz, Michal Hocko,
	Tony Luck, Naoya Horiguchi, linux-kernel

On Wed, May 12, 2021 at 12:10:16AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Now __get_hwpoison_page() could return -EBUSY in a race condition,
> so it's helpful to handle it by retrying.  We already have retry
> logic, so make get_hwpoison_page() call get_any_page() when called
> from memory_failure().

As I stated in your previous patch, I think you should start returning -EBUSY
from this patch onwards.

>  static int get_any_page(struct page *p, unsigned long flags)
>  {
>  	int ret = 0, pass = 0;
> @@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
>  		count_increased = true;
>  
>  try_again:
> -	if (!count_increased && !__get_hwpoison_page(p)) {
> -		if (page_count(p)) {
> -			/* We raced with an allocation, retry. */
> -			if (pass++ < 3)
> -				goto try_again;
> -			ret = -EBUSY;
> -		} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> -			/* We raced with put_page, retry. */
> -			if (pass++ < 3)
> -				goto try_again;
> -			ret = -EIO;
> +	if (!count_increased) {
> +		ret = __get_hwpoison_page(p);
> +		if (!ret) {
> +			if (page_count(p)) {
> +				/* We raced with an allocation, retry. */
> +				if (pass++ < 3)
> +					goto try_again;
> +				ret = -EBUSY;
> +			} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> +				/* We raced with put_page, retry. */
> +				if (pass++ < 3)
> +					goto try_again;
> +				ret = -EIO;
> +			}
> +			goto out;
>  		}
> +	}

I think this can be improved.

We cannot have -EBUSY unless we come from __get_hwpoison_page() (!count_increased),
so I think a much more natural approach would be to stuff the hunk below in there,
and then place the other stuff in an else, so something like:

        if (!count_increased) {
                ret = __get_hwpoison_page(p);
                if (!ret) {
                        if (page_count(p)) {
                                /* We raced with an allocation, retry. */
                                if (pass++ < 3)
                                        goto try_again;
                                ret = -EBUSY;
                        } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
                                /* We raced with put_page, retry. */
                                if (pass++ < 3)
                                        goto try_again;
                                ret = -EIO;
                        }
                        goto out;
                } else if (ret == -EBUSY) {
			/* We raced with freeing huge page to buddy, retry. */
			if (pass++ < 3)
				goto try_again;
		}
        } else {
		/* We do already have a refcount for this page, see if we can
		 * handle it.
		 */
		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
			ret = 1;
		} else {
			/* A page we cannot handle. Check...
		}
	}

Other than that, looks good to me.

> +
> +	if (ret == -EBUSY) {
> +		/* We raced with freeing huge page to buddy, retry. */
> +		if (pass++ < 3)
> +			goto try_again;

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-11 15:10 ` [PATCH v3 1/2] mm,hwpoison: " Naoya Horiguchi
  2021-05-12  8:33   ` Oscar Salvador
@ 2021-05-12 12:19   ` Michal Hocko
  2021-05-12 23:51     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-05-12 12:19 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Oscar Salvador, Muchun Song, linux-mm, Andrew Morton,
	Mike Kravetz, Tony Luck, Naoya Horiguchi, linux-kernel

On Wed 12-05-21 00:10:15, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When hugetlb page fault (under overcommiting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> 
>     CPU0:                           CPU1:
> 
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                       enqueue_huge_page(h, page)
>       put_page(page)
> 
> __get_hwpoison_page() only checks page refcount before taking additional
> one for memory error handling, which is wrong because there's time
> windows where compound pages have non-zero refcount during initialization.
> 
> So makes __get_hwpoison_page() check page status a bit more for a few
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.

This should really describe the fix in more details. E.g.
[...]
> @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
>  
> -	if (!PageHuge(head) && PageTransHuge(head)) {
> -		/*
> -		 * Non anonymous thp exists only in allocation/free time. We
> -		 * can't handle such a case correctly, so let's give it up.
> -		 * This should be better than triggering BUG_ON when kernel
> -		 * tries to touch the "partially handled" page.
> -		 */
> -		if (!PageAnon(head)) {
> -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> -				page_to_pfn(page));
> -			return 0;
> +	if (PageCompound(page)) {

So you do rely on PageCompound to be true. Which is prone to races as
well. All you need is to check before prep_compound_page and run into
get_page_unless_zero (down in this function) before hugetlb reaches
put_page_testzero. Or do I miss something that would prevent from that?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-12 12:19   ` Michal Hocko
@ 2021-05-12 23:51     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-12 23:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Oscar Salvador, Muchun Song, linux-mm,
	Andrew Morton, Mike Kravetz, Tony Luck, linux-kernel

On Wed, May 12, 2021 at 02:19:32PM +0200, Michal Hocko wrote:
> On Wed 12-05-21 00:10:15, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When hugetlb page fault (under overcommiting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> > 
> >     CPU0:                           CPU1:
> > 
> >                                     gather_surplus_pages()
> >                                       page = alloc_surplus_huge_page()
> >     memory_failure_hugetlb()
> >       get_hwpoison_page(page)
> >         __get_hwpoison_page(page)
> >           get_page_unless_zero(page)
> >                                       zero = put_page_testzero(page)
> >                                       VM_BUG_ON_PAGE(!zero, page)
> >                                       enqueue_huge_page(h, page)
> >       put_page(page)
> > 
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's time
> > windows where compound pages have non-zero refcount during initialization.
> > 
> > So makes __get_hwpoison_page() check page status a bit more for a few
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
> 
> This should really describe the fix in more details. E.g.
> [...]
> > @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> >  
> > -	if (!PageHuge(head) && PageTransHuge(head)) {
> > -		/*
> > -		 * Non anonymous thp exists only in allocation/free time. We
> > -		 * can't handle such a case correctly, so let's give it up.
> > -		 * This should be better than triggering BUG_ON when kernel
> > -		 * tries to touch the "partially handled" page.
> > -		 */
> > -		if (!PageAnon(head)) {
> > -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> > -				page_to_pfn(page));
> > -			return 0;
> > +	if (PageCompound(page)) {
> 
> So you do rely on PageCompound to be true. Which is prone to races as
> well. All you need is to check before prep_compound_page and run into
> get_page_unless_zero (down in this function) before hugetlb reaches
> put_page_testzero. Or do I miss something that would prevent from that?

No, you're right, the race can still happen (I only considered the case when
a compound page is already allocated, that was insufficient...).  Any check
outside locking seems harmful, so does the checking like below make more
sense?

    static int __get_hwpoison_page(struct page *page)
    {
            struct page *head = compound_head(page);
            int ret = 0;

    #ifdef CONFIG_HUGETLB_PAGE   // this #ifdef is needed for hugetlb_lock
            spin_lock(&hugetlb_lock);
            if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
                    ret = get_page_unless_zero(head);
            spin_unlock(&hugetlb_lock);
            if (ret > 0)
                    return ret;
    #endif
    
            // other types of compound page.
            ...

            return get_page_unless_zero(page);
    }

- Naoya

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation
  2021-05-12  8:33   ` Oscar Salvador
@ 2021-05-13  0:00     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-13  0:00 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Naoya Horiguchi, Muchun Song, linux-mm, Andrew Morton,
	Mike Kravetz, Michal Hocko, Tony Luck, linux-kernel

On Wed, May 12, 2021 at 10:33:24AM +0200, Oscar Salvador wrote:
> On Wed, May 12, 2021 at 12:10:15AM +0900, Naoya Horiguchi wrote:
> > @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
> >  {
> >  	struct page *head = compound_head(page);
> >  
> > -	if (!PageHuge(head) && PageTransHuge(head)) {
> > -		/*
> > -		 * Non anonymous thp exists only in allocation/free time. We
> > -		 * can't handle such a case correctly, so let's give it up.
> > -		 * This should be better than triggering BUG_ON when kernel
> > -		 * tries to touch the "partially handled" page.
> > -		 */
> > -		if (!PageAnon(head)) {
> > -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> > -				page_to_pfn(page));
> > -			return 0;
> > +	if (PageCompound(page)) {
> > +		if (PageSlab(page)) {
> > +			return get_page_unless_zero(page);
> > +		} else if (PageHuge(head)) {
> > +			int ret = 0;
> > +
> > +			spin_lock(&hugetlb_lock);
> > +			if (!PageHuge(head))
> > +				ret = -EBUSY;
> > +			else if (HPageFreed(head) || HPageMigratable(head))
> > +				ret = get_page_unless_zero(head);
> > +			spin_unlock(&hugetlb_lock);
> > +			return ret;
> 
> Uhm, I am having a hard time with that -EBUSY.
> At this stage, we expect __get_hwpoison_page() to either return true or false,
> depending on whether it could grab a page's refcount or not. Returning -EBUSY
> here seems wrong (plus it is inconsistent with the comment above the function).
> It might be useful for the latter patch, I do not know as I yet have to check
> that one, but if anything, let us stay consistent here in this one.
> So, if hugetlb vanished under us, let us return "we could not grab the
> refcount". Does it make sense?

Yes, you are totally right.  I failed to properly split the patch.
-EBUSY is non-zero, so it's considererd as "successfully pinned", which is
not true.  I should've set ret to 0.

- Naoya

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()
  2021-05-12  8:55   ` Oscar Salvador
@ 2021-05-13  0:03     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 9+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-13  0:03 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Naoya Horiguchi, Muchun Song, linux-mm, Andrew Morton,
	Mike Kravetz, Michal Hocko, Tony Luck, linux-kernel

On Wed, May 12, 2021 at 10:55:22AM +0200, Oscar Salvador wrote:
> On Wed, May 12, 2021 at 12:10:16AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > Now __get_hwpoison_page() could return -EBUSY in a race condition,
> > so it's helpful to handle it by retrying.  We already have retry
> > logic, so make get_hwpoison_page() call get_any_page() when called
> > from memory_failure().
> 
> As I stated in your previous patch, I think you should start returning -EBUSY
> from this patch onwards.
> 
> >  static int get_any_page(struct page *p, unsigned long flags)
> >  {
> >  	int ret = 0, pass = 0;
> > @@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
> >  		count_increased = true;
> >  
> >  try_again:
> > -	if (!count_increased && !__get_hwpoison_page(p)) {
> > -		if (page_count(p)) {
> > -			/* We raced with an allocation, retry. */
> > -			if (pass++ < 3)
> > -				goto try_again;
> > -			ret = -EBUSY;
> > -		} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> > -			/* We raced with put_page, retry. */
> > -			if (pass++ < 3)
> > -				goto try_again;
> > -			ret = -EIO;
> > +	if (!count_increased) {
> > +		ret = __get_hwpoison_page(p);
> > +		if (!ret) {
> > +			if (page_count(p)) {
> > +				/* We raced with an allocation, retry. */
> > +				if (pass++ < 3)
> > +					goto try_again;
> > +				ret = -EBUSY;
> > +			} else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> > +				/* We raced with put_page, retry. */
> > +				if (pass++ < 3)
> > +					goto try_again;
> > +				ret = -EIO;
> > +			}
> > +			goto out;
> >  		}
> > +	}
> 
> I think this can be improved.
> 
> We cannot have -EBUSY unless we come from __get_hwpoison_page() (!count_increased),
> so I think a much more natural approach would be to stuff the hunk below in there,
> and then place the other stuff in an else, so something like:
> 
>         if (!count_increased) {
>                 ret = __get_hwpoison_page(p);
>                 if (!ret) {
>                         if (page_count(p)) {
>                                 /* We raced with an allocation, retry. */
>                                 if (pass++ < 3)
>                                         goto try_again;
>                                 ret = -EBUSY;
>                         } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
>                                 /* We raced with put_page, retry. */
>                                 if (pass++ < 3)
>                                         goto try_again;
>                                 ret = -EIO;
>                         }
>                         goto out;
>                 } else if (ret == -EBUSY) {
> 			/* We raced with freeing huge page to buddy, retry. */
> 			if (pass++ < 3)
> 				goto try_again;
> 		}

Moving "if (ret == -EBUSY)" block to here makes sense to me. Thank you.

>         } else {

In the original logic, if __get_hwpoison_page() returns 1, we fall into the
"if (PageHuge(p) || PageLRU(p) || __PageMovable(p)" check.  I guess that this
"else" seems not necessary?

> 		/* We do already have a refcount for this page, see if we can
> 		 * handle it.
> 		 */
> 		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
> 			ret = 1;
> 		} else {
> 			/* A page we cannot handle. Check...
> 		}
> 	}

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-05-13  0:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11 15:10 [PATCH v3 0/2] hwpoison: fix race with compound page allocation Naoya Horiguchi
2021-05-11 15:10 ` [PATCH v3 1/2] mm,hwpoison: " Naoya Horiguchi
2021-05-12  8:33   ` Oscar Salvador
2021-05-13  0:00     ` HORIGUCHI NAOYA(堀口 直也)
2021-05-12 12:19   ` Michal Hocko
2021-05-12 23:51     ` HORIGUCHI NAOYA(堀口 直也)
2021-05-11 15:10 ` [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page() Naoya Horiguchi
2021-05-12  8:55   ` Oscar Salvador
2021-05-13  0:03     ` HORIGUCHI NAOYA(堀口 直也)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).