stable.vger.kernel.org archive mirror
* [PATCH] hugetlb: do not demote poisoned hugetlb pages
@ 2022-03-07 21:57 Mike Kravetz
  2022-03-08 13:43 ` Miaohe Lin
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Kravetz @ 2022-03-07 21:57 UTC
  To: linux-mm, linux-kernel
  Cc: HORIGUCHI NAOYA, Miaohe Lin, Oscar Salvador, Michal Hocko,
	Andrew Morton, Mike Kravetz, stable

It is possible for poisoned hugetlb pages to reside on the free lists.
The huge page allocation routines which dequeue entries from the free
lists make a point of avoiding poisoned pages.  There is no such check
and avoidance in the demote code path.

If a hugetlb page is on a free list, poison will only be set in
the head page rather than the page with the actual error.  If such a
page is demoted, then the poison flag may follow the wrong page.  A page
without error could have poison set, and a page with poison might not
have the flag set.

Check for poison before attempting to demote a hugetlb page.  Also,
return -EBUSY to the caller if only poisoned pages are on the free list.

Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
 mm/hugetlb.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b34f50156f7e..f8ca7cca3c1a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3475,7 +3475,6 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
 {
 	int nr_nodes, node;
 	struct page *page;
-	int rc = 0;
 
 	lockdep_assert_held(&hugetlb_lock);
 
@@ -3486,15 +3485,19 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
 	}
 
 	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
-		if (!list_empty(&h->hugepage_freelists[node])) {
-			page = list_entry(h->hugepage_freelists[node].next,
-					struct page, lru);
-			rc = demote_free_huge_page(h, page);
-			break;
+		list_for_each_entry(page, &h->hugepage_freelists[node], lru) {
+			if (PageHWPoison(page))
+				continue;
+
+			return demote_free_huge_page(h, page);
 		}
 	}
 
-	return rc;
+	/*
+	 * Only way to get here is if all pages on free lists are poisoned.
+	 * Return -EBUSY so that caller will not retry.
+	 */
+	return -EBUSY;
 }
 
 #define HSTATE_ATTR_RO(_name) \
-- 
2.35.1
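
The selection logic in the patch can be modeled in userspace. What follows
is only a minimal sketch under stated assumptions: struct toy_page,
toy_demote() and toy_demote_pool() are hypothetical stand-ins for the
kernel's struct page, demote_free_huge_page() and demote_pool_huge_page(),
and a singly linked list replaces the lru list. It shows the two outcomes
the commit message describes: the first non-poisoned free page is demoted,
and -EBUSY is returned when nothing eligible is found.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2	/* toy node count, not a kernel constant */

/* Toy stand-in for a free huge page; not the kernel's struct page. */
struct toy_page {
	bool hwpoison;		/* models PageHWPoison() on the head page */
	bool demoted;		/* set once the toy demote routine has run */
	struct toy_page *next;	/* models the lru linkage on a free list */
};

/* Hypothetical stand-in for demote_free_huge_page(); always succeeds. */
static int toy_demote(struct toy_page *page)
{
	page->demoted = true;
	return 0;
}

/*
 * Walk each node's free list and demote the first page that is not
 * poisoned.  Reaching the end means no eligible page was found, so
 * report -EBUSY instead of success.
 */
static int toy_demote_pool(struct toy_page *freelists[MAX_NODES])
{
	assert(freelists != NULL);

	for (int node = 0; node < MAX_NODES; node++) {
		for (struct toy_page *p = freelists[node]; p; p = p->next) {
			if (p->hwpoison)
				continue;	/* skip, as the patch does */

			return toy_demote(p);
		}
	}

	return -EBUSY;
}
```

With a list holding a poisoned page followed by a clean one, the clean page
is the one demoted. With only poisoned pages, the caller sees -EBUSY.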



* Re: [PATCH] hugetlb: do not demote poisoned hugetlb pages
  2022-03-07 21:57 [PATCH] hugetlb: do not demote poisoned hugetlb pages Mike Kravetz
@ 2022-03-08 13:43 ` Miaohe Lin
  2022-03-16 22:31   ` Mike Kravetz
  0 siblings, 1 reply; 5+ messages in thread
From: Miaohe Lin @ 2022-03-08 13:43 UTC
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: HORIGUCHI NAOYA, Oscar Salvador, Michal Hocko, Andrew Morton, stable

On 2022/3/8 5:57, Mike Kravetz wrote:
> It is possible for poisoned hugetlb pages to reside on the free lists.
> The huge page allocation routines which dequeue entries from the free
> lists make a point of avoiding poisoned pages.  There is no such check
> and avoidance in the demote code path.
> 
> If a hugetlb page is on a free list, poison will only be set in
> the head page rather than the page with the actual error.  If such a
> page is demoted, then the poison flag may follow the wrong page.  A page
> without error could have poison set, and a page with poison might not
> have the flag set.
> 
> Check for poison before attempting to demote a hugetlb page.  Also,
> return -EBUSY to the caller if only poisoned pages are on the free list.
> 
> Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/hugetlb.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b34f50156f7e..f8ca7cca3c1a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3475,7 +3475,6 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>  {
>  	int nr_nodes, node;
>  	struct page *page;
> -	int rc = 0;
>  
>  	lockdep_assert_held(&hugetlb_lock);
>  
> @@ -3486,15 +3485,19 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>  	}
>  
>  	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
> -		if (!list_empty(&h->hugepage_freelists[node])) {
> -			page = list_entry(h->hugepage_freelists[node].next,
> -					struct page, lru);
> -			rc = demote_free_huge_page(h, page);
> -			break;
> +		list_for_each_entry(page, &h->hugepage_freelists[node], lru) {
> +			if (PageHWPoison(page))
> +				continue;
> +
> +			return demote_free_huge_page(h, page);

It seems this patch is not ideal. Memory failure can hit the hugetlb page anytime without
holding the hugetlb_lock. So the page might become HWPoison just after the check. But this
patch should have handled the common case. Many thanks for your work. :)

>  		}
>  	}
>  
> -	return rc;
> +	/*
> +	 * Only way to get here is if all pages on free lists are poisoned.
> +	 * Return -EBUSY so that caller will not retry.
> +	 */
> +	return -EBUSY;
>  }
>  
>  #define HSTATE_ATTR_RO(_name) \
> 
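
To make the concern concrete, here is a toy model of what the commit
message describes, and of what could still happen if poison were set in
the window after the check. It is a sketch only: struct toy_huge, struct
toy_small and toy_split() are hypothetical names, the sizes are arbitrary,
and it assumes the flag simply stays with the old head page when the huge
page is split. It is not the kernel's demote implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGES 8	/* base pages per huge page in this toy model */
#define PIECES   2	/* smaller pages produced by one demotion */

/* A free huge page: poison is recorded on the head page only. */
struct toy_huge {
	bool head_poison;	/* flag as it sits on the head page */
	size_t bad_subpage;	/* index of the base page with the real error */
};

/* One of the smaller pages that result from the split. */
struct toy_small {
	bool poison_flag;	/* flag this piece ends up carrying */
	bool contains_error;	/* whether the bad base page lies in it */
};

/*
 * Split the huge page into PIECES smaller pages.  Assumption: the old
 * head page becomes the head of piece 0, so the flag stays there no
 * matter which base page actually failed.
 */
static void toy_split(const struct toy_huge *h, struct toy_small out[PIECES])
{
	size_t per_piece = SUBPAGES / PIECES;

	assert(h->bad_subpage < SUBPAGES);

	for (size_t i = 0; i < PIECES; i++) {
		out[i].poison_flag = (i == 0) && h->head_poison;
		out[i].contains_error = h->head_poison &&
					(h->bad_subpage / per_piece == i);
	}
}
```

If the failing base page is in the second half, piece 0 carries the flag
without containing the error, and piece 1 contains the error without the
flag. That is the mismatch the poison check is meant to avoid.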



* Re: [PATCH] hugetlb: do not demote poisoned hugetlb pages
  2022-03-08 13:43 ` Miaohe Lin
@ 2022-03-16 22:31   ` Mike Kravetz
  2022-03-17  1:57     ` Miaohe Lin
  2022-03-18 11:31     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 2 replies; 5+ messages in thread
From: Mike Kravetz @ 2022-03-16 22:31 UTC
  To: Miaohe Lin, linux-mm, linux-kernel
  Cc: HORIGUCHI NAOYA, Oscar Salvador, Michal Hocko, Andrew Morton, stable

On 3/8/22 05:43, Miaohe Lin wrote:
> It seems this patch is not ideal. Memory failure can hit the hugetlb page anytime without
> holding the hugetlb_lock. So the page might become HWPoison just after the check. But this
> patch should have handled the common case. Many thanks for your work. :)
> 

Correct, this patch handles the common case of not demoting a hugetlb
page if HWPoison is set.  This is similar to the check in the dequeue path
used when allocating a huge page.

As you point out, work still needs to be done to better coordinate
memory failure with demote as well as huge page freeing.  As you know
Naoya is working on this now.  It is unclear if that work will be limited
to memory error handling code, or if greater coordination with hugetlb
code will be required.

Unless you have objections, I believe this patch should move forward and
be backported to stable trees.  If we determine that more coordination
between memory error and hugetlb code is needed, that can be added later. 
-- 
Mike Kravetz


* Re: [PATCH] hugetlb: do not demote poisoned hugetlb pages
  2022-03-16 22:31   ` Mike Kravetz
@ 2022-03-17  1:57     ` Miaohe Lin
  2022-03-18 11:31     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 5+ messages in thread
From: Miaohe Lin @ 2022-03-17  1:57 UTC
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: HORIGUCHI NAOYA, Oscar Salvador, Michal Hocko, Andrew Morton, stable

On 2022/3/17 6:31, Mike Kravetz wrote:
> On 3/8/22 05:43, Miaohe Lin wrote:
>> It seems this patch is not ideal. Memory failure can hit the hugetlb page anytime without
>> holding the hugetlb_lock. So the page might become HWPoison just after the check. But this
>> patch should have handled the common case. Many thanks for your work. :)
>>
> 
> Correct, this patch handles the common case of not demoting a hugetlb
> page if HWPoison is set.  This is similar to the check in the dequeue path
> used when allocating a huge page.
> 
> As you point out, work still needs to be done to better coordinate
> memory failure with demote as well as huge page freeing.  As you know
> Naoya is working on this now.  It is unclear if that work will be limited
> to memory error handling code, or if greater coordination with hugetlb
> code will be required.
> 
> Unless you have objections, I believe this patch should move forward and
> be backported to stable trees.  If we determine that more coordination
> between memory error and hugetlb code is needed, that can be added later. 

I think this patch is good enough to move forward and be backported to stable trees.
Many thanks. :)

> 


* Re: [PATCH] hugetlb: do not demote poisoned hugetlb pages
  2022-03-16 22:31   ` Mike Kravetz
  2022-03-17  1:57     ` Miaohe Lin
@ 2022-03-18 11:31     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 5+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-18 11:31 UTC
  To: Mike Kravetz
  Cc: Miaohe Lin, linux-mm, linux-kernel, Oscar Salvador, Michal Hocko,
	Andrew Morton, stable

On Wed, Mar 16, 2022 at 03:31:57PM -0700, Mike Kravetz wrote:
> On 3/8/22 05:43, Miaohe Lin wrote:
> > It seems this patch is not ideal. Memory failure can hit the hugetlb page anytime without
> > holding the hugetlb_lock. So the page might become HWPoison just after the check. But this
> > patch should have handled the common case. Many thanks for your work. :)
> > 
> 
> Correct, this patch handles the common case of not demoting a hugetlb
> page if HWPoison is set.  This is similar to the check in the dequeue path
> used when allocating a huge page.
> 
> As you point out, work still needs to be done to better coordinate
> memory failure with demote as well as huge page freeing.  As you know
> Naoya is working on this now.  It is unclear if that work will be limited
> to memory error handling code, or if greater coordination with hugetlb
> code will be required.

I submitted the v5 patch today, and it mostly changes memory-failure.c.
The only change to hugetlb.c is in get_hwpoison_huge_page(), where the
compound_head() check is done under hugetlb_lock, so it never touches
the core logic of hugetlb allocation/free/demotion.
So the suggested change should cooperate well enough with my patch.

> 
> Unless you have objections, I believe this patch should move forward and
> be backported to stable trees.  If we determine that more coordination
> between memory error and hugetlb code is needed, that can be added later. 

Sending to stable looks fine to me.

Thank you for the patch and for helping on my thread.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

