* [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Naoya Horiguchi @ 2022-05-11 15:19 UTC
  To: linux-mm
  Cc: Andrew Morton, zhenwei pi, Miaohe Lin, Mike Kravetz,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

If memory_failure() fails to grab a refcount on a hugetlb page because the
page is busy, it currently returns without setting PG_hwpoison on it.
This not only loses a chance of error containment, but also breaks the rule
that action_result() should be called only when memory_failure() has done
some handling work (even if that is just setting PG_hwpoison).
This inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/mm.h  | 1 +
 mm/memory-failure.c | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d446e834a3e5..04de0c3e4f9f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3187,6 +3187,7 @@ enum mf_flags {
 	MF_MUST_KILL = 1 << 2,
 	MF_SOFT_OFFLINE = 1 << 3,
 	MF_UNPOISON = 1 << 4,
+	MF_NO_RETRY = 1 << 5,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6a28d020a4da..e3269b991016 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 			count_increased = true;
 	} else {
 		ret = -EBUSY;
-		goto out;
+		if (!(flags & MF_NO_RETRY))
+			goto out;
 	}
 
 	if (TestSetPageHWPoison(head)) {
@@ -1556,7 +1557,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	struct page *p = pfn_to_page(pfn);
 	struct page *head;
 	unsigned long page_flags;
-	bool retry = true;
 
 	*hugetlb = 1;
 retry:
@@ -1572,8 +1572,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		}
 		return res;
 	} else if (res == -EBUSY) {
-		if (retry) {
-			retry = false;
+		if (!(flags & MF_NO_RETRY)) {
+			flags |= MF_NO_RETRY;
 			goto retry;
 		}
 		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
-- 
2.25.1
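
For reference, here is how __get_huge_page_for_hwpoison() reads with the
memory-failure.c hunk above applied: the busy page no longer bails out early
on the caller's second pass, because try_memory_failure_hugetlb() retries once
with MF_NO_RETRY set and the busy branch then falls through to
TestSetPageHWPoison().  This is only a sketch -- the lines outside the hunk
are reconstructed from the context of 405ce051236c and may differ slightly
from the actual tree:

int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
{
	struct page *page = pfn_to_page(pfn);
	struct page *head = compound_head(page);
	int ret = 0;
	bool count_increased = false;

	if (!PageHeadHuge(head))
		goto out;

	if (flags & MF_COUNT_INCREASED) {
		ret = 1;
		count_increased = true;
	} else if (HPageFreed(head)) {
		ret = 0;
	} else if (HPageMigratable(head)) {
		ret = get_page_unless_zero(head);
		if (ret)
			count_increased = true;
	} else {
		ret = -EBUSY;
		/* new: on the MF_NO_RETRY pass, fall through so that
		 * PG_hwpoison still gets set on the busy page */
		if (!(flags & MF_NO_RETRY))
			goto out;
	}

	if (TestSetPageHWPoison(head)) {
		ret = -EHWPOISON;
		goto out;
	}

	return ret;
out:
	if (count_increased)
		put_page(head);
	return ret;
}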




* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Mike Kravetz @ 2022-05-11 18:35 UTC
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, zhenwei pi, Miaohe Lin, Naoya Horiguchi, linux-kernel

On 5/11/22 08:19, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> If memory_failure() fails to grab page refcount on a hugetlb page
> because it's busy, it returns without setting PG_hwpoison on it.
> This not only loses a chance of error containment, but breaks the rule
> that action_result() should be called only when memory_failure() do
> any of handling work (even if that's just setting PG_hwpoison).
> This inconsistency could harm code maintainability.
> 
> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> 
> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  include/linux/mm.h  | 1 +
>  mm/memory-failure.c | 8 ++++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d446e834a3e5..04de0c3e4f9f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3187,6 +3187,7 @@ enum mf_flags {
>  	MF_MUST_KILL = 1 << 2,
>  	MF_SOFT_OFFLINE = 1 << 3,
>  	MF_UNPOISON = 1 << 4,
> +	MF_NO_RETRY = 1 << 5,
>  };
>  extern int memory_failure(unsigned long pfn, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int flags);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 6a28d020a4da..e3269b991016 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  			count_increased = true;
>  	} else {
>  		ret = -EBUSY;
> -		goto out;
> +		if (!(flags & MF_NO_RETRY))
> +			goto out;
>  	}

Hi Naoya,

We are in the else block because !HPageFreed() and !HPageMigratable().
IIUC, this likely means the page is isolated.  One common reason for isolation
is migration.  So, the page could be isolated and on a list for migration.

I took a quick look at the hugetlb migration code and did not see any checks
for PageHWPoison after a hugetlb page is isolated.  I could have missed
something?  If there are no checks, we will read the PageHWPoison page
in kernel mode while copying to the migration target.

Is this an issue?  Is it something we need to be concerned with?  Memory
errors can happen at any time, and gracefully handling them is best effort.
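
To make that concrete, the kind of check that seems to be missing would look
something like the sketch below.  This is purely hypothetical code for
illustration (the function name is made up; only PageHWPoison() and
copy_huge_page() are existing helpers, as far as I recall); nothing like it
runs in the migration path today, so the poisoned source page is read during
the copy:

	/* hypothetical, for illustration only -- not existing kernel code */
	static int copy_isolated_hugetlb(struct page *dst, struct page *src)
	{
		if (unlikely(PageHWPoison(src)))
			return -EHWPOISON;	/* fail the migration, skip the copy */
		copy_huge_page(dst, src);	/* otherwise the poisoned page is read here */
		return MIGRATEPAGE_SUCCESS;
	}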
-- 
Mike Kravetz

>  
>  	if (TestSetPageHWPoison(head)) {
> @@ -1556,7 +1557,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  	struct page *p = pfn_to_page(pfn);
>  	struct page *head;
>  	unsigned long page_flags;
> -	bool retry = true;
>  
>  	*hugetlb = 1;
>  retry:
> @@ -1572,8 +1572,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  		}
>  		return res;
>  	} else if (res == -EBUSY) {
> -		if (retry) {
> -			retry = false;
> +		if (!(flags & MF_NO_RETRY)) {
> +			flags |= MF_NO_RETRY;
>  			goto retry;
>  		}
>  		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);





* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Miaohe Lin @ 2022-05-12  2:54 UTC
  To: Mike Kravetz, Naoya Horiguchi
  Cc: Andrew Morton, zhenwei pi, Naoya Horiguchi, linux-kernel, Linux-MM

On 2022/5/12 2:35, Mike Kravetz wrote:
> On 5/11/22 08:19, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> If memory_failure() fails to grab page refcount on a hugetlb page
>> because it's busy, it returns without setting PG_hwpoison on it.
>> This not only loses a chance of error containment, but breaks the rule
>> that action_result() should be called only when memory_failure() do
>> any of handling work (even if that's just setting PG_hwpoison).
>> This inconsistency could harm code maintainability.
>>
>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

I'm sorry, but where is hugetlb_set_page_hwpoison() defined and used?  I can't find it.

>>
>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> ---
>>  include/linux/mm.h  | 1 +
>>  mm/memory-failure.c | 8 ++++----
>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index d446e834a3e5..04de0c3e4f9f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3187,6 +3187,7 @@ enum mf_flags {
>>  	MF_MUST_KILL = 1 << 2,
>>  	MF_SOFT_OFFLINE = 1 << 3,
>>  	MF_UNPOISON = 1 << 4,
>> +	MF_NO_RETRY = 1 << 5,
>>  };
>>  extern int memory_failure(unsigned long pfn, int flags);
>>  extern void memory_failure_queue(unsigned long pfn, int flags);
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 6a28d020a4da..e3269b991016 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>  			count_increased = true;
>>  	} else {
>>  		ret = -EBUSY;
>> -		goto out;
>> +		if (!(flags & MF_NO_RETRY))
>> +			goto out;
>>  	}
> 
> Hi Naoya,
> 
> We are in the else block because !HPageFreed() and !HPageMigratable().
> IIUC, this likely means the page is isolated.  One common reason for isolation
> is migration.  So, the page could be isolated and on a list for migration.
> 
> I took a quick look at the hugetlb migration code and did not see any checks
> for PageHWPoison after a hugetlb page is isolated.  I could have missed
> something?  If there are no checks, we will read the PageHWPoison page
> in kernel mode while copying to the migration target.
> 
> Is this an issue?  Is is something we need to be concerned with?  Memory
> errors can happen at any time, and gracefully handling them is best effort.

It seems the HWPoison hugetlb page will still be accessed even before this patch.
Can we do a get_page_unless_zero() first here, so that hugetlb page migration fails
due to the extra page reference and the page content is never accessed?  If the
hugetlb page's refcount is already frozen, the corrupted memory will still be
consumed though. :(
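
For what it's worth, the mechanism such an extra reference would rely on is the
refcount check in the hugetlb migration core, roughly the following (paraphrased
from migrate_huge_page_move_mapping() in mm/migrate.c; details vary by kernel
version), where one extra pin makes the freeze fail and the poisoned contents
are never copied:

	expected_count = 2 + page_has_private(page);
	if (!page_ref_freeze(page, expected_count))
		return -EAGAIN;		/* migration bails out before the copy */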

Thanks!

> 




* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Mike Kravetz @ 2022-05-12  3:06 UTC
  To: Miaohe Lin, Naoya Horiguchi
  Cc: Andrew Morton, zhenwei pi, Naoya Horiguchi, linux-kernel, Linux-MM

On 5/11/22 19:54, Miaohe Lin wrote:
> On 2022/5/12 2:35, Mike Kravetz wrote:
>> On 5/11/22 08:19, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> If memory_failure() fails to grab page refcount on a hugetlb page
>>> because it's busy, it returns without setting PG_hwpoison on it.
>>> This not only loses a chance of error containment, but breaks the rule
>>> that action_result() should be called only when memory_failure() do
>>> any of handling work (even if that's just setting PG_hwpoison).
>>> This inconsistency could harm code maintainability.
>>>
>>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> 
> I'm sorry but where is hugetlb_set_page_hwpoison() defined and used ? I can't find it.
> 
>>>
>>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>>  include/linux/mm.h  | 1 +
>>>  mm/memory-failure.c | 8 ++++----
>>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index d446e834a3e5..04de0c3e4f9f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -3187,6 +3187,7 @@ enum mf_flags {
>>>  	MF_MUST_KILL = 1 << 2,
>>>  	MF_SOFT_OFFLINE = 1 << 3,
>>>  	MF_UNPOISON = 1 << 4,
>>> +	MF_NO_RETRY = 1 << 5,
>>>  };
>>>  extern int memory_failure(unsigned long pfn, int flags);
>>>  extern void memory_failure_queue(unsigned long pfn, int flags);
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 6a28d020a4da..e3269b991016 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  			count_increased = true;
>>>  	} else {
>>>  		ret = -EBUSY;
>>> -		goto out;
>>> +		if (!(flags & MF_NO_RETRY))
>>> +			goto out;
>>>  	}
>>
>> Hi Naoya,
>>
>> We are in the else block because !HPageFreed() and !HPageMigratable().
>> IIUC, this likely means the page is isolated.  One common reason for isolation
>> is migration.  So, the page could be isolated and on a list for migration.
>>
>> I took a quick look at the hugetlb migration code and did not see any checks
>> for PageHWPoison after a hugetlb page is isolated.  I could have missed
>> something?  If there are no checks, we will read the PageHWPoison page
>> in kernel mode while copying to the migration target.
>>
>> Is this an issue?  Is is something we need to be concerned with?  Memory
>> errors can happen at any time, and gracefully handling them is best effort.
> 
> It seems HWPoison hugetlb page will still be accessed before this patch. Can we do a
> get_page_unless_zero first here to ensure that hugetlb page migration should fail due
> to this extra page reference and thus not access the page content? If hugetlb page is
> already freezed, corrupted memory will still be consumed though. :(

Right.  This potential issue was not introduced with this patch.
Also, I am not sure but it might be an issue with non-hugetlb pages as well.

As mentioned, memory error handling is a best effort.  Since errors can
happen at any time, we can not handle all cases.  Or, you could spend the
rest of your life trying. :)

The question is, should we worry about errors that happen when a page is
isolated for migration?

-- 
Mike Kravetz



* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-12  4:32 UTC
  To: Mike Kravetz
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, zhenwei pi, Miaohe Lin,
	linux-kernel

On Wed, May 11, 2022 at 11:35:55AM -0700, Mike Kravetz wrote:
> On 5/11/22 08:19, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > If memory_failure() fails to grab page refcount on a hugetlb page
> > because it's busy, it returns without setting PG_hwpoison on it.
> > This not only loses a chance of error containment, but breaks the rule
> > that action_result() should be called only when memory_failure() do
> > any of handling work (even if that's just setting PG_hwpoison).
> > This inconsistency could harm code maintainability.
> > 
> > So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> > 
> > Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  include/linux/mm.h  | 1 +
> >  mm/memory-failure.c | 8 ++++----
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index d446e834a3e5..04de0c3e4f9f 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3187,6 +3187,7 @@ enum mf_flags {
> >  	MF_MUST_KILL = 1 << 2,
> >  	MF_SOFT_OFFLINE = 1 << 3,
> >  	MF_UNPOISON = 1 << 4,
> > +	MF_NO_RETRY = 1 << 5,
> >  };
> >  extern int memory_failure(unsigned long pfn, int flags);
> >  extern void memory_failure_queue(unsigned long pfn, int flags);
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 6a28d020a4da..e3269b991016 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  			count_increased = true;
> >  	} else {
> >  		ret = -EBUSY;
> > -		goto out;
> > +		if (!(flags & MF_NO_RETRY))
> > +			goto out;
> >  	}
> 
> Hi Naoya,
> 
> We are in the else block because !HPageFreed() and !HPageMigratable().
> IIUC, this likely means the page is isolated.  One common reason for isolation
> is migration.  So, the page could be isolated and on a list for migration.

Yes, and I also detected this issue by testing the race between hugepage
allocation and memory_failure().

> 
> I took a quick look at the hugetlb migration code and did not see any checks
> for PageHWPoison after a hugetlb page is isolated.  I could have missed
> something?  If there are no checks, we will read the PageHWPoison page
> in kernel mode while copying to the migration target.

Yes, that could happen.  This patch does not affect ongoing hugepage migration,
but after the migration source hugepage is freed, PG_hwpoison should work
to prevent it from being reused.

> 
> Is this an issue?  Is is something we need to be concerned with?  Memory
> errors can happen at any time, and gracefully handling them is best effort.

Right, so doing nothing for this case could be OK if doing something caused
issues or made the code too complicated.  The motivation for this patch is
that I now think memory_failure() should do something (at least set
PG_hwpoison) unless the page is already hwpoisoned or rejected by
hwpoison_filter(), because of the after-free effect mentioned above.

This is also expected in other cases.  For example, slab is an unhandlable
type of page, but we do set PG_hwpoison on it.  The flag should not affect
any ongoing slab-related processing, and that's OK because it becomes
effective after the slab page is freed.

So this patch is intended to align with that behavior.  Making hugepage
migration do something useful with PG_hwpoison seems to me a separate,
unsolved issue.
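
As a concrete example of "becomes effective after the page is freed": for the
slab case above, once the page eventually goes back to the page allocator, the
stale flag is caught by a guard roughly like the following (paraphrased from
free_pages_prepare() in mm/page_alloc.c; exact details vary by version), so
the page is quietly leaked instead of being reused:

	if (unlikely(PageHWPoison(page)) && !order) {
		/*
		 * Do not let hwpoison pages hit pcplists/buddy: untie memcg
		 * state, reset the page owner, and deliberately leak the page
		 * so it can never be handed out again.
		 */
		...
		return false;
	}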

Thanks,
Naoya Horiguchi


* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-12  4:46 UTC
  To: Miaohe Lin
  Cc: Mike Kravetz, Naoya Horiguchi, Andrew Morton, zhenwei pi,
	linux-kernel, Linux-MM

On Thu, May 12, 2022 at 10:54:05AM +0800, Miaohe Lin wrote:
> On 2022/5/12 2:35, Mike Kravetz wrote:
> > On 5/11/22 08:19, Naoya Horiguchi wrote:
> >> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>
> >> If memory_failure() fails to grab page refcount on a hugetlb page
> >> because it's busy, it returns without setting PG_hwpoison on it.
> >> This not only loses a chance of error containment, but breaks the rule
> >> that action_result() should be called only when memory_failure() do
> >> any of handling work (even if that's just setting PG_hwpoison).
> >> This inconsistency could harm code maintainability.
> >>
> >> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> 
> I'm sorry but where is hugetlb_set_page_hwpoison() defined and used ? I can't find it.

Sorry, this depends on the unmerged patch
https://lore.kernel.org/linux-mm/20220427042841.678351-2-naoya.horiguchi@linux.dev/
so it should come after that.  I'll put both into a single patchset next.

...
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 6a28d020a4da..e3269b991016 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >>  			count_increased = true;
> >>  	} else {
> >>  		ret = -EBUSY;
> >> -		goto out;
> >> +		if (!(flags & MF_NO_RETRY))
> >> +			goto out;
> >>  	}
> > 
> > Hi Naoya,
> > 
> > We are in the else block because !HPageFreed() and !HPageMigratable().
> > IIUC, this likely means the page is isolated.  One common reason for isolation
> > is migration.  So, the page could be isolated and on a list for migration.
> > 
> > I took a quick look at the hugetlb migration code and did not see any checks
> > for PageHWPoison after a hugetlb page is isolated.  I could have missed
> > something?  If there are no checks, we will read the PageHWPoison page
> > in kernel mode while copying to the migration target.
> > 
> > Is this an issue?  Is is something we need to be concerned with?  Memory
> > errors can happen at any time, and gracefully handling them is best effort.
> 
> It seems HWPoison hugetlb page will still be accessed before this patch. Can we do a
> get_page_unless_zero first here to ensure that hugetlb page migration should fail due
> to this extra page reference and thus not access the page content? If hugetlb page is
> already freezed, corrupted memory will still be consumed though. :(

Right, I have no idea about this ...

Thanks,
Naoya Horiguchi


* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-12  4:50 UTC
  To: Mike Kravetz
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, zhenwei pi,
	linux-kernel, Linux-MM

On Wed, May 11, 2022 at 08:06:47PM -0700, Mike Kravetz wrote:
> On 5/11/22 19:54, Miaohe Lin wrote:
> > On 2022/5/12 2:35, Mike Kravetz wrote:
> >> On 5/11/22 08:19, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>
> >>> If memory_failure() fails to grab page refcount on a hugetlb page
> >>> because it's busy, it returns without setting PG_hwpoison on it.
> >>> This not only loses a chance of error containment, but breaks the rule
> >>> that action_result() should be called only when memory_failure() do
> >>> any of handling work (even if that's just setting PG_hwpoison).
> >>> This inconsistency could harm code maintainability.
> >>>
> >>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> > 
> > I'm sorry but where is hugetlb_set_page_hwpoison() defined and used ? I can't find it.
> > 
> >>>
> >>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
> >>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>> ---
> >>>  include/linux/mm.h  | 1 +
> >>>  mm/memory-failure.c | 8 ++++----
> >>>  2 files changed, 5 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index d446e834a3e5..04de0c3e4f9f 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -3187,6 +3187,7 @@ enum mf_flags {
> >>>  	MF_MUST_KILL = 1 << 2,
> >>>  	MF_SOFT_OFFLINE = 1 << 3,
> >>>  	MF_UNPOISON = 1 << 4,
> >>> +	MF_NO_RETRY = 1 << 5,
> >>>  };
> >>>  extern int memory_failure(unsigned long pfn, int flags);
> >>>  extern void memory_failure_queue(unsigned long pfn, int flags);
> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>> index 6a28d020a4da..e3269b991016 100644
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >>>  			count_increased = true;
> >>>  	} else {
> >>>  		ret = -EBUSY;
> >>> -		goto out;
> >>> +		if (!(flags & MF_NO_RETRY))
> >>> +			goto out;
> >>>  	}
> >>
> >> Hi Naoya,
> >>
> >> We are in the else block because !HPageFreed() and !HPageMigratable().
> >> IIUC, this likely means the page is isolated.  One common reason for isolation
> >> is migration.  So, the page could be isolated and on a list for migration.
> >>
> >> I took a quick look at the hugetlb migration code and did not see any checks
> >> for PageHWPoison after a hugetlb page is isolated.  I could have missed
> >> something?  If there are no checks, we will read the PageHWPoison page
> >> in kernel mode while copying to the migration target.
> >>
> >> Is this an issue?  Is is something we need to be concerned with?  Memory
> >> errors can happen at any time, and gracefully handling them is best effort.
> > 
> > It seems HWPoison hugetlb page will still be accessed before this patch. Can we do a
> > get_page_unless_zero first here to ensure that hugetlb page migration should fail due
> > to this extra page reference and thus not access the page content? If hugetlb page is
> > already freezed, corrupted memory will still be consumed though. :(
> 
> Right.  This potential issue was not introduced with this patch.
> Also, I am not sure but it might be an issue with non-hugetlb pages as well.
> 
> As mentioned, memory error handling is a best effort.  Since errors can
> happen at any time, we can not handle all cases.  Or, you could spend the
> rest of your life trying. :)
> 
> The question is, should we worry about errors that happen when a page is
> isolated for migration?

I think yes, but that's not to save the current migration event; it's to
save us from future memory errors caused by the broken page.

Thanks,
Naoya Horiguchi


* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Miaohe Lin @ 2022-05-12 11:18 UTC
  To: HORIGUCHI NAOYA(堀口 直也), Mike Kravetz
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, zhenwei pi, linux-kernel

On 2022/5/12 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, May 11, 2022 at 11:35:55AM -0700, Mike Kravetz wrote:
>> On 5/11/22 08:19, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> If memory_failure() fails to grab page refcount on a hugetlb page
>>> because it's busy, it returns without setting PG_hwpoison on it.
>>> This not only loses a chance of error containment, but breaks the rule
>>> that action_result() should be called only when memory_failure() do
>>> any of handling work (even if that's just setting PG_hwpoison).
>>> This inconsistency could harm code maintainability.
>>>
>>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
>>>
>>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>>  include/linux/mm.h  | 1 +
>>>  mm/memory-failure.c | 8 ++++----
>>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index d446e834a3e5..04de0c3e4f9f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -3187,6 +3187,7 @@ enum mf_flags {
>>>  	MF_MUST_KILL = 1 << 2,
>>>  	MF_SOFT_OFFLINE = 1 << 3,
>>>  	MF_UNPOISON = 1 << 4,
>>> +	MF_NO_RETRY = 1 << 5,
>>>  };
>>>  extern int memory_failure(unsigned long pfn, int flags);
>>>  extern void memory_failure_queue(unsigned long pfn, int flags);
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 6a28d020a4da..e3269b991016 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  			count_increased = true;
>>>  	} else {
>>>  		ret = -EBUSY;
>>> -		goto out;
>>> +		if (!(flags & MF_NO_RETRY))
>>> +			goto out;
>>>  	}
>>
>> Hi Naoya,
>>
>> We are in the else block because !HPageFreed() and !HPageMigratable().
>> IIUC, this likely means the page is isolated.  One common reason for isolation
>> is migration.  So, the page could be isolated and on a list for migration.
> 
> Yes, and I also detected this issue by testing race between hugepage allocation
> and memory_failure(). 
> 
>>
>> I took a quick look at the hugetlb migration code and did not see any checks
>> for PageHWPoison after a hugetlb page is isolated.  I could have missed
>> something?  If there are no checks, we will read the PageHWPoison page
>> in kernel mode while copying to the migration target.
> 
> Yes, that could happen.  This patch does not affect ongoing hugepage migration.
> But after the migration source hugepage is freed, the PG_hwpoison should work
> to prevent reusing.
> 
>>
>> Is this an issue?  Is is something we need to be concerned with?  Memory
>> errors can happen at any time, and gracefully handling them is best effort.
> 
> Right, so doing nothing for this case could be OK if doing something causes
> some issues or makes code too complicated.  The motivation of this patch is
> that now I think memory_failure() should do something (at least setting
> PG_hwpoison) unless the page is already hwpoisoned or rejected by
> hwpoison_filter(), because of the effect after free as mentioned above.
> 
> This is also expected in other case too. For example, slab is a unhandlable
> type of page, but we do set PG_hwpoison.  This flag should not affect any of
> ongoing slab-related process, but that's OK because it becomes effective
> after the slab page is freed.
> 
> So this patch is intended to align to the behavior.  Allowing hugepage
> migration to do something good using PG_hwpoison seems to me an unsolved
> separate issue.

I tend to agree with Naoya. But could we try to do better?  IMHO, we could do a
get_page_unless_zero() here to ensure that hugetlb page migration will fail due to
the extra page reference, thus preventing the page content from being accessed.
Does this work?  Or am I missing something?

Thanks!

> 
> Thanks,
> Naoya Horiguchi
> 




* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-06-02  6:12 UTC
  To: Miaohe Lin
  Cc: Mike Kravetz, Naoya Horiguchi, linux-mm, Andrew Morton,
	zhenwei pi, linux-kernel

On Thu, May 12, 2022 at 07:18:51PM +0800, Miaohe Lin wrote:
> On 2022/5/12 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Wed, May 11, 2022 at 11:35:55AM -0700, Mike Kravetz wrote:
> >> On 5/11/22 08:19, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>
> >>> If memory_failure() fails to grab page refcount on a hugetlb page
> >>> because it's busy, it returns without setting PG_hwpoison on it.
> >>> This not only loses a chance of error containment, but breaks the rule
> >>> that action_result() should be called only when memory_failure() do
> >>> any of handling work (even if that's just setting PG_hwpoison).
> >>> This inconsistency could harm code maintainability.
> >>>
> >>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
> >>>
> >>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
> >>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>> ---
> >>>  include/linux/mm.h  | 1 +
> >>>  mm/memory-failure.c | 8 ++++----
> >>>  2 files changed, 5 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index d446e834a3e5..04de0c3e4f9f 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -3187,6 +3187,7 @@ enum mf_flags {
> >>>  	MF_MUST_KILL = 1 << 2,
> >>>  	MF_SOFT_OFFLINE = 1 << 3,
> >>>  	MF_UNPOISON = 1 << 4,
> >>> +	MF_NO_RETRY = 1 << 5,
> >>>  };
> >>>  extern int memory_failure(unsigned long pfn, int flags);
> >>>  extern void memory_failure_queue(unsigned long pfn, int flags);
> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>> index 6a28d020a4da..e3269b991016 100644
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >>>  			count_increased = true;
> >>>  	} else {
> >>>  		ret = -EBUSY;
> >>> -		goto out;
> >>> +		if (!(flags & MF_NO_RETRY))
> >>> +			goto out;
> >>>  	}
> >>
> >> Hi Naoya,
> >>
> >> We are in the else block because !HPageFreed() and !HPageMigratable().
> >> IIUC, this likely means the page is isolated.  One common reason for isolation
> >> is migration.  So, the page could be isolated and on a list for migration.
> > 
> > Yes, and I also detected this issue by testing race between hugepage allocation
> > and memory_failure(). 
> > 
> >>
> >> I took a quick look at the hugetlb migration code and did not see any checks
> >> for PageHWPoison after a hugetlb page is isolated.  I could have missed
> >> something?  If there are no checks, we will read the PageHWPoison page
> >> in kernel mode while copying to the migration target.
> > 
> > Yes, that could happen.  This patch does not affect ongoing hugepage migration.
> > But after the migration source hugepage is freed, the PG_hwpoison should work
> > to prevent reusing.
> > 
> >>
> >> Is this an issue?  Is is something we need to be concerned with?  Memory
> >> errors can happen at any time, and gracefully handling them is best effort.
> > 
> > Right, so doing nothing for this case could be OK if doing something causes
> > some issues or makes code too complicated.  The motivation of this patch is
> > that now I think memory_failure() should do something (at least setting
> > PG_hwpoison) unless the page is already hwpoisoned or rejected by
> > hwpoison_filter(), because of the effect after free as mentioned above.
> > 
> > This is also expected in other case too. For example, slab is a unhandlable
> > type of page, but we do set PG_hwpoison.  This flag should not affect any of
> > ongoing slab-related process, but that's OK because it becomes effective
> > after the slab page is freed.
> > 
> > So this patch is intended to align to the behavior.  Allowing hugepage
> > migration to do something good using PG_hwpoison seems to me an unsolved
> > separate issue.
> 
> I tend to agree with Naoya. And could we try to do it better? IMHO, we could do a
> get_page_unless_zero here to ensure that hugetlb page migration will fail due to
> this extra page reference and thus preventing the page content from being accessed.
> Does this work? Or am I miss something?

Sorry for not answering this question earlier,

Taking a page refcount to prevent page migration could work.  One concern is
how we can distinguish hugepages under migration from those under allocation
from buddy.  Maybe this was also mentioned in the discussion over
https://github.com/torvalds/linux/commit/405ce051236cc65b30bbfe490b28ce60ae6aed85
: there's a small window where a compound page being allocated is already
refcounted and looks like a hugepage (it has the destructor COMPOUND_PAGE_DTOR)
but is not yet protected by hugetlb_lock, so simply calling
get_page_unless_zero() might not work (it might break the allocation code).
If we had a more reliable indicator that a hugepage is under migration,
that would be helpful.
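
For readers who have not looked at that window, the allocation side is roughly
the following, heavily condensed and paraphrased from mm/hugetlb.c of that era
(the helper names are from my reading of that code and may differ by version):

	/* alloc_fresh_huge_page(), condensed -- illustration only */
	page = alloc_buddy_huge_page(h, gfp_mask, nid, nmask, ...);
						/* refcount is already 1 here */
	__prep_new_huge_page(h, page);		/* sets the hugetlb destructor etc.,
						 * not under hugetlb_lock */
	spin_lock_irq(&hugetlb_lock);
	__prep_account_new_huge_page(h, nid);	/* only the accounting is locked */
	spin_unlock_irq(&hugetlb_lock);

	/*
	 * Anywhere in this window a get_page_unless_zero() from the hwpoison
	 * side would succeed and leave the allocation path holding a page with
	 * an unexpected extra reference -- the race described above.
	 */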

Thanks,
Naoya Horiguchi


* Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
From: Miaohe Lin @ 2022-06-06  3:12 UTC
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Mike Kravetz, Naoya Horiguchi, linux-mm, Andrew Morton,
	zhenwei pi, linux-kernel

On 2022/6/2 14:12, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 12, 2022 at 07:18:51PM +0800, Miaohe Lin wrote:
>> On 2022/5/12 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Wed, May 11, 2022 at 11:35:55AM -0700, Mike Kravetz wrote:
>>>> On 5/11/22 08:19, Naoya Horiguchi wrote:
>>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>>
>>>>> If memory_failure() fails to grab page refcount on a hugetlb page
>>>>> because it's busy, it returns without setting PG_hwpoison on it.
>>>>> This not only loses a chance of error containment, but breaks the rule
>>>>> that action_result() should be called only when memory_failure() do
>>>>> any of handling work (even if that's just setting PG_hwpoison).
>>>>> This inconsistency could harm code maintainability.
>>>>>
>>>>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.
>>>>>
>>>>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>>>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>> ---
>>>>>  include/linux/mm.h  | 1 +
>>>>>  mm/memory-failure.c | 8 ++++----
>>>>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index d446e834a3e5..04de0c3e4f9f 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -3187,6 +3187,7 @@ enum mf_flags {
>>>>>  	MF_MUST_KILL = 1 << 2,
>>>>>  	MF_SOFT_OFFLINE = 1 << 3,
>>>>>  	MF_UNPOISON = 1 << 4,
>>>>> +	MF_NO_RETRY = 1 << 5,
>>>>>  };
>>>>>  extern int memory_failure(unsigned long pfn, int flags);
>>>>>  extern void memory_failure_queue(unsigned long pfn, int flags);
>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>> index 6a28d020a4da..e3269b991016 100644
>>>>> --- a/mm/memory-failure.c
>>>>> +++ b/mm/memory-failure.c
>>>>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>>>  			count_increased = true;
>>>>>  	} else {
>>>>>  		ret = -EBUSY;
>>>>> -		goto out;
>>>>> +		if (!(flags & MF_NO_RETRY))
>>>>> +			goto out;
>>>>>  	}
>>>>
>>>> Hi Naoya,
>>>>
>>>> We are in the else block because !HPageFreed() and !HPageMigratable().
>>>> IIUC, this likely means the page is isolated.  One common reason for isolation
>>>> is migration.  So, the page could be isolated and on a list for migration.
>>>
>>> Yes, and I also detected this issue by testing race between hugepage allocation
>>> and memory_failure(). 
>>>
>>>>
>>>> I took a quick look at the hugetlb migration code and did not see any checks
>>>> for PageHWPoison after a hugetlb page is isolated.  I could have missed
>>>> something?  If there are no checks, we will read the PageHWPoison page
>>>> in kernel mode while copying to the migration target.
>>>
>>> Yes, that could happen.  This patch does not affect ongoing hugepage migration.
>>> But after the migration source hugepage is freed, the PG_hwpoison should work
>>> to prevent reusing.
>>>
>>>>
>>>> Is this an issue?  Is is something we need to be concerned with?  Memory
>>>> errors can happen at any time, and gracefully handling them is best effort.
>>>
>>> Right, so doing nothing for this case could be OK if doing something causes
>>> some issues or makes code too complicated.  The motivation of this patch is
>>> that now I think memory_failure() should do something (at least setting
>>> PG_hwpoison) unless the page is already hwpoisoned or rejected by
>>> hwpoison_filter(), because of the effect after free as mentioned above.
>>>
>>> This is also expected in other case too. For example, slab is a unhandlable
>>> type of page, but we do set PG_hwpoison.  This flag should not affect any of
>>> ongoing slab-related process, but that's OK because it becomes effective
>>> after the slab page is freed.
>>>
>>> So this patch is intended to align to the behavior.  Allowing hugepage
>>> migration to do something good using PG_hwpoison seems to me an unsolved
>>> separate issue.
>>
>> I tend to agree with Naoya. And could we try to do it better? IMHO, we could do a
>> get_page_unless_zero here to ensure that hugetlb page migration will fail due to
>> this extra page reference and thus preventing the page content from being accessed.
>> Does this work? Or am I miss something?
> 
> Sorry for my missing to answering the question,

Never mind. I almost forgot about it too. ;)

> 
> Taking page refcount to prevent page migration could work.  One concern is
> how we can distinguish hugepages under migration and those under allocation
> from buddy.  Maybe this was also mentioned in discussion over 
> https://github.com/torvalds/linux/commit/405ce051236cc65b30bbfe490b28ce60ae6aed85
> , there's a small window where an allocating compound page is refcounted and
> hugepage (having deconstructor COMPOUND_PAGE_DTOR), and not protected by
> hugetlb_lock, so simply get_page_unless_zero() might not work (might break
> allocation code).
> If we have more reliable indicator to ensure that a hugepage is under migration,
> that would be helpful.

Yes, I agree with you.  If we had a more reliable indicator that a hugepage
is under migration, we could try to do this then. :)

> 
> Thanks,
> Naoya Horiguchi

Many thanks for your reply!

> 


