linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
@ 2023-07-17 18:18 Sidhartha Kumar
  2023-07-17 19:00 ` Andrew Morton
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Sidhartha Kumar @ 2023-07-17 18:18 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, willy, linmiaohe, naoya.horiguchi, Sidhartha Kumar, stable

It was pointed out[1] that using folio_test_hwpoison() is wrong
as we need to check the indiviual page that has poison.
folio_test_hwpoison() only checks the head page so go back to using
PageHWPoison().

Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
Cc: stable@vger.kernel.org #v6.4
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>

[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 02b1d8f104d51..a114c8c3039cd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!folio_test_hwpoison(folio)) {
+	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
 		goto unlock_mutex;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-17 18:18 [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory() Sidhartha Kumar
@ 2023-07-17 19:00 ` Andrew Morton
  2023-07-18 14:25   ` Sidhartha Kumar
  2023-07-17 23:21 ` Jane Chu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2023-07-17 19:00 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: linux-kernel, linux-mm, willy, linmiaohe, naoya.horiguchi, stable

On Mon, 17 Jul 2023 11:18:12 -0700 Sidhartha Kumar <sidhartha.kumar@oracle.com> wrote:

> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the indiviual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().

Please describe the user-visible effects of the bug, especially
when proposing a -stable backport.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-17 18:18 [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory() Sidhartha Kumar
  2023-07-17 19:00 ` Andrew Morton
@ 2023-07-17 23:21 ` Jane Chu
  2023-07-18  0:14 ` Naoya Horiguchi
  2023-07-20  9:06 ` Miaohe Lin
  3 siblings, 0 replies; 9+ messages in thread
From: Jane Chu @ 2023-07-17 23:21 UTC (permalink / raw)
  To: Sidhartha Kumar, linux-kernel, linux-mm
  Cc: akpm, willy, linmiaohe, naoya.horiguchi, stable

On 7/17/2023 11:18 AM, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the indiviual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>   mm/memory-failure.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>   		goto unlock_mutex;
>   	}
>   
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {
>   		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>   				 pfn, &unpoison_rs);
>   		goto unlock_mutex;

Would it worth the trouble to create folio_page_test_##lname(folio, 
index) macros to address folio subpage?

thanks!
-jane



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-17 18:18 [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory() Sidhartha Kumar
  2023-07-17 19:00 ` Andrew Morton
  2023-07-17 23:21 ` Jane Chu
@ 2023-07-18  0:14 ` Naoya Horiguchi
  2023-07-18  0:39   ` Naoya Horiguchi
  2023-07-20  9:06 ` Miaohe Lin
  3 siblings, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2023-07-18  0:14 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: linux-kernel, linux-mm, akpm, willy, linmiaohe, naoya.horiguchi, stable

On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the indiviual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>  
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {


I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
set on the head page, rather than on the raw subpage. In the case of
hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
pages.  (I believe this is not detected because no one considers the
scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
function is_page_hwpoison() would be useful for this purpose?

Thanks,
Naoya Horiguchi

>  		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>  				 pfn, &unpoison_rs);
>  		goto unlock_mutex;
> -- 
> 2.41.0
> 
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-18  0:14 ` Naoya Horiguchi
@ 2023-07-18  0:39   ` Naoya Horiguchi
  2023-07-18 14:30     ` Sidhartha Kumar
  0 siblings, 1 reply; 9+ messages in thread
From: Naoya Horiguchi @ 2023-07-18  0:39 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: linux-kernel, linux-mm, akpm, willy, linmiaohe, naoya.horiguchi, stable

On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > as we need to check the indiviual page that has poison.
> > folio_test_hwpoison() only checks the head page so go back to using
> > PageHWPoison().
> > 
> > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > Cc: stable@vger.kernel.org #v6.4
> > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > 
> > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> > ---
> >  mm/memory-failure.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 02b1d8f104d51..a114c8c3039cd 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> >  		goto unlock_mutex;
> >  	}
> >  
> > -	if (!folio_test_hwpoison(folio)) {
> > +	if (!PageHWPoison(p)) {
> 
> 
> I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> set on the head page, rather than on the raw subpage. In the case of
> hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> pages.  (I believe this is not detected because no one considers the
> scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
> function is_page_hwpoison() would be useful for this purpose?

Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
unpoison should know where the PageHWPoison is set via /proc/kpageflags.
So this patch is OK to me after comments from other reviewers are resolved.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-17 19:00 ` Andrew Morton
@ 2023-07-18 14:25   ` Sidhartha Kumar
  0 siblings, 0 replies; 9+ messages in thread
From: Sidhartha Kumar @ 2023-07-18 14:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, willy, linmiaohe, naoya.horiguchi, stable

On 7/17/23 12:00 PM, Andrew Morton wrote:
> On Mon, 17 Jul 2023 11:18:12 -0700 Sidhartha Kumar <sidhartha.kumar@oracle.com> wrote:
> 
>> It was pointed out[1] that using folio_test_hwpoison() is wrong
>> as we need to check the indiviual page that has poison.
>> folio_test_hwpoison() only checks the head page so go back to using
>> PageHWPoison().
> 
> Please describe the user-visible effects of the bug, especially
> when proposing a -stable backport.

User-visible effects include existing hwpoison-inject tests possibly 
failing as unpoisoning a single subpage could lead to unpoisoning an 
entire folio. Memory unpoisoning could also not work as expected as the 
function will break early due to only checking the head page and not the 
actually poisoned subpage.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-18  0:39   ` Naoya Horiguchi
@ 2023-07-18 14:30     ` Sidhartha Kumar
  2023-07-18 23:59       ` Naoya Horiguchi
  0 siblings, 1 reply; 9+ messages in thread
From: Sidhartha Kumar @ 2023-07-18 14:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-kernel, linux-mm, akpm, willy, linmiaohe, naoya.horiguchi, stable

On 7/17/23 5:39 PM, Naoya Horiguchi wrote:
> On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
>> On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
>>> It was pointed out[1] that using folio_test_hwpoison() is wrong
>>> as we need to check the indiviual page that has poison.
>>> folio_test_hwpoison() only checks the head page so go back to using
>>> PageHWPoison().
>>>
>>> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
>>> Cc: stable@vger.kernel.org #v6.4
>>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>
>>> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
>>> ---
>>>   mm/memory-failure.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 02b1d8f104d51..a114c8c3039cd 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>>>   		goto unlock_mutex;
>>>   	}
>>>   
>>> -	if (!folio_test_hwpoison(folio)) {
>>> +	if (!PageHWPoison(p)) {
>>
>>
>> I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
>> set on the head page, rather than on the raw subpage. In the case of
>> hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
>> pages.  (I believe this is not detected because no one considers the
>> scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
>> function is_page_hwpoison() would be useful for this purpose?
> 
> Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
> unpoison should know where the PageHWPoison is set via /proc/kpageflags.
> So this patch is OK to me after comments from other reviewers are resolved.
> 

Hi Naoya,

While taking a closer at the patch, later in unpoison_memory() there is 
also:

-               ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
+               ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;

I thought this folio conversion would be safe because page is the result 
of a compound_head() call but I'm wondering if the same issue exists 
here and we should be calling TestClearPageHWPoison() on the specific 
subpage by doing TestClearPageHWPoison(p).

Thanks,
Sidhartha Kumar

> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Thanks,
> Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-18 14:30     ` Sidhartha Kumar
@ 2023-07-18 23:59       ` Naoya Horiguchi
  0 siblings, 0 replies; 9+ messages in thread
From: Naoya Horiguchi @ 2023-07-18 23:59 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: linux-kernel, linux-mm, akpm, willy, linmiaohe, naoya.horiguchi, stable

On Tue, Jul 18, 2023 at 07:30:23AM -0700, Sidhartha Kumar wrote:
> On 7/17/23 5:39 PM, Naoya Horiguchi wrote:
> > On Tue, Jul 18, 2023 at 09:14:09AM +0900, Naoya Horiguchi wrote:
> > > On Mon, Jul 17, 2023 at 11:18:12AM -0700, Sidhartha Kumar wrote:
> > > > It was pointed out[1] that using folio_test_hwpoison() is wrong
> > > > as we need to check the indiviual page that has poison.
> > > > folio_test_hwpoison() only checks the head page so go back to using
> > > > PageHWPoison().
> > > > 
> > > > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> > > > Cc: stable@vger.kernel.org #v6.4
> > > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > > > 
> > > > [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> > > > ---
> > > >   mm/memory-failure.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > > index 02b1d8f104d51..a114c8c3039cd 100644
> > > > --- a/mm/memory-failure.c
> > > > +++ b/mm/memory-failure.c
> > > > @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
> > > >   		goto unlock_mutex;
> > > >   	}
> > > > -	if (!folio_test_hwpoison(folio)) {
> > > > +	if (!PageHWPoison(p)) {
> > > 
> > > 
> > > I don't think this works for hwpoisoned hugetlb pages that have PageHWPoison
> > > set on the head page, rather than on the raw subpage. In the case of
> > > hwpoisoned thps, PageHWPoison is set on the raw subpage, not on the head
> > > pages.  (I believe this is not detected because no one considers the
> > > scenario of unpoisoning hwpoisoned thps, which is a rare case).  Perhaps the
> > > function is_page_hwpoison() would be useful for this purpose?
> > 
> > Sorry, I was wrong.  Checking PageHWPoison() is fine because the users of
> > unpoison should know where the PageHWPoison is set via /proc/kpageflags.
> > So this patch is OK to me after comments from other reviewers are resolved.
> > 
> 
> Hi Naoya,
> 
> While taking a closer at the patch, later in unpoison_memory() there is
> also:
> 
> -               ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
> +               ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;
> 
> I thought this folio conversion would be safe because page is the result of
> a compound_head() call but I'm wondering if the same issue exists here and
> we should be calling TestClearPageHWPoison() on the specific subpage by
> doing TestClearPageHWPoison(p).

In this case (get_hwpoison_page returns 0), the target of unpoison_memory was
buddy page or free huge page, so there seems not any realistic problem.
But putting back to TestClearPageHWPoison() looks consistent, so I'm fine with it.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory()
  2023-07-17 18:18 [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory() Sidhartha Kumar
                   ` (2 preceding siblings ...)
  2023-07-18  0:14 ` Naoya Horiguchi
@ 2023-07-20  9:06 ` Miaohe Lin
  3 siblings, 0 replies; 9+ messages in thread
From: Miaohe Lin @ 2023-07-20  9:06 UTC (permalink / raw)
  To: Sidhartha Kumar, linux-kernel, linux-mm
  Cc: akpm, willy, naoya.horiguchi, stable

On 2023/7/18 2:18, Sidhartha Kumar wrote:
> It was pointed out[1] that using folio_test_hwpoison() is wrong
> as we need to check the indiviual page that has poison.
> folio_test_hwpoison() only checks the head page so go back to using
> PageHWPoison().
> 
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
> Cc: stable@vger.kernel.org #v6.4
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> 
> [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 02b1d8f104d51..a114c8c3039cd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2523,7 +2523,7 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>  
> -	if (!folio_test_hwpoison(folio)) {
> +	if (!PageHWPoison(p)) {

For successfully handled pages, they should be non-compound pages (dissolved, splitted or normal pages).
So this patch makes no change for them. But for failed to hwpoisoned thp and hugetlb, there's some difference.
But since Naoya points out that, "the users of unpoison should know where the PageHWPoison is set via
/proc/kpageflags.", I'm fine with this patch.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-07-20  9:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-17 18:18 [PATCH] mm/memory-failure: fix hardware poison check in unpoison_memory() Sidhartha Kumar
2023-07-17 19:00 ` Andrew Morton
2023-07-18 14:25   ` Sidhartha Kumar
2023-07-17 23:21 ` Jane Chu
2023-07-18  0:14 ` Naoya Horiguchi
2023-07-18  0:39   ` Naoya Horiguchi
2023-07-18 14:30     ` Sidhartha Kumar
2023-07-18 23:59       ` Naoya Horiguchi
2023-07-20  9:06 ` Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).