linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] A few fixup and cleanup patches for memory failure
@ 2022-04-07 13:03 Miaohe Lin
  2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-07 13:03 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, david, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a patch to fix false-postive PageSwapCache test,
minor cleanup for HWPoisonHandlable and try to dissolve truncated
hugetlb page. More details can be found in the respective changelogs.
Thanks!

---
v1:
Rebase [1] on mainline.
Add new patch "mm/memory-failure.c: dissolve truncated hugetlb page"

[1]https://lore.kernel.org/linux-mm/0d5f2c97-992b-442c-9ecf-26ba363461aa@huawei.com/T/#m310e36806a1f614a4d1c7cc1d7a5e6e6e17f663d
---
Miaohe Lin (3):
  mm/memory-failure.c: avoid false-postive PageSwapCache test
  mm/memory-failure.c: minor cleanup for HWPoisonHandlable
  mm/memory-failure.c: dissolve truncated hugetlb page

 mm/memory-failure.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

-- 
2.23.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-07 13:03 [PATCH 0/3] A few fixup and cleanup patches for memory failure Miaohe Lin
@ 2022-04-07 13:03 ` Miaohe Lin
  2022-04-08  8:52   ` David Hildenbrand
  2022-04-11  6:35   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
  2022-04-07 13:03 ` [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page Miaohe Lin
  2 siblings, 2 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-07 13:03 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, david, linux-mm, linux-kernel, linmiaohe

PageSwapCache is only reliable when PageAnon is true because PG_swapcache
serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
So we should test PageAnon to distinguish pagecache page from swapcache
page to avoid false-postive PageSwapCache test.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ef402b490663..2e97302d62e4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
 		return 0;
 	}
 
-	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
+	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
 		/*
 		 * Try to invalidate first. This should work for
 		 * non dirty unmapped page cache pages.
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable
  2022-04-07 13:03 [PATCH 0/3] A few fixup and cleanup patches for memory failure Miaohe Lin
  2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
@ 2022-04-07 13:03 ` Miaohe Lin
  2022-04-08  8:52   ` David Hildenbrand
                     ` (2 more replies)
  2022-04-07 13:03 ` [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page Miaohe Lin
  2 siblings, 3 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-07 13:03 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, david, linux-mm, linux-kernel, linmiaohe

The local variable movable can be removed by returning true directly. Also
fix typo 'mirgate'. No functional change intended.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2e97302d62e4..bd563f47630c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1180,13 +1180,11 @@ void ClearPageHWPoisonTakenOff(struct page *page)
  */
 static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 {
-	bool movable = false;
-
-	/* Soft offline could mirgate non-LRU movable pages */
+	/* Soft offline could migrate non-LRU movable pages */
 	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
-		movable = true;
+		return true;
 
-	return movable || PageLRU(page) || is_free_buddy_page(page);
+	return PageLRU(page) || is_free_buddy_page(page);
 }
 
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page
  2022-04-07 13:03 [PATCH 0/3] A few fixup and cleanup patches for memory failure Miaohe Lin
  2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
  2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
@ 2022-04-07 13:03 ` Miaohe Lin
  2022-04-11 13:13   ` HORIGUCHI NAOYA(堀口 直也)
  2 siblings, 1 reply; 18+ messages in thread
From: Miaohe Lin @ 2022-04-07 13:03 UTC (permalink / raw)
  To: akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, david, linux-mm, linux-kernel, linmiaohe

If me_huge_page meets a truncated huge page, hpage won't be dissolved
even if we hold the last refcnt. It's because the truncated huge page
has NULL page_mapping while it's not anonymous page too. Thus we lose
the last chance to dissolve it into buddy to save healthy subpages.
Remove PageAnon check to handle these huge pages too.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bd563f47630c..3f054dbb169d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1046,8 +1046,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * hugepage, so we can free and dissolve it into buddy to
 		 * save healthy subpages.
 		 */
-		if (PageAnon(hpage))
-			put_page(hpage);
+		put_page(hpage);
 		if (__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
@ 2022-04-08  8:52   ` David Hildenbrand
  2022-04-08 17:32     ` Yang Shi
  2022-04-11  6:35   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2022-04-08  8:52 UTC (permalink / raw)
  To: Miaohe Lin, akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, linux-mm, linux-kernel

On 07.04.22 15:03, Miaohe Lin wrote:
> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
> So we should test PageAnon to distinguish pagecache page from swapcache
> page to avoid false-postive PageSwapCache test.

Well, that's not quite correct. Just because a page is PageAnon()
doesn't mean that it's in the swapache. It means that it might be in the
swapcache but cannot be in the pagecache.

Maybe you wanted to say

"So we should test PageAnon() to distinguish pagecache pages from
anonymous pages."

> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ef402b490663..2e97302d62e4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
>  		return 0;
>  	}
>  
> -	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
> +	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
>  		/*
>  		 * Try to invalidate first. This should work for
>  		 * non dirty unmapped page cache pages.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable
  2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
@ 2022-04-08  8:52   ` David Hildenbrand
  2022-04-08 17:33   ` Yang Shi
  2022-04-11 13:14   ` HORIGUCHI NAOYA(堀口 直也)
  2 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2022-04-08  8:52 UTC (permalink / raw)
  To: Miaohe Lin, akpm, naoya.horiguchi
  Cc: shy828301, mike.kravetz, linux-mm, linux-kernel

On 07.04.22 15:03, Miaohe Lin wrote:
> The local variable movable can be removed by returning true directly. Also
> fix typo 'mirgate'. No functional change intended.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2e97302d62e4..bd563f47630c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1180,13 +1180,11 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   */
>  static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> -	bool movable = false;
> -
> -	/* Soft offline could mirgate non-LRU movable pages */
> +	/* Soft offline could migrate non-LRU movable pages */
>  	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> -		movable = true;
> +		return true;
>  
> -	return movable || PageLRU(page) || is_free_buddy_page(page);
> +	return PageLRU(page) || is_free_buddy_page(page);
>  }
>  
>  static int __get_hwpoison_page(struct page *page, unsigned long flags)

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-08  8:52   ` David Hildenbrand
@ 2022-04-08 17:32     ` Yang Shi
  2022-04-09  2:36       ` Miaohe Lin
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Shi @ 2022-04-08 17:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Miaohe Lin, Andrew Morton,
	HORIGUCHI NAOYA(堀口 直也),
	Mike Kravetz, Linux MM, Linux Kernel Mailing List

On Fri, Apr 8, 2022 at 1:52 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.04.22 15:03, Miaohe Lin wrote:
> > PageSwapCache is only reliable when PageAnon is true because PG_swapcache
> > serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
> > So we should test PageAnon to distinguish pagecache page from swapcache
> > page to avoid false-postive PageSwapCache test.
>
> Well, that's not quite correct. Just because a page is PageAnon()
> doesn't mean that it's in the swapache. It means that it might be in the
> swapcache but cannot be in the pagecache.
>
> Maybe you wanted to say
>
> "So we should test PageAnon() to distinguish pagecache pages from
> anonymous pages."

Yeah, I agree. The patch looks fine to me with David's comment addressed.

>
> >
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/memory-failure.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index ef402b490663..2e97302d62e4 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
> >               return 0;
> >       }
> >
> > -     if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
> > +     if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
> >               /*
> >                * Try to invalidate first. This should work for
> >                * non dirty unmapped page cache pages.
>
>
> --
> Thanks,
>
> David / dhildenb
>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable
  2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
  2022-04-08  8:52   ` David Hildenbrand
@ 2022-04-08 17:33   ` Yang Shi
  2022-04-11 13:14   ` HORIGUCHI NAOYA(堀口 直也)
  2 siblings, 0 replies; 18+ messages in thread
From: Yang Shi @ 2022-04-08 17:33 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, HORIGUCHI NAOYA(堀口 直也),
	Mike Kravetz, David Hildenbrand, Linux MM,
	Linux Kernel Mailing List

On Thu, Apr 7, 2022 at 6:03 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> The local variable movable can be removed by returning true directly. Also
> fix typo 'mirgate'. No functional change intended.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/memory-failure.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2e97302d62e4..bd563f47630c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1180,13 +1180,11 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   */
>  static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> -       bool movable = false;
> -
> -       /* Soft offline could mirgate non-LRU movable pages */
> +       /* Soft offline could migrate non-LRU movable pages */
>         if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> -               movable = true;
> +               return true;
>
> -       return movable || PageLRU(page) || is_free_buddy_page(page);
> +       return PageLRU(page) || is_free_buddy_page(page);
>  }
>
>  static int __get_hwpoison_page(struct page *page, unsigned long flags)
> --
> 2.23.0
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-08 17:32     ` Yang Shi
@ 2022-04-09  2:36       ` Miaohe Lin
  0 siblings, 0 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-09  2:36 UTC (permalink / raw)
  To: Yang Shi, David Hildenbrand
  Cc: Andrew Morton, HORIGUCHI NAOYA(堀口 直也),
	Mike Kravetz, Linux MM, Linux Kernel Mailing List

On 2022/4/9 1:32, Yang Shi wrote:
> On Fri, Apr 8, 2022 at 1:52 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.04.22 15:03, Miaohe Lin wrote:
>>> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
>>> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
>>> So we should test PageAnon to distinguish pagecache page from swapcache
>>> page to avoid false-postive PageSwapCache test.
>>
>> Well, that's not quite correct. Just because a page is PageAnon()
>> doesn't mean that it's in the swapache. It means that it might be in the
>> swapcache but cannot be in the pagecache.
>>
>> Maybe you wanted to say
>>
>> "So we should test PageAnon() to distinguish pagecache pages from
>> anonymous pages."

That's indeed what I want to say.

> 
> Yeah, I agree. The patch looks fine to me with David's comment addressed.

Many thanks for both of you! Will do it in v3.

> 
>>
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/memory-failure.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index ef402b490663..2e97302d62e4 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
>>>               return 0;
>>>       }
>>>
>>> -     if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
>>> +     if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
>>>               /*
>>>                * Try to invalidate first. This should work for
>>>                * non dirty unmapped page cache pages.
>>
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>>
> .
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
  2022-04-08  8:52   ` David Hildenbrand
@ 2022-04-11  6:35   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-11 13:19     ` Miaohe Lin
  1 sibling, 1 reply; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-11  6:35 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On Thu, Apr 07, 2022 at 09:03:50PM +0800, Miaohe Lin wrote:
> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
> So we should test PageAnon to distinguish pagecache page from swapcache
> page to avoid false-postive PageSwapCache test.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ef402b490663..2e97302d62e4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
>  		return 0;
>  	}
>  
> -	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
> +	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
>  		/*
>  		 * Try to invalidate first. This should work for
>  		 * non dirty unmapped page cache pages.
> -- 

I foudn that with this change the following VM_BUG_ON_FOLIO() is triggered
when calling soft-offline for a swapcache.  Maybe we need check both of
PageAnon and PageSwapCache instead of either?

Thanks,
Naoya Horiguchi

[   41.232172] page:0000000033d8a20c refcount:0 mapcount:0 mapping:00000000bc103d88 index:0x36d pfn:0x14359b
[   41.234931] memcg:ffff8c2f84d72000
[   41.235850] aops:swap_aops
[   41.236576] flags: 0x57ffffc0080415(locked|uptodate|lru|owner_priv_1|swapbacked|node=1|zone=2|lastcpupid=0x1fffff)
[   41.239221] raw: 0057ffffc0080415 ffffef2c050eda48 ffffef2c050dbe08 0000000000000000
[   41.241216] raw: 000000000000036d 000000000000036e 00000000ffffffff ffff8c2f84d72000
[   41.243184] page dumped because: VM_BUG_ON_FOLIO(folio_test_lru(folio))
[   41.244872] ------------[ cut here ]------------
[   41.246074] kernel BUG at mm/memcontrol.c:7062!
[   41.247248] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   41.248539] CPU: 5 PID: 1036 Comm: bash Tainted: G            E     5.18.0-rc1-v5.18-rc1-220408-2310-012-gf501f+ #11
[   41.251844] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[   41.254087] RIP: 0010:mem_cgroup_swapout+0x181/0x2c0
[   41.255399] Code: 03 0f 85 37 01 00 00 65 48 ff 08 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 8b a3 e0 ff 48 c7 c6 88 b7 5a aa e8 3f 26 f7 ff <0f> 0b 48 c7 c6 d0 ec 5a aa e8 31 26 f7 ff 0f 0b 66 90 8b 45 5c 48
[   41.260408] RSP: 0018:ffffa9340218fce0 EFLAGS: 00010082
[   41.261780] RAX: 000000000000003b RBX: ffff8c2fc180a000 RCX: 0000000000000000
[   41.263604] RDX: 0000000000000002 RSI: ffffffffaa599561 RDI: 00000000ffffffff
[   41.265435] RBP: ffffef2c050d66c0 R08: 0000000000000000 R09: 00000000ffffdfff
[   41.267266] R10: ffffa9340218fad0 R11: ffffffffaa940d08 R12: 000000000000036e
[   41.269094] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8c2fc180a008
[   41.270911] FS:  00007f00259d3740(0000) GS:ffff8c30bbc80000(0000) knlGS:0000000000000000
[   41.272975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.274422] CR2: 0000561d1325a973 CR3: 00000001452c2001 CR4: 0000000000170ee0
[   41.276253] Call Trace:
[   41.277013]  <TASK>
[   41.277561]  __remove_mapping+0xce/0x300
[   41.278604]  remove_mapping+0x12/0xe0
[   41.279571]  soft_offline_page+0x834/0x8b0
[   41.280972]  soft_offline_page_store+0x43/0x70
[   41.282171]  kernfs_fop_write_iter+0x11c/0x1b0
[   41.283292]  new_sync_write+0xf9/0x160
[   41.284310]  vfs_write+0x209/0x290
[   41.285174]  ksys_write+0x4f/0xc0
[   41.286049]  do_syscall_64+0x3b/0x90
[   41.286991]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   41.288328] RIP: 0033:0x7f00257018b7
[   41.289232] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   41.294001] RSP: 002b:00007fff3dc50748 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   41.295937] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f00257018b7
[   41.297767] RDX: 000000000000000c RSI: 0000561d1325a970 RDI: 0000000000000001
[   41.299600] RBP: 0000561d1325a970 R08: 0000000000000000 R09: 00007f00257b64e0
[   41.301418] R10: 00007f00257b63e0 R11: 0000000000000246 R12: 000000000000000c
[   41.303305] R13: 00007f00257fb5a0 R14: 000000000000000c R15: 00007f00257fb7a0
[   41.305179]  </TASK>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page
  2022-04-07 13:03 ` [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page Miaohe Lin
@ 2022-04-11 13:13   ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-12  2:47     ` Miaohe Lin
  0 siblings, 1 reply; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-11 13:13 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

Hi Miaohe,

On Thu, Apr 07, 2022 at 09:03:52PM +0800, Miaohe Lin wrote:
> If me_huge_page meets a truncated huge page, hpage won't be dissolved

I might not understand correctly what "truncated huge page" means.  If it
means the page passed to me_huge_page() and truncate_error_page() is called
on it, the else branch you're trying to update is not chosen, so maybe it
sounds irrelevant to me?  Could you elaborate it or share the procedure to
reproduce the case you care about?

> even if we hold the last refcnt. It's because the truncated huge page
> has NULL page_mapping while it's not anonymous page too. Thus we lose
> the last chance to dissolve it into buddy to save healthy subpages.
> Remove PageAnon check to handle these huge pages too.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index bd563f47630c..3f054dbb169d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1046,8 +1046,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>  		 * hugepage, so we can free and dissolve it into buddy to
>  		 * save healthy subpages.
>  		 */
> -		if (PageAnon(hpage))
> -			put_page(hpage);

I think that the reason of this "if (PageAnon(hpage))" is to not remove
hugepages for hugetlbfs files.  Unlike anonymous hugepage, it can be
accessed from file after error handling, so we can't simply dissolve it
because otherwise another process reading the hugepage sees zeroed one
without knowing the memory error.

Thanks,
Naoya Horiguchi

> +		put_page(hpage);
>  		if (__page_handle_poison(p)) {
>  			page_ref_inc(p);
>  			res = MF_RECOVERED;
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable
  2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
  2022-04-08  8:52   ` David Hildenbrand
  2022-04-08 17:33   ` Yang Shi
@ 2022-04-11 13:14   ` HORIGUCHI NAOYA(堀口 直也)
  2 siblings, 0 replies; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-11 13:14 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On Thu, Apr 07, 2022 at 09:03:51PM +0800, Miaohe Lin wrote:
> The local variable movable can be removed by returning true directly. Also
> fix typo 'mirgate'. No functional change intended.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

> ---
>  mm/memory-failure.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2e97302d62e4..bd563f47630c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1180,13 +1180,11 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   */
>  static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> -	bool movable = false;
> -
> -	/* Soft offline could mirgate non-LRU movable pages */
> +	/* Soft offline could migrate non-LRU movable pages */
>  	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> -		movable = true;
> +		return true;
>  
> -	return movable || PageLRU(page) || is_free_buddy_page(page);
> +	return PageLRU(page) || is_free_buddy_page(page);
>  }
>  
>  static int __get_hwpoison_page(struct page *page, unsigned long flags)
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-11  6:35   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-11 13:19     ` Miaohe Lin
  2022-04-12  6:37       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 18+ messages in thread
From: Miaohe Lin @ 2022-04-11 13:19 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On 2022/4/11 14:35, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Apr 07, 2022 at 09:03:50PM +0800, Miaohe Lin wrote:
>> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
>> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
>> So we should test PageAnon to distinguish pagecache page from swapcache
>> page to avoid false-postive PageSwapCache test.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ef402b490663..2e97302d62e4 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
>>  		return 0;
>>  	}
>>  
>> -	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
>> +	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
>>  		/*
>>  		 * Try to invalidate first. This should work for
>>  		 * non dirty unmapped page cache pages.
>> -- 
> 
> I foudn that with this change the following VM_BUG_ON_FOLIO() is triggered
> when calling soft-offline for a swapcache.  Maybe we need check both of
> PageAnon and PageSwapCache instead of either?
> 

Many thanks for your test! This is my overlook. Sorry about it! :( The root cause is that the page is
added into swapcache and lru( so that it can pass the HWPoisonHandlable check) but page anon is not
set yet due to page lock is held by __soft_offline_page. So we have the below core dump:

[   41.232172] page:0000000033d8a20c refcount:0 mapcount:0 mapping:00000000bc103d88 index:0x36d pfn:0x14359b
										^^^ page is not anon

[   41.236576] flags: 0x57ffffc0080415(locked|uptodate|lru|owner_priv_1|swapbacked|node=1|zone=2|lastcpupid=0x1fffff)
								^^^^^^^^^^^^^^^^^^ page is in swapcache

It seems we can check !PageAnon(page) && !PageSwapCache(page), as you suggested, to fix this issue. But maybe I
should drop this patch because invalidate_inode_page will always return 0 for PageAnon due to folio_mapping == NULL.
So nothing is really done for anonymous page here. And the origin !PageSwapCache(page) check should do the right work.
Or we shouldn't even try to call invalidate_inode_page with anonymous page in principle?

BTW: PageSwapCache should be reliable here as folio_test_swapbacked is checked implicitly inside it. In such case, PG_swapcache
can't serve as PG_owner_priv_1 as pagecache page shouldn't set PG_swapbacked (shmem will set PG_swapbacked but PG_owner_priv_1
is not used anyway). Or am I miss something again?

> Thanks,
> Naoya Horiguchi

Thanks a lot!

> 
> [   41.232172] page:0000000033d8a20c refcount:0 mapcount:0 mapping:00000000bc103d88 index:0x36d pfn:0x14359b
> [   41.234931] memcg:ffff8c2f84d72000
> [   41.235850] aops:swap_aops
> [   41.236576] flags: 0x57ffffc0080415(locked|uptodate|lru|owner_priv_1|swapbacked|node=1|zone=2|lastcpupid=0x1fffff)
> [   41.239221] raw: 0057ffffc0080415 ffffef2c050eda48 ffffef2c050dbe08 0000000000000000
> [   41.241216] raw: 000000000000036d 000000000000036e 00000000ffffffff ffff8c2f84d72000
> [   41.243184] page dumped because: VM_BUG_ON_FOLIO(folio_test_lru(folio))
> [   41.244872] ------------[ cut here ]------------
> [   41.246074] kernel BUG at mm/memcontrol.c:7062!
> [   41.247248] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [   41.248539] CPU: 5 PID: 1036 Comm: bash Tainted: G            E     5.18.0-rc1-v5.18-rc1-220408-2310-012-gf501f+ #11
> [   41.251844] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [   41.254087] RIP: 0010:mem_cgroup_swapout+0x181/0x2c0
> [   41.255399] Code: 03 0f 85 37 01 00 00 65 48 ff 08 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 8b a3 e0 ff 48 c7 c6 88 b7 5a aa e8 3f 26 f7 ff <0f> 0b 48 c7 c6 d0 ec 5a aa e8 31 26 f7 ff 0f 0b 66 90 8b 45 5c 48
> [   41.260408] RSP: 0018:ffffa9340218fce0 EFLAGS: 00010082
> [   41.261780] RAX: 000000000000003b RBX: ffff8c2fc180a000 RCX: 0000000000000000
> [   41.263604] RDX: 0000000000000002 RSI: ffffffffaa599561 RDI: 00000000ffffffff
> [   41.265435] RBP: ffffef2c050d66c0 R08: 0000000000000000 R09: 00000000ffffdfff
> [   41.267266] R10: ffffa9340218fad0 R11: ffffffffaa940d08 R12: 000000000000036e
> [   41.269094] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8c2fc180a008
> [   41.270911] FS:  00007f00259d3740(0000) GS:ffff8c30bbc80000(0000) knlGS:0000000000000000
> [   41.272975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.274422] CR2: 0000561d1325a973 CR3: 00000001452c2001 CR4: 0000000000170ee0
> [   41.276253] Call Trace:
> [   41.277013]  <TASK>
> [   41.277561]  __remove_mapping+0xce/0x300
> [   41.278604]  remove_mapping+0x12/0xe0
> [   41.279571]  soft_offline_page+0x834/0x8b0
> [   41.280972]  soft_offline_page_store+0x43/0x70
> [   41.282171]  kernfs_fop_write_iter+0x11c/0x1b0
> [   41.283292]  new_sync_write+0xf9/0x160
> [   41.284310]  vfs_write+0x209/0x290
> [   41.285174]  ksys_write+0x4f/0xc0
> [   41.286049]  do_syscall_64+0x3b/0x90
> [   41.286991]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   41.288328] RIP: 0033:0x7f00257018b7
> [   41.289232] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [   41.294001] RSP: 002b:00007fff3dc50748 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [   41.295937] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f00257018b7
> [   41.297767] RDX: 000000000000000c RSI: 0000561d1325a970 RDI: 0000000000000001
> [   41.299600] RBP: 0000561d1325a970 R08: 0000000000000000 R09: 00007f00257b64e0
> [   41.301418] R10: 00007f00257b63e0 R11: 0000000000000246 R12: 000000000000000c
> [   41.303305] R13: 00007f00257fb5a0 R14: 000000000000000c R15: 00007f00257fb7a0
> [   41.305179]  </TASK>
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page
  2022-04-11 13:13   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-12  2:47     ` Miaohe Lin
  2022-04-12  5:59       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 18+ messages in thread
From: Miaohe Lin @ 2022-04-12  2:47 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On 2022/4/11 21:13, HORIGUCHI NAOYA(堀口 直也) wrote:
> Hi Miaohe,
> 
> On Thu, Apr 07, 2022 at 09:03:52PM +0800, Miaohe Lin wrote:
>> If me_huge_page meets a truncated huge page, hpage won't be dissolved
> 
> I might not understand correctly what "truncated huge page" means.  If it
> means the page passed to me_huge_page() and truncate_error_page() is called
> on it, the else branch you're trying to update is not chosen, so maybe it
> sounds irrelevant to me?  Could you elaborate it or share the procedure to
> reproduce the case you care about?

Sorry for making confusing. What 'truncated hugetlb page' means is that a hugepage is
truncated but still on the way to free. So HPageMigratable is still set and we might
come across it here. Does this make sense for you?

> 
>> even if we hold the last refcnt. It's because the truncated huge page
>> has NULL page_mapping while it's not anonymous page too. Thus we lose
>> the last chance to dissolve it into buddy to save healthy subpages.
>> Remove PageAnon check to handle these huge pages too.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index bd563f47630c..3f054dbb169d 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1046,8 +1046,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>>  		 * hugepage, so we can free and dissolve it into buddy to
>>  		 * save healthy subpages.
>>  		 */
>> -		if (PageAnon(hpage))
>> -			put_page(hpage);
> 
> I think that the reason of this "if (PageAnon(hpage))" is to not remove
> hugepages for hugetlbfs files.  Unlike anonymous hugepage, it can be
> accessed from file after error handling, so we can't simply dissolve it
> because otherwise another process reading the hugepage sees zeroed one
> without knowing the memory error.

In this branch, we have precondition that page_mapping is NULL. So it can't be hugepages
for hugetlbfs files. It should be anonymous hugepages in most cases. If it's not anonymous
hugepages too, i.e. (!page_mapping(hpage) && !PageAnon(hpage)), it could be free hugepages
or 'truncated hugetlb page'. But we have already handled the free hugepages case, it should
be 'truncated hugetlb page' here. Since it's on the way to free, we should put the refcnt
to increase the chance that we can free and dissolve it into buddy to save healthy subpages.
Or am I miss something?

Thanks!

> 
> Thanks,
> Naoya Horiguchi
> 
>> +		put_page(hpage);
>>  		if (__page_handle_poison(p)) {
>>  			page_ref_inc(p);
>>  			res = MF_RECOVERED;
>> -- 
>> 2.23.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page
  2022-04-12  2:47     ` Miaohe Lin
@ 2022-04-12  5:59       ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-12  6:10         ` Miaohe Lin
  0 siblings, 1 reply; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-12  5:59 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On Tue, Apr 12, 2022 at 10:47:53AM +0800, Miaohe Lin wrote:
> On 2022/4/11 21:13, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hi Miaohe,
> > 
> > On Thu, Apr 07, 2022 at 09:03:52PM +0800, Miaohe Lin wrote:
> >> If me_huge_page meets a truncated huge page, hpage won't be dissolved
> > 
> > I might not understand correctly what "truncated huge page" means.  If it
> > means the page passed to me_huge_page() and truncate_error_page() is called
> > on it, the else branch you're trying to update is not chosen, so maybe it
> > sounds irrelevant to me?  Could you elaborate it or share the procedure to
> > reproduce the case you care about?
> 
> Sorry for making confusing. What 'truncated hugetlb page' means is that a hugepage is
> truncated but still on the way to free. So HPageMigratable is still set and we might
> come across it here. Does this make sense for you?

Yes, it does. Thank you.

> 
> > 
> >> even if we hold the last refcnt. It's because the truncated huge page
> >> has NULL page_mapping while it's not anonymous page too. Thus we lose
> >> the last chance to dissolve it into buddy to save healthy subpages.
> >> Remove PageAnon check to handle these huge pages too.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index bd563f47630c..3f054dbb169d 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1046,8 +1046,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
> >>  		 * hugepage, so we can free and dissolve it into buddy to
> >>  		 * save healthy subpages.
> >>  		 */
> >> -		if (PageAnon(hpage))
> >> -			put_page(hpage);
> > 
> > I think that the reason of this "if (PageAnon(hpage))" is to not remove
> > hugepages for hugetlbfs files.  Unlike anonymous hugepage, it can be
> > accessed from file after error handling, so we can't simply dissolve it
> > because otherwise another process reading the hugepage sees zeroed one
> > without knowing the memory error.
> 
> In this branch, we have precondition that page_mapping is NULL. So it can't be hugepages
> for hugetlbfs files. It should be anonymous hugepages in most cases. If it's not anonymous
> hugepages too, i.e. (!page_mapping(hpage) && !PageAnon(hpage)), it could be free hugepages
> or 'truncated hugetlb page'. But we have already handled the free hugepages case, it should
> be 'truncated hugetlb page' here. Since it's on the way to free, we should put the refcnt
> to increase the chance that we can free and dissolve it into buddy to save healthy subpages.
> Or am I miss something?

No, it sounds correct.  So I agree with removing the "if".  Could you also
update the inline comment just above it in the next version?  We no longer
need to limitedly mention "anonymous hugepage".

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page
  2022-04-12  5:59       ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-12  6:10         ` Miaohe Lin
  0 siblings, 0 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-12  6:10 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On 2022/4/12 13:59, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Apr 12, 2022 at 10:47:53AM +0800, Miaohe Lin wrote:
>> On 2022/4/11 21:13, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> Hi Miaohe,
>>>
>>> On Thu, Apr 07, 2022 at 09:03:52PM +0800, Miaohe Lin wrote:
>>>> If me_huge_page meets a truncated huge page, hpage won't be dissolved
>>>
>>> I might not understand correctly what "truncated huge page" means.  If it
>>> means the page passed to me_huge_page() and truncate_error_page() is called
>>> on it, the else branch you're trying to update is not chosen, so maybe it
>>> sounds irrelevant to me?  Could you elaborate it or share the procedure to
>>> reproduce the case you care about?
>>
>> Sorry for making confusing. What 'truncated hugetlb page' means is that a hugepage is
>> truncated but still on the way to free. So HPageMigratable is still set and we might
>> come across it here. Does this make sense for you?
> 
> Yes, it does. Thank you.
> 
>>
>>>
>>>> even if we hold the last refcnt. It's because the truncated huge page
>>>> has NULL page_mapping while it's not anonymous page too. Thus we lose
>>>> the last chance to dissolve it into buddy to save healthy subpages.
>>>> Remove PageAnon check to handle these huge pages too.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/memory-failure.c | 3 +--
>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index bd563f47630c..3f054dbb169d 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1046,8 +1046,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>>>>  		 * hugepage, so we can free and dissolve it into buddy to
>>>>  		 * save healthy subpages.
>>>>  		 */
>>>> -		if (PageAnon(hpage))
>>>> -			put_page(hpage);
>>>
>>> I think that the reason of this "if (PageAnon(hpage))" is to not remove
>>> hugepages for hugetlbfs files.  Unlike anonymous hugepage, it can be
>>> accessed from file after error handling, so we can't simply dissolve it
>>> because otherwise another process reading the hugepage sees zeroed one
>>> without knowing the memory error.
>>
>> In this branch, we have precondition that page_mapping is NULL. So it can't be hugepages
>> for hugetlbfs files. It should be anonymous hugepages in most cases. If it's not anonymous
>> hugepages too, i.e. (!page_mapping(hpage) && !PageAnon(hpage)), it could be free hugepages
>> or 'truncated hugetlb page'. But we have already handled the free hugepages case, it should
>> be 'truncated hugetlb page' here. Since it's on the way to free, we should put the refcnt
>> to increase the chance that we can free and dissolve it into buddy to save healthy subpages.
>> Or am I miss something?
> 
> No, it sounds correct.  So I agree with removing the "if".  Could you also
> update the inline comment just above it in the next version?  We no longer
> need to limitedly mention "anonymous hugepage".

Sure. Will do it in next version. Many thanks!

> 
> Thanks,
> Naoya Horiguchi
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-11 13:19     ` Miaohe Lin
@ 2022-04-12  6:37       ` HORIGUCHI NAOYA(堀口 直也)
  2022-04-12  8:57         ` Miaohe Lin
  0 siblings, 1 reply; 18+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-04-12  6:37 UTC (permalink / raw)
  To: Miaohe Lin; +Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On Mon, Apr 11, 2022 at 09:19:26PM +0800, Miaohe Lin wrote:
> On 2022/4/11 14:35, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, Apr 07, 2022 at 09:03:50PM +0800, Miaohe Lin wrote:
> >> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
> >> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
> >> So we should test PageAnon to distinguish pagecache page from swapcache
> >> page to avoid false-postive PageSwapCache test.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index ef402b490663..2e97302d62e4 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
> >>  		return 0;
> >>  	}
> >>  
> >> -	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
> >> +	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
> >>  		/*
> >>  		 * Try to invalidate first. This should work for
> >>  		 * non dirty unmapped page cache pages.
> >> -- 
> > 
> > I foudn that with this change the following VM_BUG_ON_FOLIO() is triggered
> > when calling soft-offline for a swapcache.  Maybe we need check both of
> > PageAnon and PageSwapCache instead of either?
> > 
> 
> Many thanks for your test! This is my overlook. Sorry about it! :( The root cause is that the page is
> added into swapcache and lru( so that it can pass the HWPoisonHandlable check) but page anon is not
> set yet due to page lock is held by __soft_offline_page. So we have the below core dump:
> 
> [   41.232172] page:0000000033d8a20c refcount:0 mapcount:0 mapping:00000000bc103d88 index:0x36d pfn:0x14359b
> 										^^^ page is not anon
> 
> [   41.236576] flags: 0x57ffffc0080415(locked|uptodate|lru|owner_priv_1|swapbacked|node=1|zone=2|lastcpupid=0x1fffff)
> 								^^^^^^^^^^^^^^^^^^ page is in swapcache
> 
> It seems we can check !PageAnon(page) && !PageSwapCache(page), as you suggested, to fix this issue. But maybe I
> should drop this patch because invalidate_inode_page will always return 0 for PageAnon due to folio_mapping == NULL.
> So nothing is really done for anonymous page here. And the origin !PageSwapCache(page) check should do the right work.

Thanks for clarification.

> Or we shouldn't even try to call invalidate_inode_page with anonymous page in principle?

I think just keeping the current behavior is fine (because as you stated
above invalidate_inode_page() simple ignores anonymous pages).

Thanks,
Naoya Horiguchi

> BTW: PageSwapCache should be reliable here as folio_test_swapbacked is checked implicitly inside it. In such case, PG_swapcache
> can't serve as PG_owner_priv_1 as pagecache page shouldn't set PG_swapbacked (shmem will set PG_swapbacked but PG_owner_priv_1
> is not used anyway). Or am I miss something again?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test
  2022-04-12  6:37       ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-04-12  8:57         ` Miaohe Lin
  0 siblings, 0 replies; 18+ messages in thread
From: Miaohe Lin @ 2022-04-12  8:57 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, shy828301, mike.kravetz, david, linux-mm, linux-kernel

On 2022/4/12 14:37, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Apr 11, 2022 at 09:19:26PM +0800, Miaohe Lin wrote:
>> On 2022/4/11 14:35, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Thu, Apr 07, 2022 at 09:03:50PM +0800, Miaohe Lin wrote:
>>>> PageSwapCache is only reliable when PageAnon is true because PG_swapcache
>>>> serves as PG_owner_priv_1 which can be used by fs if it's pagecache page.
>>>> So we should test PageAnon to distinguish pagecache page from swapcache
>>>> page to avoid false-postive PageSwapCache test.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/memory-failure.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index ef402b490663..2e97302d62e4 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2262,7 +2262,7 @@ static int __soft_offline_page(struct page *page)
>>>>  		return 0;
>>>>  	}
>>>>  
>>>> -	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
>>>> +	if (!PageHuge(page) && PageLRU(page) && !PageAnon(page))
>>>>  		/*
>>>>  		 * Try to invalidate first. This should work for
>>>>  		 * non dirty unmapped page cache pages.
>>>> -- 
>>>
>>> I foudn that with this change the following VM_BUG_ON_FOLIO() is triggered
>>> when calling soft-offline for a swapcache.  Maybe we need check both of
>>> PageAnon and PageSwapCache instead of either?
>>>
>>
>> Many thanks for your test! This is my overlook. Sorry about it! :( The root cause is that the page is
>> added into swapcache and lru( so that it can pass the HWPoisonHandlable check) but page anon is not
>> set yet due to page lock is held by __soft_offline_page. So we have the below core dump:
>>
>> [   41.232172] page:0000000033d8a20c refcount:0 mapcount:0 mapping:00000000bc103d88 index:0x36d pfn:0x14359b
>> 										^^^ page is not anon
>>
>> [   41.236576] flags: 0x57ffffc0080415(locked|uptodate|lru|owner_priv_1|swapbacked|node=1|zone=2|lastcpupid=0x1fffff)
>> 								^^^^^^^^^^^^^^^^^^ page is in swapcache
>>
>> It seems we can check !PageAnon(page) && !PageSwapCache(page), as you suggested, to fix this issue. But maybe I
>> should drop this patch because invalidate_inode_page will always return 0 for PageAnon due to folio_mapping == NULL.
>> So nothing is really done for anonymous page here. And the origin !PageSwapCache(page) check should do the right work.
> 
> Thanks for clarification.
> 
>> Or we shouldn't even try to call invalidate_inode_page with anonymous page in principle?
> 
> I think just keeping the current behavior is fine (because as you stated
> above invalidate_inode_page() simple ignores anonymous pages).
> 

Will drop this patch. Sorry for make noise. :(

> Thanks,
> Naoya Horiguchi
> 
>> BTW: PageSwapCache should be reliable here as folio_test_swapbacked is checked implicitly inside it. In such case, PG_swapcache
>> can't serve as PG_owner_priv_1 as pagecache page shouldn't set PG_swapbacked (shmem will set PG_swapbacked but PG_owner_priv_1
>> is not used anyway). Or am I miss something again?



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-04-12  8:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 13:03 [PATCH 0/3] A few fixup and cleanup patches for memory failure Miaohe Lin
2022-04-07 13:03 ` [PATCH 1/3] mm/memory-failure.c: avoid false-postive PageSwapCache test Miaohe Lin
2022-04-08  8:52   ` David Hildenbrand
2022-04-08 17:32     ` Yang Shi
2022-04-09  2:36       ` Miaohe Lin
2022-04-11  6:35   ` HORIGUCHI NAOYA(堀口 直也)
2022-04-11 13:19     ` Miaohe Lin
2022-04-12  6:37       ` HORIGUCHI NAOYA(堀口 直也)
2022-04-12  8:57         ` Miaohe Lin
2022-04-07 13:03 ` [PATCH 2/3] mm/memory-failure.c: minor cleanup for HWPoisonHandlable Miaohe Lin
2022-04-08  8:52   ` David Hildenbrand
2022-04-08 17:33   ` Yang Shi
2022-04-11 13:14   ` HORIGUCHI NAOYA(堀口 直也)
2022-04-07 13:03 ` [PATCH 3/3] mm/memory-failure.c: dissolve truncated hugetlb page Miaohe Lin
2022-04-11 13:13   ` HORIGUCHI NAOYA(堀口 直也)
2022-04-12  2:47     ` Miaohe Lin
2022-04-12  5:59       ` HORIGUCHI NAOYA(堀口 直也)
2022-04-12  6:10         ` Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).