All of lore.kernel.org
 help / color / mirror / Atom feed
* memory offline infinite loop after soft offline
@ 2019-10-11 21:32 Qian Cai
  2019-10-12 10:30 ` osalvador
  2019-10-14  8:39 ` Michal Hocko
  0 siblings, 2 replies; 27+ messages in thread
From: Qian Cai @ 2019-10-11 21:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-kernel, linux-mm, David Hildenbrand, Michal Hocko, Mike Kravetz

# /opt/ltp/runtest/bin/move_pages12
move_pages12.c:263: INFO: Free RAM 258988928 kB
move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 4
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
move_pages12.c:197: PASS: Bug not reproduced
move_pages12.c:197: PASS: Bug not reproduced

for mem in $(ls -d /sys/devices/system/memory/memory*); do
        echo offline > $mem/state
        echo online > $mem/state
done

That LTP move_pages12 test will first madvise(MADV_SOFT_OFFLINE) for a range.
Then, one of "echo offline" will trigger an infinite loop in __offline_pages()
here,

		/* check again */
		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
					    NULL, check_pages_isolated_cb);
	} while (ret);

because check_pages_isolated_cb() always return -EBUSY from
test_pages_isolated(),


	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
						skip_hwpoisoned_pages);
        ...
        return pfn < end_pfn ? -EBUSY : 0;

The root cause is in __test_page_isolated_in_pageblock() where "pfn" is always
less than "end_pfn" because the associated page is not a PageBuddy.

	while (pfn < end_pfn) {
	...
		else
			break;

	return pfn;

Adding a dump_page() for that pfn shows,

[  101.665160][ T8885] pfn = 77501, end_pfn = 78000
[  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
mapping:0000000000000000 index:0x0
[  101.665329][ T8885] flags: 0x3fffc000000000()
[  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
0000000000000000
[  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[  101.665588][ T8885] page dumped because: soft_offline
[  101.665639][ T8885] page_owner tracks the page as freed
[  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
gfp_mask
0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
THISNODE)
[  101.665924][ T8885]  prep_new_page+0x3c0/0x440
[  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
[  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
[  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
[  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
[  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
[  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
[  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
[  101.666520][ T8885]  sys_move_pages+0x28/0x40
[  101.666643][ T8885]  system_call+0x5c/0x68
[  101.666665][ T8885] page last free stack trace:
[  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
[  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
[  101.666821][ T8885]  free_huge_page+0x2dc/0x740
[  101.666875][ T8885]  __put_compound_page+0x64/0xc0
[  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
[  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
[  101.667048][ T8885]  soft_offline_page+0x314/0x1050
[  101.667117][ T8885]  sys_madvise+0x1068/0x1080
[  101.667185][ T8885]  system_call+0x5c/0x68

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-11 21:32 memory offline infinite loop after soft offline Qian Cai
@ 2019-10-12 10:30 ` osalvador
  2019-10-14  8:39 ` Michal Hocko
  1 sibling, 0 replies; 27+ messages in thread
From: osalvador @ 2019-10-12 10:30 UTC (permalink / raw)
  To: Qian Cai
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, David Hildenbrand,
	Michal Hocko, Mike Kravetz, owner-linux-mm

On 2019-10-11 23:32, Qian Cai wrote:
> # /opt/ltp/runtest/bin/move_pages12
> move_pages12.c:263: INFO: Free RAM 258988928 kB
> move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 
> 4
> move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 
> 4
> move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
> move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
> move_pages12.c:197: PASS: Bug not reproduced
> move_pages12.c:197: PASS: Bug not reproduced
> 
> for mem in $(ls -d /sys/devices/system/memory/memory*); do
>         echo offline > $mem/state
>         echo online > $mem/state
> done
> 
> That LTP move_pages12 test will first madvise(MADV_SOFT_OFFLINE) for a 
> range.
> Then, one of "echo offline" will trigger an infinite loop in 
> __offline_pages()
> here,


I did not deeply check whether this issue is really related 
soft-offline.

As is, soft-offline has a few issues that can lead to tricky problems.
Actually, lately I have received quite a few reports from customers 
where testing soft-offline
resulted in applications crashing all around.

I am working on a re-implementation to fix those issues [1].
I yet have to fix a bug I found yesterday though, but it should be quite 
trivial to fix it up, so I should be able to send a new re-spin next 
week.

[1] https://lore.kernel.org/patchwork/cover/1126173/

Thanks

> 
> 		/* check again */
> 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> 					    NULL, check_pages_isolated_cb);
> 	} while (ret);
> 
> because check_pages_isolated_cb() always return -EBUSY from
> test_pages_isolated(),
> 
> 
> 	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> 						skip_hwpoisoned_pages);
>         ...
>         return pfn < end_pfn ? -EBUSY : 0;
> 
> The root cause is in __test_page_isolated_in_pageblock() where "pfn" is 
> always
> less than "end_pfn" because the associated page is not a PageBuddy.
> 
> 	while (pfn < end_pfn) {
> 	...
> 		else
> 			break;
> 
> 	return pfn;
> 
> Adding a dump_page() for that pfn shows,
> 
> [  101.665160][ T8885] pfn = 77501, end_pfn = 78000
> [  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0
> [  101.665329][ T8885] flags: 0x3fffc000000000()
> [  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 
> ffffffff01dd0500
> 0000000000000000
> [  101.665498][ T8885] raw: 0000000000000000 0000000000000000 
> 00000000ffffffff
> 0000000000000000
> [  101.665588][ T8885] page dumped because: soft_offline
> [  101.665639][ T8885] page_owner tracks the page as freed
> [  101.665697][ T8885] page last allocated via order 5, migratetype 
> Movable,
> gfp_mask
> 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
> THISNODE)
> [  101.665924][ T8885]  prep_new_page+0x3c0/0x440
> [  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
> [  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
> [  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
> [  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
> [  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
> [  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
> [  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
> [  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
> [  101.666520][ T8885]  sys_move_pages+0x28/0x40
> [  101.666643][ T8885]  system_call+0x5c/0x68
> [  101.666665][ T8885] page last free stack trace:
> [  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
> [  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
> [  101.666821][ T8885]  free_huge_page+0x2dc/0x740
> [  101.666875][ T8885]  __put_compound_page+0x64/0xc0
> [  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
> [  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
> [  101.667048][ T8885]  soft_offline_page+0x314/0x1050
> [  101.667117][ T8885]  sys_madvise+0x1068/0x1080
> [  101.667185][ T8885]  system_call+0x5c/0x68


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-11 21:32 memory offline infinite loop after soft offline Qian Cai
  2019-10-12 10:30 ` osalvador
@ 2019-10-14  8:39 ` Michal Hocko
  2019-10-17  9:34   ` Naoya Horiguchi
  1 sibling, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-14  8:39 UTC (permalink / raw)
  To: Qian Cai
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri 11-10-19 17:32:44, Qian Cai wrote:
> # /opt/ltp/runtest/bin/move_pages12
> move_pages12.c:263: INFO: Free RAM 258988928 kB
> move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 4
> move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 4
> move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
> move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
> move_pages12.c:197: PASS: Bug not reproduced
> move_pages12.c:197: PASS: Bug not reproduced
> 
> for mem in $(ls -d /sys/devices/system/memory/memory*); do
>         echo offline > $mem/state
>         echo online > $mem/state
> done
> 
> That LTP move_pages12 test will first madvise(MADV_SOFT_OFFLINE) for a range.
> Then, one of "echo offline" will trigger an infinite loop in __offline_pages()
> here,
> 
> 		/* check again */
> 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> 					    NULL, check_pages_isolated_cb);
> 	} while (ret);
> 
> because check_pages_isolated_cb() always return -EBUSY from
> test_pages_isolated(),
> 
> 
> 	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> 						skip_hwpoisoned_pages);
>         ...
>         return pfn < end_pfn ? -EBUSY : 0;
> 
> The root cause is in __test_page_isolated_in_pageblock() where "pfn" is always
> less than "end_pfn" because the associated page is not a PageBuddy.
> 
> 	while (pfn < end_pfn) {
> 	...
> 		else
> 			break;
> 
> 	return pfn;

Hmm, this is interesting. I would expect that this would hit the
previous branch
	if (skip_hwpoisoned_pages && PageHWPoison(page))
and skip over hwpoisoned page. But I cannot seem to find that we would
mark all tail pages HWPoison as well and so any tail page seem to
confuse __test_page_isolated_in_pageblock.

Oscar is rewriting the hwpoison implementation but I am not sure
how/whether he is handling this case differently. Naoya, shouldn't we do
the following at least?
---
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..5fb3fee16fde 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * simple way to verify that as VM_BUG_ON(), though.
 			 */
 			pfn += 1 << page_order(page);
-		else if (skip_hwpoisoned_pages && PageHWPoison(page))
+		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
 		else
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-14  8:39 ` Michal Hocko
@ 2019-10-17  9:34   ` Naoya Horiguchi
  2019-10-17 10:01     ` Michal Hocko
  0 siblings, 1 reply; 27+ messages in thread
From: Naoya Horiguchi @ 2019-10-17  9:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> On Fri 11-10-19 17:32:44, Qian Cai wrote:
> > # /opt/ltp/runtest/bin/move_pages12
> > move_pages12.c:263: INFO: Free RAM 258988928 kB
> > move_pages12.c:281: INFO: Increasing 2048kB hugepages pool on node 0 to 4
> > move_pages12.c:291: INFO: Increasing 2048kB hugepages pool on node 8 to 4
> > move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 0
> > move_pages12.c:207: INFO: Allocating and freeing 4 hugepages on node 8
> > move_pages12.c:197: PASS: Bug not reproduced
> > move_pages12.c:197: PASS: Bug not reproduced
> > 
> > for mem in $(ls -d /sys/devices/system/memory/memory*); do
> >         echo offline > $mem/state
> >         echo online > $mem/state
> > done
> > 
> > That LTP move_pages12 test will first madvise(MADV_SOFT_OFFLINE) for a range.
> > Then, one of "echo offline" will trigger an infinite loop in __offline_pages()
> > here,
> > 
> > 		/* check again */
> > 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> > 					    NULL, check_pages_isolated_cb);
> > 	} while (ret);
> > 
> > because check_pages_isolated_cb() always return -EBUSY from
> > test_pages_isolated(),
> > 
> > 
> > 	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> > 						skip_hwpoisoned_pages);
> >         ...
> >         return pfn < end_pfn ? -EBUSY : 0;
> > 
> > The root cause is in __test_page_isolated_in_pageblock() where "pfn" is always
> > less than "end_pfn" because the associated page is not a PageBuddy.
> > 
> > 	while (pfn < end_pfn) {
> > 	...
> > 		else
> > 			break;
> > 
> > 	return pfn;
> 
> Hmm, this is interesting. I would expect that this would hit the
> previous branch
> 	if (skip_hwpoisoned_pages && PageHWPoison(page))
> and skip over hwpoisoned page. But I cannot seem to find that we would
> mark all tail pages HWPoison as well and so any tail page seem to
> confuse __test_page_isolated_in_pageblock.
> 
> Oscar is rewriting the hwpoison implementation but I am not sure
> how/whether he is handling this case differently. Naoya, shouldn't we do
> the following at least?

My appology for late response.

> ---
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..5fb3fee16fde 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			 * simple way to verify that as VM_BUG_ON(), though.
>  			 */
>  			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
>  		else

This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
we seem to have this issue since the following commit,

  commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
  Author: Wen Congyang <wency@cn.fujitsu.com>
  Date:   Tue Dec 11 16:00:45 2012 -0800
  
      memory-hotplug: skip HWPoisoned page when offlining pages

and extension of LTP coverage finally discovered this.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-17  9:34   ` Naoya Horiguchi
@ 2019-10-17 10:01     ` Michal Hocko
  2019-10-17 10:03       ` David Hildenbrand
  2019-10-17 18:07         ` Qian Cai
  0 siblings, 2 replies; 27+ messages in thread
From: Michal Hocko @ 2019-10-17 10:01 UTC (permalink / raw)
  To: Qian Cai, Naoya Horiguchi
  Cc: linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
[...]
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index 89c19c0feadb..5fb3fee16fde 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> >  			 * simple way to verify that as VM_BUG_ON(), though.
> >  			 */
> >  			pfn += 1 << page_order(page);
> > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> >  			/* A HWPoisoned page cannot be also PageBuddy */
> >  			pfn++;
> >  		else
> 
> This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> we seem to have this issue since the following commit,

Thanks a lot for double checking Naoya!
 
>   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
>   Author: Wen Congyang <wency@cn.fujitsu.com>
>   Date:   Tue Dec 11 16:00:45 2012 -0800
>   
>       memory-hotplug: skip HWPoisoned page when offlining pages
> 
> and extension of LTP coverage finally discovered this.

Qian, could you give the patch some testing?
---

From 441a9515dcdb29bb0ca39ff995632907d959032f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 17 Oct 2019 11:49:15 +0200
Subject: [PATCH] hugetlb, memory_hotplug: fix HWPoisoned tail pages properly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Qian Cai has noticed that hwpoisoned hugetlb pages prevent memory
offlining from making a forward progress. He has nailed down the issue
to be __test_page_isolated_in_pageblock always returning EBUSY because
of soft offlined page:
[  101.665160][ T8885] pfn = 77501, end_pfn = 78000
[  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
mapping:0000000000000000 index:0x0
[  101.665329][ T8885] flags: 0x3fffc000000000()
[  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
0000000000000000
[  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[  101.665588][ T8885] page dumped because: soft_offline
[  101.665639][ T8885] page_owner tracks the page as freed
[  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
gfp_mask
0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
THISNODE)
[  101.665924][ T8885]  prep_new_page+0x3c0/0x440
[  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
[  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
[  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
[  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
[  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
[  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
[  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
[  101.666520][ T8885]  sys_move_pages+0x28/0x40
[  101.666643][ T8885]  system_call+0x5c/0x68
[  101.666665][ T8885] page last free stack trace:
[  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
[  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
[  101.666821][ T8885]  free_huge_page+0x2dc/0x740
[  101.666875][ T8885]  __put_compound_page+0x64/0xc0
[  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
[  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
[  101.667048][ T8885]  soft_offline_page+0x314/0x1050
[  101.667117][ T8885]  sys_madvise+0x1068/0x1080
[  101.667185][ T8885]  system_call+0x5c/0x68

The reason is that __test_page_isolated_in_pageblock doesn't recognize
hugetlb tail pages as the HWPoison bit is not transferred from the head
page. Pfn walker then doesn't recognize those pages and so EBUSY is
returned up the call chain.

The proper fix would be to handle HWPoison throughout the huge page but
considering there is a WIP to rework that code considerably let's go
with a simple and easily backportable workaround and simply check the
the head of a compound page for the HWPoison flag.

Reported-and-analyzed-by: Qian Cai <cai@lca.pw>
Fixes: b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_isolation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..5fb3fee16fde 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * simple way to verify that as VM_BUG_ON(), though.
 			 */
 			pfn += 1 << page_order(page);
-		else if (skip_hwpoisoned_pages && PageHWPoison(page))
+		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
 		else
-- 
2.20.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-17 10:01     ` Michal Hocko
@ 2019-10-17 10:03       ` David Hildenbrand
  2019-10-17 18:07         ` Qian Cai
  1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2019-10-17 10:03 UTC (permalink / raw)
  To: Michal Hocko, Qian Cai, Naoya Horiguchi
  Cc: linux-kernel, linux-mm, Mike Kravetz

On 17.10.19 12:01, Michal Hocko wrote:
> On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
>> On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> [...]
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index 89c19c0feadb..5fb3fee16fde 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>>>   			 * simple way to verify that as VM_BUG_ON(), though.
>>>   			 */
>>>   			pfn += 1 << page_order(page);
>>> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
>>> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>>>   			/* A HWPoisoned page cannot be also PageBuddy */
>>>   			pfn++;
>>>   		else
>>
>> This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
>> we seem to have this issue since the following commit,
> 
> Thanks a lot for double checking Naoya!
>   
>>    commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
>>    Author: Wen Congyang <wency@cn.fujitsu.com>
>>    Date:   Tue Dec 11 16:00:45 2012 -0800
>>    
>>        memory-hotplug: skip HWPoisoned page when offlining pages
>>
>> and extension of LTP coverage finally discovered this.
> 
> Qian, could you give the patch some testing?
> ---
> 
>  From 441a9515dcdb29bb0ca39ff995632907d959032f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 17 Oct 2019 11:49:15 +0200
> Subject: [PATCH] hugetlb, memory_hotplug: fix HWPoisoned tail pages properly
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Qian Cai has noticed that hwpoisoned hugetlb pages prevent memory
> offlining from making a forward progress. He has nailed down the issue
> to be __test_page_isolated_in_pageblock always returning EBUSY because
> of soft offlined page:
> [  101.665160][ T8885] pfn = 77501, end_pfn = 78000
> [  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0
> [  101.665329][ T8885] flags: 0x3fffc000000000()
> [  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
> 0000000000000000
> [  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [  101.665588][ T8885] page dumped because: soft_offline
> [  101.665639][ T8885] page_owner tracks the page as freed
> [  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
> gfp_mask
> 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
> THISNODE)
> [  101.665924][ T8885]  prep_new_page+0x3c0/0x440
> [  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
> [  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
> [  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
> [  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
> [  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
> [  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
> [  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
> [  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
> [  101.666520][ T8885]  sys_move_pages+0x28/0x40
> [  101.666643][ T8885]  system_call+0x5c/0x68
> [  101.666665][ T8885] page last free stack trace:
> [  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
> [  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
> [  101.666821][ T8885]  free_huge_page+0x2dc/0x740
> [  101.666875][ T8885]  __put_compound_page+0x64/0xc0
> [  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
> [  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
> [  101.667048][ T8885]  soft_offline_page+0x314/0x1050
> [  101.667117][ T8885]  sys_madvise+0x1068/0x1080
> [  101.667185][ T8885]  system_call+0x5c/0x68
> 
> The reason is that __test_page_isolated_in_pageblock doesn't recognize
> hugetlb tail pages as the HWPoison bit is not transferred from the head
> page. Pfn walker then doesn't recognize those pages and so EBUSY is
> returned up the call chain.
> 
> The proper fix would be to handle HWPoison throughout the huge page but
> considering there is a WIP to rework that code considerably let's go
> with a simple and easily backportable workaround and simply check the
> the head of a compound page for the HWPoison flag.
> 
> Reported-and-analyzed-by: Qian Cai <cai@lca.pw>
> Fixes: b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>   mm/page_isolation.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..5fb3fee16fde 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>   			 * simple way to verify that as VM_BUG_ON(), though.
>   			 */
>   			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>   			/* A HWPoisoned page cannot be also PageBuddy */
>   			pfn++;
>   		else
> 

With the extended description, this makes sense to me now :)

Acked-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-17 10:01     ` Michal Hocko
@ 2019-10-17 18:07         ` Qian Cai
  2019-10-17 18:07         ` Qian Cai
  1 sibling, 0 replies; 27+ messages in thread
From: Qian Cai @ 2019-10-17 18:07 UTC (permalink / raw)
  To: Michal Hocko, Naoya Horiguchi
  Cc: linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> 
> [...]
> > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > index 89c19c0feadb..5fb3fee16fde 100644
> > > --- a/mm/page_isolation.c
> > > +++ b/mm/page_isolation.c
> > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > >  			 */
> > >  			pfn += 1 << page_order(page);
> > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > >  			pfn++;
> > >  		else
> > 
> > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > we seem to have this issue since the following commit,
> 
> Thanks a lot for double checking Naoya!
>  
> >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> >   Author: Wen Congyang <wency@cn.fujitsu.com>
> >   Date:   Tue Dec 11 16:00:45 2012 -0800
> >   
> >       memory-hotplug: skip HWPoisoned page when offlining pages
> > 
> > and extension of LTP coverage finally discovered this.
> 
> Qian, could you give the patch some testing?

Unfortunately, this does not solve the problem. It looks to me that in
soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
set.

		if (PageBuddy(page_head) && page_order(page_head) >= order) {
			if (!TestSetPageHWPoison(page))
				hwpoisoned = true;

The below is the dump_page() of the compound_head().

[  113.796632][ T8907] page:c00c000800458040 refcount:0 mapcount:0
mapping:0000000000000000 index:0x0
[  113.796716][ T8907] flags: 0x83fffc000000000()
[  113.796764][ T8907] raw: 083fffc000000000 0000000000000000 ffffffff00450500
0000000000000000
[  113.796870][ T8907] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[  113.796959][ T8907] page dumped because: soft offline compound_head()
[  113.797037][ T8907] page_owner tracks the page as freed
[  113.797086][ T8907] page last allocated via order 5, migratetype Movable,
gfp_mask
0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
THISNODE)
[  113.797262][ T8907]  prep_new_page+0x3c0/0x440
[  113.797316][ T8907]  get_page_from_freelist+0x2568/0x2bb0
[  113.797395][ T8907]  __alloc_pages_nodemask+0x1b4/0x670
[  113.797443][ T8907]  alloc_fresh_huge_page+0xb8/0x300
[  113.797493][ T8907]  alloc_migrate_huge_page+0x30/0x70
[  113.797550][ T8907]  alloc_new_node_page+0xc4/0x380
[  113.797609][ T8907]  migrate_pages+0x3b4/0x19e0
[  113.797649][ T8907]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  113.797713][ T8907]  kernel_move_pages+0x498/0xfc0
[  113.797784][ T8907]  sys_move_pages+0x28/0x40
[  113.797843][ T8907]  system_call+0x5c/0x68
[  113.797895][ T8907] page last free stack trace:
[  113.797947][ T8907]  __free_pages_ok+0xa4c/0xd40
[  113.797991][ T8907]  update_and_free_page+0x2dc/0x5b0
[  113.798059][ T8907]  free_huge_page+0x2dc/0x740
[  113.798103][ T8907]  __put_compound_page+0x64/0xc0
[  113.798171][ T8907]  putback_active_hugepage+0x228/0x390
[  113.798219][ T8907]  migrate_pages+0xa78/0x19e0
[  113.798273][ T8907]  soft_offline_page+0x314/0x1050
[  113.798319][ T8907]  sys_madvise+0x1068/0x1080
[  113.798381][ T8907]  system_call+0x5c/0x68

> ---
> 
> From 441a9515dcdb29bb0ca39ff995632907d959032f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 17 Oct 2019 11:49:15 +0200
> Subject: [PATCH] hugetlb, memory_hotplug: fix HWPoisoned tail pages properly
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Qian Cai has noticed that hwpoisoned hugetlb pages prevent memory
> offlining from making a forward progress. He has nailed down the issue
> to be __test_page_isolated_in_pageblock always returning EBUSY because
> of soft offlined page:
> [  101.665160][ T8885] pfn = 77501, end_pfn = 78000
> [  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0
> [  101.665329][ T8885] flags: 0x3fffc000000000()
> [  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
> 0000000000000000
> [  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [  101.665588][ T8885] page dumped because: soft_offline
> [  101.665639][ T8885] page_owner tracks the page as freed
> [  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
> gfp_mask
> 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
> THISNODE)
> [  101.665924][ T8885]  prep_new_page+0x3c0/0x440
> [  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
> [  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
> [  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
> [  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
> [  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
> [  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
> [  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
> [  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
> [  101.666520][ T8885]  sys_move_pages+0x28/0x40
> [  101.666643][ T8885]  system_call+0x5c/0x68
> [  101.666665][ T8885] page last free stack trace:
> [  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
> [  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
> [  101.666821][ T8885]  free_huge_page+0x2dc/0x740
> [  101.666875][ T8885]  __put_compound_page+0x64/0xc0
> [  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
> [  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
> [  101.667048][ T8885]  soft_offline_page+0x314/0x1050
> [  101.667117][ T8885]  sys_madvise+0x1068/0x1080
> [  101.667185][ T8885]  system_call+0x5c/0x68
> 
> The reason is that __test_page_isolated_in_pageblock doesn't recognize
> hugetlb tail pages as the HWPoison bit is not transferred from the head
> page. Pfn walker then doesn't recognize those pages and so EBUSY is
> returned up the call chain.
> 
> The proper fix would be to handle HWPoison throughout the huge page but
> considering there is a WIP to rework that code considerably let's go
> with a simple and easily backportable workaround and simply check the
> the head of a compound page for the HWPoison flag.
> 
> Reported-and-analyzed-by: Qian Cai <cai@lca.pw>
> Fixes: b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_isolation.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..5fb3fee16fde 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			 * simple way to verify that as VM_BUG_ON(), though.
>  			 */
>  			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
>  		else
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
@ 2019-10-17 18:07         ` Qian Cai
  0 siblings, 0 replies; 27+ messages in thread
From: Qian Cai @ 2019-10-17 18:07 UTC (permalink / raw)
  To: Michal Hocko, Naoya Horiguchi
  Cc: linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> 
> [...]
> > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > index 89c19c0feadb..5fb3fee16fde 100644
> > > --- a/mm/page_isolation.c
> > > +++ b/mm/page_isolation.c
> > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > >  			 */
> > >  			pfn += 1 << page_order(page);
> > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > >  			pfn++;
> > >  		else
> > 
> > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > we seem to have this issue since the following commit,
> 
> Thanks a lot for double checking Naoya!
>  
> >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> >   Author: Wen Congyang <wency@cn.fujitsu.com>
> >   Date:   Tue Dec 11 16:00:45 2012 -0800
> >   
> >       memory-hotplug: skip HWPoisoned page when offlining pages
> > 
> > and extension of LTP coverage finally discovered this.
> 
> Qian, could you give the patch some testing?

Unfortunately, this does not solve the problem. It looks to me that in
soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
set.

		if (PageBuddy(page_head) && page_order(page_head) >= order) {
			if (!TestSetPageHWPoison(page))
				hwpoisoned = true;

The below is the dump_page() of the compound_head().

[  113.796632][ T8907] page:c00c000800458040 refcount:0 mapcount:0
mapping:0000000000000000 index:0x0
[  113.796716][ T8907] flags: 0x83fffc000000000()
[  113.796764][ T8907] raw: 083fffc000000000 0000000000000000 ffffffff00450500
0000000000000000
[  113.796870][ T8907] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[  113.796959][ T8907] page dumped because: soft offline compound_head()
[  113.797037][ T8907] page_owner tracks the page as freed
[  113.797086][ T8907] page last allocated via order 5, migratetype Movable,
gfp_mask
0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
THISNODE)
[  113.797262][ T8907]  prep_new_page+0x3c0/0x440
[  113.797316][ T8907]  get_page_from_freelist+0x2568/0x2bb0
[  113.797395][ T8907]  __alloc_pages_nodemask+0x1b4/0x670
[  113.797443][ T8907]  alloc_fresh_huge_page+0xb8/0x300
[  113.797493][ T8907]  alloc_migrate_huge_page+0x30/0x70
[  113.797550][ T8907]  alloc_new_node_page+0xc4/0x380
[  113.797609][ T8907]  migrate_pages+0x3b4/0x19e0
[  113.797649][ T8907]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
[  113.797713][ T8907]  kernel_move_pages+0x498/0xfc0
[  113.797784][ T8907]  sys_move_pages+0x28/0x40
[  113.797843][ T8907]  system_call+0x5c/0x68
[  113.797895][ T8907] page last free stack trace:
[  113.797947][ T8907]  __free_pages_ok+0xa4c/0xd40
[  113.797991][ T8907]  update_and_free_page+0x2dc/0x5b0
[  113.798059][ T8907]  free_huge_page+0x2dc/0x740
[  113.798103][ T8907]  __put_compound_page+0x64/0xc0
[  113.798171][ T8907]  putback_active_hugepage+0x228/0x390
[  113.798219][ T8907]  migrate_pages+0xa78/0x19e0
[  113.798273][ T8907]  soft_offline_page+0x314/0x1050
[  113.798319][ T8907]  sys_madvise+0x1068/0x1080
[  113.798381][ T8907]  system_call+0x5c/0x68

> ---
> 
> From 441a9515dcdb29bb0ca39ff995632907d959032f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 17 Oct 2019 11:49:15 +0200
> Subject: [PATCH] hugetlb, memory_hotplug: fix HWPoisoned tail pages properly
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Qian Cai has noticed that hwpoisoned hugetlb pages prevent memory
> offlining from making a forward progress. He has nailed down the issue
> to be __test_page_isolated_in_pageblock always returning EBUSY because
> of soft offlined page:
> [  101.665160][ T8885] pfn = 77501, end_pfn = 78000
> [  101.665245][ T8885] page:c00c000001dd4040 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0
> [  101.665329][ T8885] flags: 0x3fffc000000000()
> [  101.665391][ T8885] raw: 003fffc000000000 0000000000000000 ffffffff01dd0500
> 0000000000000000
> [  101.665498][ T8885] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [  101.665588][ T8885] page dumped because: soft_offline
> [  101.665639][ T8885] page_owner tracks the page as freed
> [  101.665697][ T8885] page last allocated via order 5, migratetype Movable,
> gfp_mask
> 0x346cca(GFP_HIGHUSER_MOVABLE|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_
> THISNODE)
> [  101.665924][ T8885]  prep_new_page+0x3c0/0x440
> [  101.665962][ T8885]  get_page_from_freelist+0x2568/0x2bb0
> [  101.666059][ T8885]  __alloc_pages_nodemask+0x1b4/0x670
> [  101.666115][ T8885]  alloc_fresh_huge_page+0x244/0x6e0
> [  101.666183][ T8885]  alloc_migrate_huge_page+0x30/0x70
> [  101.666254][ T8885]  alloc_new_node_page+0xc4/0x380
> [  101.666325][ T8885]  migrate_pages+0x3b4/0x19e0
> [  101.666375][ T8885]  do_move_pages_to_node.isra.29.part.30+0x44/0xa0
> [  101.666464][ T8885]  kernel_move_pages+0x498/0xfc0
> [  101.666520][ T8885]  sys_move_pages+0x28/0x40
> [  101.666643][ T8885]  system_call+0x5c/0x68
> [  101.666665][ T8885] page last free stack trace:
> [  101.666704][ T8885]  __free_pages_ok+0xa4c/0xd40
> [  101.666773][ T8885]  update_and_free_page+0x2dc/0x5b0
> [  101.666821][ T8885]  free_huge_page+0x2dc/0x740
> [  101.666875][ T8885]  __put_compound_page+0x64/0xc0
> [  101.666926][ T8885]  putback_active_hugepage+0x228/0x390
> [  101.666990][ T8885]  migrate_pages+0xa78/0x19e0
> [  101.667048][ T8885]  soft_offline_page+0x314/0x1050
> [  101.667117][ T8885]  sys_madvise+0x1068/0x1080
> [  101.667185][ T8885]  system_call+0x5c/0x68
> 
> The reason is that __test_page_isolated_in_pageblock doesn't recognize
> hugetlb tail pages as the HWPoison bit is not transferred from the head
> page. Pfn walker then doesn't recognize those pages and so EBUSY is
> returned up the call chain.
> 
> The proper fix would be to handle HWPoison throughout the huge page but
> considering there is a WIP to rework that code considerably let's go
> with a simple and easily backportable workaround and simply check the
> the head of a compound page for the HWPoison flag.
> 
> Reported-and-analyzed-by: Qian Cai <cai@lca.pw>
> Fixes: b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_isolation.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 89c19c0feadb..5fb3fee16fde 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			 * simple way to verify that as VM_BUG_ON(), though.
>  			 */
>  			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
>  		else
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-17 18:07         ` Qian Cai
  (?)
@ 2019-10-17 18:27         ` Michal Hocko
  2019-10-18  2:19           ` Naoya Horiguchi
  -1 siblings, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-17 18:27 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Thu 17-10-19 14:07:13, Qian Cai wrote:
> On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > 
> > [...]
> > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > --- a/mm/page_isolation.c
> > > > +++ b/mm/page_isolation.c
> > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > >  			 */
> > > >  			pfn += 1 << page_order(page);
> > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > >  			pfn++;
> > > >  		else
> > > 
> > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > we seem to have this issue since the following commit,
> > 
> > Thanks a lot for double checking Naoya!
> >  
> > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > >   
> > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > 
> > > and extension of LTP coverage finally discovered this.
> > 
> > Qian, could you give the patch some testing?
> 
> Unfortunately, this does not solve the problem. It looks to me that in
> soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> set.
> 
> 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> 			if (!TestSetPageHWPoison(page))
> 				hwpoisoned = true;

This is more than unexpected. How are we supposed to find out that the
page is poisoned? Any idea Naoya?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-17 18:27         ` Michal Hocko
@ 2019-10-18  2:19           ` Naoya Horiguchi
  2019-10-18  6:06             ` Michal Hocko
  2019-10-18  8:13             ` David Hildenbrand
  0 siblings, 2 replies; 27+ messages in thread
From: Naoya Horiguchi @ 2019-10-18  2:19 UTC (permalink / raw)
  To: Michal Hocko, Qian Cai
  Cc: linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > 
> > > [...]
> > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > --- a/mm/page_isolation.c
> > > > > +++ b/mm/page_isolation.c
> > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > > >  			 */
> > > > >  			pfn += 1 << page_order(page);
> > > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > > >  			pfn++;
> > > > >  		else
> > > > 
> > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > we seem to have this issue since the following commit,
> > > 
> > > Thanks a lot for double checking Naoya!
> > >  
> > > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > > >   
> > > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > > 
> > > > and extension of LTP coverage finally discovered this.
> > > 
> > > Qian, could you give the patch some testing?
> > 
> > Unfortunately, this does not solve the problem. It looks to me that in
> > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> > set.
> > 
> > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > 			if (!TestSetPageHWPoison(page))
> > 				hwpoisoned = true;
> 
> This is more than unexpected. How are we supposed to find out that the
> page is poisoned? Any idea Naoya?

# sorry for my poor review...

We set PG_hwpoison bit only on the head page for hugetlb, that's because
we handle multiple pages as a single one for hugetlb. So it's enough
to check isolation only on the head page.  Simply skipping pfn cursor to
the page after the hugepage should avoid the infinite loop:

  @@ -274,9 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
   			 * simple way to verify that as VM_BUG_ON(), though.
   			 */
   			pfn += 1 << page_order(page);
  -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
  -			/* A HWPoisoned page cannot be also PageBuddy */
  -			pfn++;
  +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
  +			/*
  +			 * A HWPoisoned page cannot be also PageBuddy.
  +			 * PG_hwpoison could be set only on the head page in
  +			 * hugetlb case, so no need to check tail pages.
  +			 */
  +			pfn += 1 << compound_order(page);
   		else
   			break;
   	}

Qian, could you please try this?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  2:19           ` Naoya Horiguchi
@ 2019-10-18  6:06             ` Michal Hocko
  2019-10-18  6:32               ` Naoya Horiguchi
  2019-10-18  8:13             ` David Hildenbrand
  1 sibling, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-18  6:06 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri 18-10-19 02:19:06, Naoya Horiguchi wrote:
> On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> > On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > > 
> > > > [...]
> > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > > --- a/mm/page_isolation.c
> > > > > > +++ b/mm/page_isolation.c
> > > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > > > >  			 */
> > > > > >  			pfn += 1 << page_order(page);
> > > > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > > > >  			pfn++;
> > > > > >  		else
> > > > > 
> > > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > > we seem to have this issue since the following commit,
> > > > 
> > > > Thanks a lot for double checking Naoya!
> > > >  
> > > > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > > > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > > > >   
> > > > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > > > 
> > > > > and extension of LTP coverage finally discovered this.
> > > > 
> > > > Qian, could you give the patch some testing?
> > > 
> > > Unfortunately, this does not solve the problem. It looks to me that in
> > > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > > PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> > > set.
> > > 
> > > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > > 			if (!TestSetPageHWPoison(page))
> > > 				hwpoisoned = true;
> > 
> > This is more than unexpected. How are we supposed to find out that the
> > page is poisoned? Any idea Naoya?
> 
> # sorry for my poor review...
> 
> We set PG_hwpoison bit only on the head page for hugetlb, that's because
> we handle multiple pages as a single one for hugetlb. So it's enough
> to check isolation only on the head page.  Simply skipping pfn cursor to
> the page after the hugepage should avoid the infinite loop:

But the page dump Qian provided shows that the head page doesn't have
HWPoison bit either. If it had then going pfn at a time should just work
because all tail pages would be skipped. Or do I miss something?
 
>   @@ -274,9 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>    			 * simple way to verify that as VM_BUG_ON(), though.
>    			 */
>    			pfn += 1 << page_order(page);
>   -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
>   -			/* A HWPoisoned page cannot be also PageBuddy */
>   -			pfn++;
>   +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>   +			/*
>   +			 * A HWPoisoned page cannot be also PageBuddy.
>   +			 * PG_hwpoison could be set only on the head page in
>   +			 * hugetlb case, so no need to check tail pages.
>   +			 */
>   +			pfn += 1 << compound_order(page);
>    		else
>    			break;
>    	}
> 
> Qian, could you please try this?
> 
> Thanks,
> Naoya Horiguchi

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  6:06             ` Michal Hocko
@ 2019-10-18  6:32               ` Naoya Horiguchi
  2019-10-18  7:33                 ` Michal Hocko
  2019-10-18 11:56                 ` Qian Cai
  0 siblings, 2 replies; 27+ messages in thread
From: Naoya Horiguchi @ 2019-10-18  6:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri, Oct 18, 2019 at 08:06:35AM +0200, Michal Hocko wrote:
> On Fri 18-10-19 02:19:06, Naoya Horiguchi wrote:
> > On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> > > On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > > > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > > > 
> > > > > [...]
> > > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > > > --- a/mm/page_isolation.c
> > > > > > > +++ b/mm/page_isolation.c
> > > > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > > > > >  			 */
> > > > > > >  			pfn += 1 << page_order(page);
> > > > > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > > > > >  			pfn++;
> > > > > > >  		else
> > > > > > 
> > > > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > > > we seem to have this issue since the following commit,
> > > > > 
> > > > > Thanks a lot for double checking Naoya!
> > > > >  
> > > > > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > > > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > > > > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > > > > >   
> > > > > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > > > > 
> > > > > > and extension of LTP coverage finally discovered this.
> > > > > 
> > > > > Qian, could you give the patch some testing?
> > > > 
> > > > Unfortunately, this does not solve the problem. It looks to me that in
> > > > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > > > PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> > > > set.
> > > > 
> > > > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > > > 			if (!TestSetPageHWPoison(page))
> > > > 				hwpoisoned = true;
> > > 
> > > This is more than unexpected. How are we supposed to find out that the
> > > page is poisoned? Any idea Naoya?
> > 
> > # sorry for my poor review...
> > 
> > We set PG_hwpoison bit only on the head page for hugetlb, that's because
> > we handle multiple pages as a single one for hugetlb. So it's enough
> > to check isolation only on the head page.  Simply skipping pfn cursor to
> > the page after the hugepage should avoid the infinite loop:
> 
> But the page dump Qian provided shows that the head page doesn't have
> HWPoison bit either. If it had then going pfn at a time should just work
> because all tail pages would be skipped. Or do I miss something?

You're right, then I don't see how this happens. If the error hugepage was
isolated without having PG_hwpoison set, it's unexpected and problematic.
I'm testing myself with v5.4-rc2 (simply ran move_pages12 and did hotremove/hotadd)
but don't reproduce the issue yet.  Do we need specific kernel version/config
to trigger this?

Thanks,
Naoya Horiguchi

>  
> >   @@ -274,9 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> >    			 * simple way to verify that as VM_BUG_ON(), though.
> >    			 */
> >    			pfn += 1 << page_order(page);
> >   -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> >   -			/* A HWPoisoned page cannot be also PageBuddy */
> >   -			pfn++;
> >   +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> >   +			/*
> >   +			 * A HWPoisoned page cannot be also PageBuddy.
> >   +			 * PG_hwpoison could be set only on the head page in
> >   +			 * hugetlb case, so no need to check tail pages.
> >   +			 */
> >   +			pfn += 1 << compound_order(page);
> >    		else
> >    			break;
> >    	}
> > 
> > Qian, could you please try this?
> > 
> > Thanks,
> > Naoya Horiguchi
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  6:32               ` Naoya Horiguchi
@ 2019-10-18  7:33                 ` Michal Hocko
  2019-10-18  8:46                   ` Naoya Horiguchi
  2019-10-18 11:56                 ` Qian Cai
  1 sibling, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-18  7:33 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri 18-10-19 06:32:22, Naoya Horiguchi wrote:
> On Fri, Oct 18, 2019 at 08:06:35AM +0200, Michal Hocko wrote:
> > On Fri 18-10-19 02:19:06, Naoya Horiguchi wrote:
> > > On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> > > > On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > > > > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > > > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > > > > 
> > > > > > [...]
> > > > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > > > > --- a/mm/page_isolation.c
> > > > > > > > +++ b/mm/page_isolation.c
> > > > > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > > > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > > > > > >  			 */
> > > > > > > >  			pfn += 1 << page_order(page);
> > > > > > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > > > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > > > > > >  			pfn++;
> > > > > > > >  		else
> > > > > > > 
> > > > > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > > > > we seem to have this issue since the following commit,
> > > > > > 
> > > > > > Thanks a lot for double checking Naoya!
> > > > > >  
> > > > > > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > > > > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > > > > > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > > > > > >   
> > > > > > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > > > > > 
> > > > > > > and extension of LTP coverage finally discovered this.
> > > > > > 
> > > > > > Qian, could you give the patch some testing?
> > > > > 
> > > > > Unfortunately, this does not solve the problem. It looks to me that in
> > > > > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > > > > PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> > > > > set.
> > > > > 
> > > > > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > > > > 			if (!TestSetPageHWPoison(page))
> > > > > 				hwpoisoned = true;
> > > > 
> > > > This is more than unexpected. How are we supposed to find out that the
> > > > page is poisoned? Any idea Naoya?
> > > 
> > > # sorry for my poor review...
> > > 
> > > We set PG_hwpoison bit only on the head page for hugetlb, that's because
> > > we handle multiple pages as a single one for hugetlb. So it's enough
> > > to check isolation only on the head page.  Simply skipping pfn cursor to
> > > the page after the hugepage should avoid the infinite loop:
> > 
> > But the page dump Qian provided shows that the head page doesn't have
> > HWPoison bit either. If it had then going pfn at a time should just work
> > because all tail pages would be skipped. Or do I miss something?
> 
> You're right, then I don't see how this happens.

OK, this is a bit relieving. I thought that there are legitimate cases
when none of the hugetlb gets the HWPoison bit (e.g. when the page has 0
reference count which is the case here). That would be utterly broken
because we would have no way to tell the page is hwpoisoned.

Anyway, do you think the patch as I've posted makes sense regardless
another potential problem? Or would you like to resend yours which skips
over tail pages at once?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  2:19           ` Naoya Horiguchi
  2019-10-18  6:06             ` Michal Hocko
@ 2019-10-18  8:13             ` David Hildenbrand
  2019-10-18  8:24               ` Michal Hocko
  1 sibling, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2019-10-18  8:13 UTC (permalink / raw)
  To: Naoya Horiguchi, Michal Hocko, Qian Cai
  Cc: linux-kernel, linux-mm, Mike Kravetz

On 18.10.19 04:19, Naoya Horiguchi wrote:
> On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
>> On Thu 17-10-19 14:07:13, Qian Cai wrote:
>>> On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
>>>> On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
>>>>> On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
>>>>
>>>> [...]
>>>>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>>>>> index 89c19c0feadb..5fb3fee16fde 100644
>>>>>> --- a/mm/page_isolation.c
>>>>>> +++ b/mm/page_isolation.c
>>>>>> @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>>>>>>   			 * simple way to verify that as VM_BUG_ON(), though.
>>>>>>   			 */
>>>>>>   			pfn += 1 << page_order(page);
>>>>>> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
>>>>>> +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>>>>>>   			/* A HWPoisoned page cannot be also PageBuddy */
>>>>>>   			pfn++;
>>>>>>   		else
>>>>>
>>>>> This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
>>>>> we seem to have this issue since the following commit,
>>>>
>>>> Thanks a lot for double checking Naoya!
>>>>   
>>>>>    commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
>>>>>    Author: Wen Congyang <wency@cn.fujitsu.com>
>>>>>    Date:   Tue Dec 11 16:00:45 2012 -0800
>>>>>    
>>>>>        memory-hotplug: skip HWPoisoned page when offlining pages
>>>>>
>>>>> and extension of LTP coverage finally discovered this.
>>>>
>>>> Qian, could you give the patch some testing?
>>>
>>> Unfortunately, this does not solve the problem. It looks to me that in
>>> soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
>>> PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
>>> set.
>>>
>>> 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
>>> 			if (!TestSetPageHWPoison(page))
>>> 				hwpoisoned = true;
>>
>> This is more than unexpected. How are we supposed to find out that the
>> page is poisoned? Any idea Naoya?
> 
> # sorry for my poor review...
> 
> We set PG_hwpoison bit only on the head page for hugetlb, that's because
> we handle multiple pages as a single one for hugetlb. So it's enough
> to check isolation only on the head page.  Simply skipping pfn cursor to
> the page after the hugepage should avoid the infinite loop:
> 
>    @@ -274,9 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>     			 * simple way to verify that as VM_BUG_ON(), though.
>     			 */
>     			pfn += 1 << page_order(page);
>    -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
>    -			/* A HWPoisoned page cannot be also PageBuddy */
>    -			pfn++;
>    +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
>    +			/*
>    +			 * A HWPoisoned page cannot be also PageBuddy.
>    +			 * PG_hwpoison could be set only on the head page in
>    +			 * hugetlb case, so no need to check tail pages.
>    +			 */
>    +			pfn += 1 << compound_order(page);
>     		else
>     			break;
>     	}
> 
> Qian, could you please try this?

That should result in the same behavior if I'm not wrong.

a) Checking the head, skipping over the tail
b) Checking for every tail the head

However, if the compound page spans multiple pageblocks, I am not sure 
if your change is correct. (if you don't start at the compoound head - 
when page != compoound_head(page), you would still jump 1 << 
compound_order(page))

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  8:13             ` David Hildenbrand
@ 2019-10-18  8:24               ` Michal Hocko
  2019-10-18  8:38                 ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-18  8:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
[...]
> However, if the compound page spans multiple pageblocks

Although hugetlb pages spanning pageblocks are possible this shouldn't
matter in__test_page_isolated_in_pageblock because this function doesn't
really operate on pageblocks as the name suggests.  It is simply
traversing all valid RAM ranges (see walk_system_ram_range).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  8:24               ` Michal Hocko
@ 2019-10-18  8:38                 ` David Hildenbrand
  2019-10-18  8:55                   ` Michal Hocko
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2019-10-18  8:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On 18.10.19 10:24, Michal Hocko wrote:
> On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
> [...]
>> However, if the compound page spans multiple pageblocks
> 
> Although hugetlb pages spanning pageblocks are possible this shouldn't
> matter in__test_page_isolated_in_pageblock because this function doesn't
> really operate on pageblocks as the name suggests.  It is simply
> traversing all valid RAM ranges (see walk_system_ram_range).

As long as the hugepages don't span memory blocks/sections, you are 
right. I have no experience with gigantic pages in this regard.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  7:33                 ` Michal Hocko
@ 2019-10-18  8:46                   ` Naoya Horiguchi
  0 siblings, 0 replies; 27+ messages in thread
From: Naoya Horiguchi @ 2019-10-18  8:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Qian Cai, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri, Oct 18, 2019 at 09:33:10AM +0200, Michal Hocko wrote:
> On Fri 18-10-19 06:32:22, Naoya Horiguchi wrote:
> > On Fri, Oct 18, 2019 at 08:06:35AM +0200, Michal Hocko wrote:
> > > On Fri 18-10-19 02:19:06, Naoya Horiguchi wrote:
> > > > On Thu, Oct 17, 2019 at 08:27:59PM +0200, Michal Hocko wrote:
> > > > > On Thu 17-10-19 14:07:13, Qian Cai wrote:
> > > > > > On Thu, 2019-10-17 at 12:01 +0200, Michal Hocko wrote:
> > > > > > > On Thu 17-10-19 09:34:10, Naoya Horiguchi wrote:
> > > > > > > > On Mon, Oct 14, 2019 at 10:39:14AM +0200, Michal Hocko wrote:
> > > > > > > 
> > > > > > > [...]
> > > > > > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > > > > > > index 89c19c0feadb..5fb3fee16fde 100644
> > > > > > > > > --- a/mm/page_isolation.c
> > > > > > > > > +++ b/mm/page_isolation.c
> > > > > > > > > @@ -274,7 +274,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> > > > > > > > >  			 * simple way to verify that as VM_BUG_ON(), though.
> > > > > > > > >  			 */
> > > > > > > > >  			pfn += 1 << page_order(page);
> > > > > > > > > -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> > > > > > > > > +		else if (skip_hwpoisoned_pages && PageHWPoison(compound_head(page)))
> > > > > > > > >  			/* A HWPoisoned page cannot be also PageBuddy */
> > > > > > > > >  			pfn++;
> > > > > > > > >  		else
> > > > > > > > 
> > > > > > > > This fix looks good to me. The original code only addresses hwpoisoned 4kB-page,
> > > > > > > > we seem to have this issue since the following commit,
> > > > > > > 
> > > > > > > Thanks a lot for double checking Naoya!
> > > > > > >  
> > > > > > > >   commit b023f46813cde6e3b8a8c24f432ff9c1fd8e9a64
> > > > > > > >   Author: Wen Congyang <wency@cn.fujitsu.com>
> > > > > > > >   Date:   Tue Dec 11 16:00:45 2012 -0800
> > > > > > > >   
> > > > > > > >       memory-hotplug: skip HWPoisoned page when offlining pages
> > > > > > > > 
> > > > > > > > and extension of LTP coverage finally discovered this.
> > > > > > > 
> > > > > > > Qian, could you give the patch some testing?
> > > > > > 
> > > > > > Unfortunately, this does not solve the problem. It looks to me that in
> > > > > > soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set
> > > > > > PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison
> > > > > > set.
> > > > > > 
> > > > > > 		if (PageBuddy(page_head) && page_order(page_head) >= order) {
> > > > > > 			if (!TestSetPageHWPoison(page))
> > > > > > 				hwpoisoned = true;
> > > > > 
> > > > > This is more than unexpected. How are we supposed to find out that the
> > > > > page is poisoned? Any idea Naoya?
> > > > 
> > > > # sorry for my poor review...
> > > > 
> > > > We set PG_hwpoison bit only on the head page for hugetlb, that's because
> > > > we handle multiple pages as a single one for hugetlb. So it's enough
> > > > to check isolation only on the head page.  Simply skipping pfn cursor to
> > > > the page after the hugepage should avoid the infinite loop:
> > > 
> > > But the page dump Qian provided shows that the head page doesn't have
> > > HWPoison bit either. If it had then going pfn at a time should just work
> > > because all tail pages would be skipped. Or do I miss something?
> > 
> > You're right, then I don't see how this happens.
> 
> OK, this is a bit relieving. I thought that there are legitimate cases
> when none of the hugetlb gets the HWPoison bit (e.g. when the page has 0
> reference count which is the case here). That would be utterly broken
> because we would have no way to tell the page is hwpoisoned.
> 
> Anyway, do you think the patch as I've posted makes sense regardless
> another potential problem? Or would you like to resend yours which skips
> over tail pages at once?

I think the patch does no harm but we may put this pending until we
understand the mechanism of the bug. I'll dig this more next week.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  8:38                 ` David Hildenbrand
@ 2019-10-18  8:55                   ` Michal Hocko
  2019-10-18 11:00                     ` David Hildenbrand
  0 siblings, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-18  8:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On Fri 18-10-19 10:38:21, David Hildenbrand wrote:
> On 18.10.19 10:24, Michal Hocko wrote:
> > On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
> > [...]
> > > However, if the compound page spans multiple pageblocks
> > 
> > Although hugetlb pages spanning pageblocks are possible this shouldn't
> > matter in__test_page_isolated_in_pageblock because this function doesn't
> > really operate on pageblocks as the name suggests.  It is simply
> > traversing all valid RAM ranges (see walk_system_ram_range).
> 
> As long as the hugepages don't span memory blocks/sections, you are right. I
> have no experience with gigantic pages in this regard.

They can clearly span sections (1GB is larger than 128MB). Why do you
think it matters actually? walk_system_ram_range walks RAM ranges and no
allocation should span holes in RAM right?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  8:55                   ` Michal Hocko
@ 2019-10-18 11:00                     ` David Hildenbrand
  2019-10-18 11:05                       ` David Hildenbrand
  2019-10-18 11:34                       ` Michal Hocko
  0 siblings, 2 replies; 27+ messages in thread
From: David Hildenbrand @ 2019-10-18 11:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On 18.10.19 10:55, Michal Hocko wrote:
> On Fri 18-10-19 10:38:21, David Hildenbrand wrote:
>> On 18.10.19 10:24, Michal Hocko wrote:
>>> On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
>>> [...]
>>>> However, if the compound page spans multiple pageblocks
>>>
>>> Although hugetlb pages spanning pageblocks are possible this shouldn't
>>> matter in__test_page_isolated_in_pageblock because this function doesn't
>>> really operate on pageblocks as the name suggests.  It is simply
>>> traversing all valid RAM ranges (see walk_system_ram_range).
>>
>> As long as the hugepages don't span memory blocks/sections, you are right. I
>> have no experience with gigantic pages in this regard.
> 
> They can clearly span sections (1GB is larger than 128MB). Why do you
> think it matters actually? walk_system_ram_range walks RAM ranges and no
> allocation should span holes in RAM right?
> 

Let's explore what I was thinking. If we can agree that any compound 
page is always aligned to its size , then what I tell here is not 
applicable. I know it is true for gigantic pages.

Some extreme example to clarify

[ memory block 0 (128MB) ][ memory block 1 (128MB) ]
               [ compound page (128MB)  ]

If you would offline memory block 1, and you detect PG_offline on the 
first page of that memory block (PageHWPoison(compound_head(page))), you 
would jump over the whole memory block (pfn += 1 << 
compound_order(page)), leaving 64MB of the memory block unchecked.

Again, if any compound page has the alignment restrictions (PFN of head 
aligned to 1 << compound_order(page)), this is not possible.


If it is, however, possible, the "clean" thing would be to only jump 
over the remaining part of the compound page, e.g., something like

pfn += (1 << compound_order(page)) - (page - compound_head(page)));



-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18 11:00                     ` David Hildenbrand
@ 2019-10-18 11:05                       ` David Hildenbrand
  2019-10-18 11:34                       ` Michal Hocko
  1 sibling, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2019-10-18 11:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On 18.10.19 13:00, David Hildenbrand wrote:
> On 18.10.19 10:55, Michal Hocko wrote:
>> On Fri 18-10-19 10:38:21, David Hildenbrand wrote:
>>> On 18.10.19 10:24, Michal Hocko wrote:
>>>> On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
>>>> [...]
>>>>> However, if the compound page spans multiple pageblocks
>>>>
>>>> Although hugetlb pages spanning pageblocks are possible this shouldn't
>>>> matter in__test_page_isolated_in_pageblock because this function doesn't
>>>> really operate on pageblocks as the name suggests.  It is simply
>>>> traversing all valid RAM ranges (see walk_system_ram_range).
>>>
>>> As long as the hugepages don't span memory blocks/sections, you are right. I
>>> have no experience with gigantic pages in this regard.
>>
>> They can clearly span sections (1GB is larger than 128MB). Why do you
>> think it matters actually? walk_system_ram_range walks RAM ranges and no
>> allocation should span holes in RAM right?
>>
> 
> Let's explore what I was thinking. If we can agree that any compound
> page is always aligned to its size , then what I tell here is not
> applicable. I know it is true for gigantic pages.
> 
> Some extreme example to clarify
> 
> [ memory block 0 (128MB) ][ memory block 1 (128MB) ]
>                 [ compound page (128MB)  ]
> 
> If you would offline memory block 1, and you detect PG_offline on the

s/PG_offline/PG_hwpoison/ :)

> first page of that memory block (PageHWPoison(compound_head(page))), you
> would jump over the whole memory block (pfn += 1 <<
> compound_order(page)), leaving 64MB of the memory block unchecked.
> 
> Again, if any compound page has the alignment restrictions (PFN of head
> aligned to 1 << compound_order(page)), this is not possible.
> 
> 
> If it is, however, possible, the "clean" thing would be to only jump
> over the remaining part of the compound page, e.g., something like
> 
> pfn += (1 << compound_order(page)) - (page - compound_head(page)));
> 
> 
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18 11:00                     ` David Hildenbrand
  2019-10-18 11:05                       ` David Hildenbrand
@ 2019-10-18 11:34                       ` Michal Hocko
  2019-10-18 11:51                         ` David Hildenbrand
  1 sibling, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2019-10-18 11:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On Fri 18-10-19 13:00:45, David Hildenbrand wrote:
> On 18.10.19 10:55, Michal Hocko wrote:
> > On Fri 18-10-19 10:38:21, David Hildenbrand wrote:
> > > On 18.10.19 10:24, Michal Hocko wrote:
> > > > On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
> > > > [...]
> > > > > However, if the compound page spans multiple pageblocks
> > > > 
> > > > Although hugetlb pages spanning pageblocks are possible this shouldn't
> > > > matter in__test_page_isolated_in_pageblock because this function doesn't
> > > > really operate on pageblocks as the name suggests.  It is simply
> > > > traversing all valid RAM ranges (see walk_system_ram_range).
> > > 
> > > As long as the hugepages don't span memory blocks/sections, you are right. I
> > > have no experience with gigantic pages in this regard.
> > 
> > They can clearly span sections (1GB is larger than 128MB). Why do you
> > think it matters actually? walk_system_ram_range walks RAM ranges and no
> > allocation should span holes in RAM right?
> > 
> 
> Let's explore what I was thinking. If we can agree that any compound page is
> always aligned to its size , then what I tell here is not applicable. I know
> it is true for gigantic pages.
> 
> Some extreme example to clarify
> 
> [ memory block 0 (128MB) ][ memory block 1 (128MB) ]
>               [ compound page (128MB)  ]
> 
> If you would offline memory block 1, and you detect PG_offline on the first
> page of that memory block (PageHWPoison(compound_head(page))), you would
> jump over the whole memory block (pfn += 1 << compound_order(page)), leaving
> 64MB of the memory block unchecked.
> 
> Again, if any compound page has the alignment restrictions (PFN of head
> aligned to 1 << compound_order(page)), this is not possible.
> 
> 
> If it is, however, possible, the "clean" thing would be to only jump over
> the remaining part of the compound page, e.g., something like
> 
> pfn += (1 << compound_order(page)) - (page - compound_head(page)));

OK, I see what you mean now. In other words similar to eeb0efd071d82.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18 11:34                       ` Michal Hocko
@ 2019-10-18 11:51                         ` David Hildenbrand
  0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2019-10-18 11:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Qian Cai, linux-kernel, linux-mm, Mike Kravetz

On 18.10.19 13:34, Michal Hocko wrote:
> On Fri 18-10-19 13:00:45, David Hildenbrand wrote:
>> On 18.10.19 10:55, Michal Hocko wrote:
>>> On Fri 18-10-19 10:38:21, David Hildenbrand wrote:
>>>> On 18.10.19 10:24, Michal Hocko wrote:
>>>>> On Fri 18-10-19 10:13:36, David Hildenbrand wrote:
>>>>> [...]
>>>>>> However, if the compound page spans multiple pageblocks
>>>>>
>>>>> Although hugetlb pages spanning pageblocks are possible this shouldn't
>>>>> matter in__test_page_isolated_in_pageblock because this function doesn't
>>>>> really operate on pageblocks as the name suggests.  It is simply
>>>>> traversing all valid RAM ranges (see walk_system_ram_range).
>>>>
>>>> As long as the hugepages don't span memory blocks/sections, you are right. I
>>>> have no experience with gigantic pages in this regard.
>>>
>>> They can clearly span sections (1GB is larger than 128MB). Why do you
>>> think it matters actually? walk_system_ram_range walks RAM ranges and no
>>> allocation should span holes in RAM right?
>>>
>>
>> Let's explore what I was thinking. If we can agree that any compound page is
>> always aligned to its size , then what I tell here is not applicable. I know
>> it is true for gigantic pages.
>>
>> Some extreme example to clarify
>>
>> [ memory block 0 (128MB) ][ memory block 1 (128MB) ]
>>                [ compound page (128MB)  ]
>>
>> If you would offline memory block 1, and you detect PG_offline on the first
>> page of that memory block (PageHWPoison(compound_head(page))), you would
>> jump over the whole memory block (pfn += 1 << compound_order(page)), leaving
>> 64MB of the memory block unchecked.
>>
>> Again, if any compound page has the alignment restrictions (PFN of head
>> aligned to 1 << compound_order(page)), this is not possible.
>>
>>
>> If it is, however, possible, the "clean" thing would be to only jump over
>> the remaining part of the compound page, e.g., something like
>>
>> pfn += (1 << compound_order(page)) - (page - compound_head(page)));
> 
> OK, I see what you mean now. In other words similar to eeb0efd071d82.
> 

Exactly.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18  6:32               ` Naoya Horiguchi
  2019-10-18  7:33                 ` Michal Hocko
@ 2019-10-18 11:56                 ` Qian Cai
  2019-10-21  3:16                   ` Naoya Horiguchi
  1 sibling, 1 reply; 27+ messages in thread
From: Qian Cai @ 2019-10-18 11:56 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Michal Hocko, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]



> On Oct 18, 2019, at 2:35 AM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> You're right, then I don't see how this happens. If the error hugepage was
> isolated without having PG_hwpoison set, it's unexpected and problematic.
> I'm testing myself with v5.4-rc2 (simply ran move_pages12 and did hotremove/hotadd)
> but don't reproduce the issue yet.  Do we need specific kernel version/config
> to trigger this?

This is reproducible on linux-next with the config. Not sure if it is reproducible on x86.

https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config

and kernel cmdline if that matters

page_poison=on page_owner=on numa_balancing=enable \
systemd.unified_cgroup_hierarchy=1 debug_guardpage_minorder=1 \
page_alloc.shuffle=1

BTW, where does the code set PG_hwpoison for the head page?

[-- Attachment #2: Type: text/html, Size: 1391 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-18 11:56                 ` Qian Cai
@ 2019-10-21  3:16                   ` Naoya Horiguchi
  2020-05-15  2:46                     ` Qian Cai
  0 siblings, 1 reply; 27+ messages in thread
From: Naoya Horiguchi @ 2019-10-21  3:16 UTC (permalink / raw)
  To: Qian Cai
  Cc: Michal Hocko, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz

On Fri, Oct 18, 2019 at 07:56:09AM -0400, Qian Cai wrote:
> 
> 
>     On Oct 18, 2019, at 2:35 AM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>     wrote:
> 
> 
>     You're right, then I don't see how this happens. If the error hugepage was
>     isolated without having PG_hwpoison set, it's unexpected and problematic.
>     I'm testing myself with v5.4-rc2 (simply ran move_pages12 and did hotremove
>     /hotadd)
>     but don't reproduce the issue yet.  Do we need specific kernel version/
>     config
>     to trigger this?
> 
> 
> This is reproducible on linux-next with the config. Not sure if it is
> reproducible on x86.
> 
> https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config
> 
> and kernel cmdline if that matters
> 
> page_poison=on page_owner=on numa_balancing=enable \
> systemd.unified_cgroup_hierarchy=1 debug_guardpage_minorder=1 \
> page_alloc.shuffle=1

Thanks for the info.

> 
> BTW, where does the code set PG_hwpoison for the head page?

Precisely speaking, soft offline only sets PG_hwpoison after the target
hugepage is successfully dissolved (then it's not a hugepage any more),
so PG_hwpoison is set on the raw page in set_hwpoison_free_buddy_page().

In move_pages12 case, madvise(MADV_SOFT_OFFLINE) is called for the range
of 2 hugepages, so the expected result is that page offset 0 and 512
are marked as PG_hwpoison after injection.

Looking at your dump_page() output, the end_pfn is page offset 1
("page:c00c000800458040" is likely to point to pfn 0x11601.)
The page belongs to high order buddy free page, but doesn't have
PageBuddy nor PageHWPoison because it was not the head page or
the raw error page.

> Unfortunately, this does not solve the problem. It looks to me that in            
> soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set            
> PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison   
> set.                                                                              

Your analysis is totally correct, and this behavior will be fixed by
the change (https://lkml.org/lkml/2019/10/17/551) in Oscar's rework.
The raw error page will be taken off from buddy system and the other
subpages are properly split into lower orderer pages (we'll properly
manage PageBuddy flags). So all possible cases would be covered by
branches in __test_page_isolated_in_pageblock.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2019-10-21  3:16                   ` Naoya Horiguchi
@ 2020-05-15  2:46                     ` Qian Cai
  2020-05-15  3:48                       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 27+ messages in thread
From: Qian Cai @ 2020-05-15  2:46 UTC (permalink / raw)
  To: Naoya Horiguchi, Oscar Salvador
  Cc: Michal Hocko, linux-kernel, linux-mm, David Hildenbrand, Mike Kravetz



> On Oct 20, 2019, at 11:16 PM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> On Fri, Oct 18, 2019 at 07:56:09AM -0400, Qian Cai wrote:
>> 
>> 
>>    On Oct 18, 2019, at 2:35 AM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>    wrote:
>> 
>> 
>>    You're right, then I don't see how this happens. If the error hugepage was
>>    isolated without having PG_hwpoison set, it's unexpected and problematic.
>>    I'm testing myself with v5.4-rc2 (simply ran move_pages12 and did hotremove
>>    /hotadd)
>>    but don't reproduce the issue yet.  Do we need specific kernel version/
>>    config
>>    to trigger this?
>> 
>> 
>> This is reproducible on linux-next with the config. Not sure if it is
>> reproducible on x86.
>> 
>> https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config
>> 
>> and kernel cmdline if that matters
>> 
>> page_poison=on page_owner=on numa_balancing=enable \
>> systemd.unified_cgroup_hierarchy=1 debug_guardpage_minorder=1 \
>> page_alloc.shuffle=1
> 
> Thanks for the info.
> 
>> 
>> BTW, where does the code set PG_hwpoison for the head page?
> 
> Precisely speaking, soft offline only sets PG_hwpoison after the target
> hugepage is successfully dissolved (then it's not a hugepage any more),
> so PG_hwpoison is set on the raw page in set_hwpoison_free_buddy_page().
> 
> In move_pages12 case, madvise(MADV_SOFT_OFFLINE) is called for the range
> of 2 hugepages, so the expected result is that page offset 0 and 512
> are marked as PG_hwpoison after injection.
> 
> Looking at your dump_page() output, the end_pfn is page offset 1
> ("page:c00c000800458040" is likely to point to pfn 0x11601.)
> The page belongs to high order buddy free page, but doesn't have
> PageBuddy nor PageHWPoison because it was not the head page or
> the raw error page.
> 
>> Unfortunately, this does not solve the problem. It looks to me that in            
>> soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set            
>> PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison   
>> set.                                                                              
> 
> Your analysis is totally correct, and this behavior will be fixed by
> the change (https://lkml.org/lkml/2019/10/17/551) in Oscar's rework.
> The raw error page will be taken off from buddy system and the other
> subpages are properly split into lower orderer pages (we'll properly
> manage PageBuddy flags). So all possible cases would be covered by
> branches in __test_page_isolated_in_pageblock.

Naoya, Oscar, it looks like this series was stuck.

https://lkml.org/lkml/2019/10/17/551

I can still reproduce this issue as today. Maybe it is best we could post a single patch (which one?) to fix the loop first?





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2020-05-15  2:46                     ` Qian Cai
@ 2020-05-15  3:48                       ` HORIGUCHI NAOYA(堀口 直也)
  2020-05-19  4:17                         ` Qian Cai
  0 siblings, 1 reply; 27+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-05-15  3:48 UTC (permalink / raw)
  To: Qian Cai
  Cc: Naoya Horiguchi, Oscar Salvador, Michal Hocko, linux-kernel,
	linux-mm, David Hildenbrand, Mike Kravetz

On Thu, May 14, 2020 at 10:46:33PM -0400, Qian Cai wrote:
> 
> 
> > On Oct 20, 2019, at 11:16 PM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> > 
> > On Fri, Oct 18, 2019 at 07:56:09AM -0400, Qian Cai wrote:
> >> 
> >> 
> >>    On Oct 18, 2019, at 2:35 AM, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>    wrote:
> >> 
> >> 
> >>    You're right, then I don't see how this happens. If the error hugepage was
> >>    isolated without having PG_hwpoison set, it's unexpected and problematic.
> >>    I'm testing myself with v5.4-rc2 (simply ran move_pages12 and did hotremove
> >>    /hotadd)
> >>    but don't reproduce the issue yet.  Do we need specific kernel version/
> >>    config
> >>    to trigger this?
> >> 
> >> 
> >> This is reproducible on linux-next with the config. Not sure if it is
> >> reproducible on x86.
> >> 
> >> https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config
> >> 
> >> and kernel cmdline if that matters
> >> 
> >> page_poison=on page_owner=on numa_balancing=enable \
> >> systemd.unified_cgroup_hierarchy=1 debug_guardpage_minorder=1 \
> >> page_alloc.shuffle=1
> > 
> > Thanks for the info.
> > 
> >> 
> >> BTW, where does the code set PG_hwpoison for the head page?
> > 
> > Precisely speaking, soft offline only sets PG_hwpoison after the target
> > hugepage is successfully dissolved (then it's not a hugepage any more),
> > so PG_hwpoison is set on the raw page in set_hwpoison_free_buddy_page().
> > 
> > In move_pages12 case, madvise(MADV_SOFT_OFFLINE) is called for the range
> > of 2 hugepages, so the expected result is that page offset 0 and 512
> > are marked as PG_hwpoison after injection.
> > 
> > Looking at your dump_page() output, the end_pfn is page offset 1
> > ("page:c00c000800458040" is likely to point to pfn 0x11601.)
> > The page belongs to high order buddy free page, but doesn't have
> > PageBuddy nor PageHWPoison because it was not the head page or
> > the raw error page.
> > 
> >> Unfortunately, this does not solve the problem. It looks to me that in            
> >> soft_offline_huge_page(), set_hwpoison_free_buddy_page() will only set            
> >> PG_hwpoison for buddy pages, so the even the compound_head() has no PG_hwpoison   
> >> set.                                                                              
> > 
> > Your analysis is totally correct, and this behavior will be fixed by
> > the change (https://lkml.org/lkml/2019/10/17/551) in Oscar's rework.
> > The raw error page will be taken off from buddy system and the other
> > subpages are properly split into lower orderer pages (we'll properly
> > manage PageBuddy flags). So all possible cases would be covered by
> > branches in __test_page_isolated_in_pageblock.
> 
> Naoya, Oscar, it looks like this series was stuck.
> 
> https://lkml.org/lkml/2019/10/17/551
> 
> I can still reproduce this issue as today. Maybe it is best we could post a single patch (which one?) to fix the loop first?

I'm very sorry to be quiet for long, but I think that I agree with
this patchset and try to see what happend if merged into mmtom,
although we need rebaseing to latest mmotm and some basic testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: memory offline infinite loop after soft offline
  2020-05-15  3:48                       ` HORIGUCHI NAOYA(堀口 直也)
@ 2020-05-19  4:17                         ` Qian Cai
  0 siblings, 0 replies; 27+ messages in thread
From: Qian Cai @ 2020-05-19  4:17 UTC (permalink / raw)
  To:  HORIGUCHI NAOYA(堀口 直也) 
  Cc: Naoya Horiguchi, Oscar Salvador, Michal Hocko, linux-kernel,
	linux-mm, David Hildenbrand, Mike Kravetz



> On May 14, 2020, at 11:48 PM, HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> 
> I'm very sorry to be quiet for long, but I think that I agree with
> this patchset and try to see what happend if merged into mmtom,
> although we need rebaseing to latest mmotm and some basic testing.

Looks like Oscar have been busy those days. Would you have time to take it over and rebase it?

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-05-19  4:17 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-11 21:32 memory offline infinite loop after soft offline Qian Cai
2019-10-12 10:30 ` osalvador
2019-10-14  8:39 ` Michal Hocko
2019-10-17  9:34   ` Naoya Horiguchi
2019-10-17 10:01     ` Michal Hocko
2019-10-17 10:03       ` David Hildenbrand
2019-10-17 18:07       ` Qian Cai
2019-10-17 18:07         ` Qian Cai
2019-10-17 18:27         ` Michal Hocko
2019-10-18  2:19           ` Naoya Horiguchi
2019-10-18  6:06             ` Michal Hocko
2019-10-18  6:32               ` Naoya Horiguchi
2019-10-18  7:33                 ` Michal Hocko
2019-10-18  8:46                   ` Naoya Horiguchi
2019-10-18 11:56                 ` Qian Cai
2019-10-21  3:16                   ` Naoya Horiguchi
2020-05-15  2:46                     ` Qian Cai
2020-05-15  3:48                       ` HORIGUCHI NAOYA(堀口 直也)
2020-05-19  4:17                         ` Qian Cai
2019-10-18  8:13             ` David Hildenbrand
2019-10-18  8:24               ` Michal Hocko
2019-10-18  8:38                 ` David Hildenbrand
2019-10-18  8:55                   ` Michal Hocko
2019-10-18 11:00                     ` David Hildenbrand
2019-10-18 11:05                       ` David Hildenbrand
2019-10-18 11:34                       ` Michal Hocko
2019-10-18 11:51                         ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.