linux-mm.kvack.org archive mirror
* [PATCH v2] mm: release private data before split THP
From: Yin Fengwei @ 2022-08-05  6:28 UTC
  To: linux-mm, naoya.horiguchi, linmiaohe, willy, shy828301
  Cc: aaron.lu, tony.luck, qiuxu.zhuo, fengwei.yin

If private data is attached to a THP, it holds an extra reference on
the THP, and that extra reference blocks the THP split. Release the
private data attached to the THP before splitting it, to increase the
chance of splitting the THP successfully.
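A minimal userspace sketch of the idea, not kernel code: struct model_folio and the model_* helpers are hypothetical, and the split condition is a simplified assumption (every reference must be explained by page-table mappings, one page-cache reference per subpage, and the caller's own reference).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page-cache folio; not the kernel's struct folio. */
struct model_folio {
	int nr_pages;      /* subpages, e.g. 16 for an order-4 folio */
	int refcount;      /* total references held on the folio */
	int mapcount;      /* page-table mappings */
	bool has_private;  /* private data attached (pins one reference) */
	bool busy;         /* if set, the filesystem refuses to release it */
};

/* Model of releasing private data: drops the reference it pinned. */
static bool model_release_private(struct model_folio *f)
{
	if (f->busy)
		return false;
	f->has_private = false;
	f->refcount--;
	return true;
}

/* Simplified split precondition: no unexplained extra references. */
static bool model_can_split(const struct model_folio *f)
{
	return f->mapcount == f->refcount - f->nr_pages - 1;
}

/* Mirrors the patch's ordering: release private data first, then split. */
static int model_split(struct model_folio *f)
{
	/* The page cache's references plus the caller's must be present. */
	assert(f->refcount >= f->nr_pages + 1);
	if (f->has_private && !model_release_private(f))
		return -EBUSY;
	if (!model_can_split(f))
		return -EBUSY;
	return 0; /* the real code would now go on to split the folio */
}
```

For example, an order-4 folio with refcount 18 and private data attached passes the check only after model_release_private() drops the count to 17; if the filesystem refuses the release, the sketch returns -EBUSY, as the patch does when filemap_release_folio() fails.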

A memory failure issue was hit during HW error injection testing
with a 5.18 kernel and xfs as the rootfs. The test got killed, and a
system reboot was required to re-run it.

The issue was tracked down to a THP split failure, which caused the
memory failure to go unhandled. The page dump showed:

[ 1785.433075] page:0000000025f9530b refcount:18 mapcount:0 mapping:000000008162eea7 index:0xa10 pfn:0x2f0200
[ 1785.443954] head:0000000025f9530b order:4 compound_mapcount:0 compound_pincount:0
[ 1785.452408] memcg:ff4247f2d28e9000
[ 1785.456304] aops:xfs_address_space_operations ino:8555182 dentry name:"baseos-filenames.solvx"
[ 1785.466612] flags: 0x1000000000012036(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 1785.476514] raw: 1000000000012036 ffb9460f8bc07c08 ffb9460f8bc08408 ff4247f22e6299f8
[ 1785.485268] raw: 0000000000000a10 ff4247f194ade900 00000012ffffffff ff4247f2d28e9000

It looks like the error was injected into an xfs large folio that
had private data attached.
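The numbers in the dump are consistent with the private reference being what blocks the split. The worked check below assumes the page cache holds one reference per subpage, the attached private data holds one, and the memory-failure code holds one as the caller; the split condition is a simplified form of the kernel's can_split_folio() check (can_split_huge_page() on older kernels):

```latex
\begin{aligned}
n_{\text{pages}} &= 2^{\text{order}} = 2^{4} = 16 \\
\text{refcount} &= \underbrace{16}_{\text{page cache}} + \underbrace{1}_{\text{private}} + \underbrace{1}_{\text{caller}} = 18 \\
\text{split allowed} &\iff \text{mapcount} = \text{refcount} - n_{\text{pages}} - 1 \\
\text{with private data:}\quad 18 - 16 - 1 &= 1 \neq 0 = \text{mapcount} \\
\text{after release:}\quad 17 - 16 - 1 &= 0 = \text{mapcount}
\end{aligned}
```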

With the private data released before splitting the THP, the test
case ran successfully many times without rebooting the system.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
---
Changelog from v1:
 - Move the private data release into split_huge_page_to_list()
   to cover a wider set of paths, per Yang's comment
 - Update the commit message

Changelog from RFC:
 - Use the new folio API, per Matthew Wilcox's suggestion
 - Add a one-line comment before re-getting the folio of the page,
   per Miaohe's comment
 - Remove the RFC tag
 - Add Co-developed-by for Qiuxu, who did a lot of debugging
   work to locate the real issue

 mm/huge_memory.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 15965084816d..edcbc6c2bb3f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2590,6 +2590,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			goto out;
 		}
 
+		if (folio_test_private(folio) &&
+				!filemap_release_folio(folio, GFP_KERNEL)) {
+			ret = -EBUSY;
+			goto out;
+		}
+
 		xas_split_alloc(&xas, head, compound_order(head),
 				mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
 		if (xas_error(&xas)) {

base-commit: 31be1d0fbd950395701d9fd47d8fb1f99c996f61
-- 
2.25.1




* Re: [PATCH v2] mm: release private data before split THP
From: Yang Shi @ 2022-08-08 17:49 UTC
  To: Yin Fengwei
  Cc: linux-mm, naoya.horiguchi, linmiaohe, willy, aaron.lu, tony.luck,
	qiuxu.zhuo

On Thu, Aug 4, 2022 at 11:29 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 15965084816d..edcbc6c2bb3f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2590,6 +2590,12 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>                         goto out;
>                 }
>
> +               if (folio_test_private(folio) &&
> +                               !filemap_release_folio(folio, GFP_KERNEL)) {

GFP_KERNEL is fine for most THP split call sites, except for the
memory reclaim path, which may not allow certain flags in order to
avoid recursion (for example, nested reclaim or issuing I/O). Most
filesystems clear __GFP_FS. However, this should not be a real-life
problem now, since AFAIK only xfs supports large folios for now, and
xfs uses the iomap release_folio() method, which actually ignores gfp
flags.

So it sounds safer to follow the gfp convention used by
xas_split_alloc() below. The best way, IMO, would be to pass in the
gfp flags from the reclaimer, but that seems like overkill at the
moment.
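A toy bitmask model of this point, assuming illustrative bit values and a reduced GFP_RECLAIM_MASK (the kernel's real definitions contain more bits and use different values). The hypothetical mapping mask mimics a filesystem that clears __GFP_FS, so the xas_split_alloc()-style mask drops __GFP_FS while GFP_KERNEL keeps it.

```c
#include <assert.h>

/* Illustrative flag bits only; the kernel's real values differ. */
enum {
	M_GFP_RECLAIM = 1u << 0, /* may enter reclaim */
	M_GFP_IO      = 1u << 1, /* may start physical I/O */
	M_GFP_FS      = 1u << 2, /* may call back into the filesystem */
	M_GFP_HIGHMEM = 1u << 3, /* placement hint, not reclaim-related */
	M_GFP_MOVABLE = 1u << 4, /* placement hint, not reclaim-related */
};

/* Reduced model of GFP_RECLAIM_MASK: keep only reclaim-behaviour bits. */
#define M_GFP_RECLAIM_MASK (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_KERNEL       (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

static_assert((M_GFP_KERNEL & M_GFP_FS) != 0,
	      "GFP_KERNEL permits filesystem recursion");

/* Hypothetical mapping mask for a filesystem that clears __GFP_FS. */
static unsigned int model_mapping_gfp(void)
{
	unsigned int gfp = M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS |
			   M_GFP_HIGHMEM | M_GFP_MOVABLE;
	return gfp & ~M_GFP_FS;
}

/* The mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK convention, modeled. */
static unsigned int model_split_gfp(void)
{
	return model_mapping_gfp() & M_GFP_RECLAIM_MASK;
}
```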

> +                       ret = -EBUSY;
> +                       goto out;
> +               }
> +
>                 xas_split_alloc(&xas, head, compound_order(head),
>                                 mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
>                 if (xas_error(&xas)) {



* Re: [PATCH v2] mm: release private data before split THP
From: Yin Fengwei @ 2022-08-09  1:12 UTC
  To: Yang Shi
  Cc: linux-mm, naoya.horiguchi, linmiaohe, willy, aaron.lu, tony.luck,
	qiuxu.zhuo

Hi Yang,

On 2022/8/9 01:49, Yang Shi wrote:
> The GFP_KERNEL is fine for most THP split callsites except for the
> memory reclaim path since it might not allow certain flags to avoid
> recursion, for example, nested reclaim, issue I/O, etc. The most
> filesystems clear __GFP_FS. However it should not be a real life
> problem now since AFAIK just xfs supports large folios for now and xfs
> uses iomap release_folio() method which actually ignores gfp flags.
Thanks a lot for the valuable comments.


> 
> So it sounds safer to follow the gfp convention used by
> xas_split_alloc() in the below. The best way is to pass in the gfp
> flag from the reclaimer IMO, but it seems overkilling at the moment.

It's possible that the gfp used by xas_split_alloc() has __GFP_FS/__GFP_IO set.
What about using current_gfp_context(gfp_as_xas_split_alloc)?
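current_gfp_context() narrows a gfp mask according to the memalloc_noio/nofs scope of the current task. A simplified userspace model follows; the real helper reads current->flags instead of taking a parameter and, in kernels of this era, also handles PF_MEMALLOC_PIN, which is omitted here. The flag values and model_* names are illustrative assumptions.

```c
#include <assert.h>

/* Illustrative gfp bits; the kernel's real values differ. */
enum {
	M_GFP_RECLAIM = 1u << 0,
	M_GFP_IO      = 1u << 1,
	M_GFP_FS      = 1u << 2,
};

/* Illustrative task flags, set inside memalloc_noio/nofs scopes. */
enum {
	M_PF_MEMALLOC_NOIO = 1u << 0,
	M_PF_MEMALLOC_NOFS = 1u << 1,
};

static_assert((M_PF_MEMALLOC_NOIO & M_PF_MEMALLOC_NOFS) == 0,
	      "scope flags must be distinct bits");

/*
 * Model of current_gfp_context(): strip the bits the task's scope forbids.
 * A NOIO scope implies NOFS, so it clears both __GFP_IO and __GFP_FS.
 */
static unsigned int model_current_gfp_context(unsigned int task_flags,
					      unsigned int gfp)
{
	if (task_flags & M_PF_MEMALLOC_NOIO)
		gfp &= ~(M_GFP_IO | M_GFP_FS);
	else if (task_flags & M_PF_MEMALLOC_NOFS)
		gfp &= ~M_GFP_FS;
	return gfp;
}
```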


Regards
Yin, Fengwei



* Re: [PATCH v2] mm: release private data before split THP
From: Aaron Lu @ 2022-08-09  9:08 UTC
  To: Yin Fengwei
  Cc: Yang Shi, linux-mm, naoya.horiguchi, linmiaohe, willy, tony.luck,
	qiuxu.zhuo

On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
> Hi Yang,
> 
> On 2022/8/9 01:49, Yang Shi wrote:
> > The GFP_KERNEL is fine for most THP split callsites except for the
> > memory reclaim path since it might not allow certain flags to avoid
> > recursion, for example, nested reclaim, issue I/O, etc. The most
> > filesystems clear __GFP_FS. However it should not be a real life
> > problem now since AFAIK just xfs supports large folios for now and xfs
> > uses iomap release_folio() method which actually ignores gfp flags.
> Thanks a lot for the valuable comments.
> 
> 
> > 
> > So it sounds safer to follow the gfp convention used by
> > xas_split_alloc() in the below. The best way is to pass in the gfp
> > flag from the reclaimer IMO, but it seems overkilling at the moment.
> 
> It's possible that the gfp used by xas_split_alloc has __GFP_FS/IO set.
> What about to use current_gfp_context(gfp_as_xas_split_alloc)?
> 

Sounds reasonable to me.

Also, shouldn't the gfp used by xas_split_alloc() be changed to
current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK) as
well, since both calls run in the same context?
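This suggestion amounts to computing the mask once and passing the same value to both calls. A userspace sketch of that shape, with illustrative flag values; struct split_gfps and model_split_gfps() are hypothetical, and the two fields only record what filemap_release_folio() and xas_split_alloc() would receive. It is not the posted follow-up patch.

```c
#include <assert.h>

/* Illustrative gfp bits and task flags; the kernel's real values differ. */
enum { M_GFP_RECLAIM = 1u << 0, M_GFP_IO = 1u << 1, M_GFP_FS = 1u << 2,
       M_GFP_MOVABLE = 1u << 3 };
enum { M_PF_MEMALLOC_NOIO = 1u << 0, M_PF_MEMALLOC_NOFS = 1u << 1 };

/* Reduced model of GFP_RECLAIM_MASK. */
#define M_GFP_RECLAIM_MASK (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

static_assert((M_GFP_RECLAIM_MASK & M_GFP_MOVABLE) == 0,
	      "placement hints are not reclaim flags");

/* Records the gfp each step of the split path would be given. */
struct split_gfps {
	unsigned int release_gfp; /* stands in for filemap_release_folio() */
	unsigned int alloc_gfp;   /* stands in for xas_split_alloc() */
};

/* Simplified model of current_gfp_context() (PF_MEMALLOC_PIN omitted). */
static unsigned int model_current_gfp_context(unsigned int task_flags,
					      unsigned int gfp)
{
	if (task_flags & M_PF_MEMALLOC_NOIO)
		gfp &= ~(M_GFP_IO | M_GFP_FS);
	else if (task_flags & M_PF_MEMALLOC_NOFS)
		gfp &= ~M_GFP_FS;
	return gfp;
}

/* Compute the mask once, then hand the same value to both steps. */
static struct split_gfps model_split_gfps(unsigned int task_flags,
					  unsigned int mapping_gfp)
{
	unsigned int gfp = model_current_gfp_context(task_flags,
				mapping_gfp & M_GFP_RECLAIM_MASK);
	struct split_gfps s = { .release_gfp = gfp, .alloc_gfp = gfp };

	return s;
}
```

Deriving both values from one variable keeps the release and the allocation from drifting apart if the convention changes later.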



* Re: [PATCH v2] mm: release private data before split THP
From: Yang Shi @ 2022-08-09 16:45 UTC
  To: Aaron Lu
  Cc: Yin Fengwei, linux-mm, naoya.horiguchi, linmiaohe, willy,
	tony.luck, qiuxu.zhuo

On Tue, Aug 9, 2022 at 2:08 AM Aaron Lu <aaron.lu@intel.com> wrote:
>
> On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
> > Hi Yang,
> >
> > On 2022/8/9 01:49, Yang Shi wrote:
> > > The GFP_KERNEL is fine for most THP split callsites except for the
> > > memory reclaim path since it might not allow certain flags to avoid
> > > recursion, for example, nested reclaim, issue I/O, etc. The most
> > > filesystems clear __GFP_FS. However it should not be a real life
> > > problem now since AFAIK just xfs supports large folios for now and xfs
> > > uses iomap release_folio() method which actually ignores gfp flags.
> > Thanks a lot for the valuable comments.
> >
> >
> > >
> > > So it sounds safer to follow the gfp convention used by
> > > xas_split_alloc() in the below. The best way is to pass in the gfp
> > > flag from the reclaimer IMO, but it seems overkilling at the moment.
> >
> > It's possible that the gfp used by xas_split_alloc has __GFP_FS/IO set.
> > What about to use current_gfp_context(gfp_as_xas_split_alloc)?
> >
>
> Sounds reasonable to me.
>
> Also, the gfp used by xas_split_alloc() should also be modified to:
> current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK)?
> Since they are in the same context.

Good point, fine to me.



* Re: [PATCH v2] mm: release private data before split THP
From: Yin Fengwei @ 2022-08-09 23:55 UTC
  To: Yang Shi, Aaron Lu
  Cc: linux-mm, naoya.horiguchi, linmiaohe, willy, tony.luck, qiuxu.zhuo

On 2022/8/10 00:45, Yang Shi wrote:
> On Tue, Aug 9, 2022 at 2:08 AM Aaron Lu <aaron.lu@intel.com> wrote:
>>
>> On Tue, Aug 09, 2022 at 09:12:57AM +0800, Yin Fengwei wrote:
>>> Hi Yang,
>>>
>>> On 2022/8/9 01:49, Yang Shi wrote:
>>>> The GFP_KERNEL is fine for most THP split callsites except for the
>>>> memory reclaim path since it might not allow certain flags to avoid
>>>> recursion, for example, nested reclaim, issue I/O, etc. The most
>>>> filesystems clear __GFP_FS. However it should not be a real life
>>>> problem now since AFAIK just xfs supports large folios for now and xfs
>>>> uses iomap release_folio() method which actually ignores gfp flags.
>>> Thanks a lot for the valuable comments.
>>>
>>>
>>>>
>>>> So it sounds safer to follow the gfp convention used by
>>>> xas_split_alloc() in the below. The best way is to pass in the gfp
>>>> flag from the reclaimer IMO, but it seems overkilling at the moment.
>>>
>>> It's possible that the gfp used by xas_split_alloc has __GFP_FS/IO set.
>>> What about to use current_gfp_context(gfp_as_xas_split_alloc)?
>>>
>>
>> Sounds reasonable to me.
>>
>> Also, the gfp used by xas_split_alloc() should also be modified to:
>> current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK)?
>> Since they are in the same context.
> 
> Good point, fine to me.
Thanks a lot to both of you for the comments. I will update the patch.


Regards
Yin, Fengwei

