* [PATCH] mm: use memalloc_nofs_save() in page_cache_ra_order()
From: Kefeng Wang @ 2024-04-26 11:29 UTC
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, Kefeng Wang, zhangyi
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead
path"). Ensure that page_cache_ra_order() does not attempt to reclaim
file-backed pages either, otherwise it can deadlock. The issue was found
while testing ext4 large folios.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
Call trace:
__switch_to+0x14c/0x240
__schedule+0x82c/0xdd0
schedule+0x58/0xf0
io_schedule+0x24/0xa0
__folio_lock+0x130/0x300
migrate_pages_batch+0x378/0x918
migrate_pages+0x350/0x700
compact_zone+0x63c/0xb38
compact_zone_order+0xc0/0x118
try_to_compact_pages+0xb0/0x280
__alloc_pages_direct_compact+0x98/0x248
__alloc_pages+0x510/0x1110
alloc_pages+0x9c/0x130
folio_alloc+0x20/0x78
filemap_alloc_folio+0x8c/0x1b0
page_cache_ra_order+0x174/0x308
ondemand_readahead+0x1c8/0x2b8
page_cache_async_ra+0x68/0xb8
filemap_readahead.isra.0+0x64/0xa8
filemap_get_pages+0x3fc/0x5b0
filemap_splice_read+0xf4/0x280
ext4_file_splice_read+0x2c/0x48 [ext4]
vfs_splice_read.part.0+0xa8/0x118
splice_direct_to_actor+0xbc/0x288
do_splice_direct+0x9c/0x108
do_sendfile+0x328/0x468
__arm64_sys_sendfile64+0x8c/0x148
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x1f8
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x188/0x190
Cc: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/readahead.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/readahead.c b/mm/readahead.c
index 63d6000103f0..c1b23989d9ca 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -494,6 +494,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	pgoff_t index = readahead_index(ractl);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
+	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
 
@@ -508,6 +509,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 	}
 
+	/* See comment in page_cache_ra_unbounded() */
+	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
@@ -531,6 +534,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 	read_pages(ractl);
 	filemap_invalidate_unlock_shared(mapping);
+	memalloc_nofs_restore(nofs);
 
 	/*
 	 * If there were already pages in the page cache, then we may have
--
2.41.0
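
For readers less familiar with the scoped NOFS API, the following is a
minimal userspace sketch of what the save/restore pair in the patch is
meant to achieve. It is a model, not kernel code: every model_* and
MODEL_* name, and the flag values, are invented for illustration. It
assumes only that the save call returns the previous state of the task's
NOFS flag (so scopes can nest) and that allocations made inside the scope
have the FS bit masked out of their gfp flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel state; the values are illustrative only. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u	/* models the per-task NOFS flag */
#define MODEL_GFP_FS           0x2u	/* models the __GFP_FS bit */
#define MODEL_GFP_IO           0x4u	/* models the __GFP_IO bit */

static unsigned int model_task_flags;	/* models current->flags */

/* Enter a NOFS scope; return the previous state of the NOFS bit. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

	model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope by putting the NOFS bit back to its saved state. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Mask applied at allocation time: drop the FS bit inside a NOFS scope. */
static unsigned int model_gfp_context(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Walks through a nested save/restore pair, as a readahead caller might. */
static void model_selfcheck(void)
{
	unsigned int gfp = MODEL_GFP_FS | MODEL_GFP_IO;
	unsigned int outer, inner;

	assert(model_gfp_context(gfp) == gfp);		/* outside any scope */
	outer = model_nofs_save();
	inner = model_nofs_save();			/* nested scope */
	assert(model_gfp_context(gfp) == MODEL_GFP_IO);
	model_nofs_restore(inner);
	assert(model_gfp_context(gfp) == MODEL_GFP_IO);	/* outer scope still active */
	model_nofs_restore(outer);
	assert(model_gfp_context(gfp) == gfp);		/* back to the caller's mask */
}
```

Calling model_selfcheck() from a small main() exercises the nesting. In
the patch itself the scope is opened before filemap_invalidate_lock_shared()
and closed after filemap_invalidate_unlock_shared(), so every folio
allocation made inside the loop falls within it.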
* Re: [PATCH] mm: use memalloc_nofs_save() in page_cache_ra_order()
From: Andrew Morton @ 2024-04-26 18:49 UTC
To: Kefeng Wang; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-fsdevel, zhangyi
On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead
> path"), ensure that page_cache_ra_order() do not attempt to reclaim
> file-backed pages too, or it leads to a deadlock, found issue when
> test ext4 large folio.
>
> INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
> Call trace:
> __switch_to+0x14c/0x240
> __schedule+0x82c/0xdd0
> schedule+0x58/0xf0
> io_schedule+0x24/0xa0
> __folio_lock+0x130/0x300
> migrate_pages_batch+0x378/0x918
> migrate_pages+0x350/0x700
> compact_zone+0x63c/0xb38
> compact_zone_order+0xc0/0x118
> try_to_compact_pages+0xb0/0x280
> __alloc_pages_direct_compact+0x98/0x248
> __alloc_pages+0x510/0x1110
> alloc_pages+0x9c/0x130
> folio_alloc+0x20/0x78
> filemap_alloc_folio+0x8c/0x1b0
> page_cache_ra_order+0x174/0x308
> ondemand_readahead+0x1c8/0x2b8
> page_cache_async_ra+0x68/0xb8
> filemap_readahead.isra.0+0x64/0xa8
> filemap_get_pages+0x3fc/0x5b0
> filemap_splice_read+0xf4/0x280
> ext4_file_splice_read+0x2c/0x48 [ext4]
> vfs_splice_read.part.0+0xa8/0x118
> splice_direct_to_actor+0xbc/0x288
> do_splice_direct+0x9c/0x108
> do_sendfile+0x328/0x468
> __arm64_sys_sendfile64+0x8c/0x148
> invoke_syscall+0x4c/0x118
> el0_svc_common.constprop.0+0xc8/0xf0
> do_el0_svc+0x24/0x38
> el0_svc+0x4c/0x1f8
> el0t_64_sync_handler+0xc0/0xc8
> el0t_64_sync+0x188/0x190
>
> Cc: zhangyi (F) <yi.zhang@huawei.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
I'm thinking
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Cc: stable
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -494,6 +494,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
> pgoff_t index = readahead_index(ractl);
> pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
> pgoff_t mark = index + ra->size - ra->async_size;
> + unsigned int nofs;
> int err = 0;
> gfp_t gfp = readahead_gfp_mask(mapping);
>
> @@ -508,6 +509,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
> new_order = min_t(unsigned int, new_order, ilog2(ra->size));
> }
>
> + /* See comment in page_cache_ra_unbounded() */
> + nofs = memalloc_nofs_save();
> filemap_invalidate_lock_shared(mapping);
> while (index <= limit) {
> unsigned int order = new_order;
> @@ -531,6 +534,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>
> read_pages(ractl);
> filemap_invalidate_unlock_shared(mapping);
> + memalloc_nofs_restore(nofs);
>
> /*
> * If there were already pages in the page cache, then we may have
> --
> 2.41.0
* Re: [PATCH] mm: use memalloc_nofs_save() in page_cache_ra_order()
From: Matthew Wilcox @ 2024-04-27 3:45 UTC
To: Andrew Morton; +Cc: Kefeng Wang, linux-mm, linux-fsdevel, zhangyi
On Fri, Apr 26, 2024 at 11:49:05AM -0700, Andrew Morton wrote:
> On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > io_schedule+0x24/0xa0
> > __folio_lock+0x130/0x300
> > migrate_pages_batch+0x378/0x918
> > migrate_pages+0x350/0x700
> > compact_zone+0x63c/0xb38
> > compact_zone_order+0xc0/0x118
> > try_to_compact_pages+0xb0/0x280
> > __alloc_pages_direct_compact+0x98/0x248
> > __alloc_pages+0x510/0x1110
> > alloc_pages+0x9c/0x130
> > folio_alloc+0x20/0x78
> > filemap_alloc_folio+0x8c/0x1b0
> > page_cache_ra_order+0x174/0x308
> > ondemand_readahead+0x1c8/0x2b8
>
> I'm thinking
>
> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> Cc: stable
I think it goes back earlier than that.
https://lore.kernel.org/linux-mm/20200128060304.GA6615@bombadil.infradead.org/
details how it can happen with the old readpages code. It's just easier
to hit now.
* Re: [PATCH] mm: use memalloc_nofs_save() in page_cache_ra_order()
From: Kefeng Wang @ 2024-04-28 1:08 UTC
To: Matthew Wilcox, Andrew Morton; +Cc: linux-mm, linux-fsdevel, zhangyi
On 2024/4/27 11:45, Matthew Wilcox wrote:
> On Fri, Apr 26, 2024 at 11:49:05AM -0700, Andrew Morton wrote:
>> On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>> io_schedule+0x24/0xa0
>>> __folio_lock+0x130/0x300
>>> migrate_pages_batch+0x378/0x918
>>> migrate_pages+0x350/0x700
>>> compact_zone+0x63c/0xb38
>>> compact_zone_order+0xc0/0x118
>>> try_to_compact_pages+0xb0/0x280
>>> __alloc_pages_direct_compact+0x98/0x248
>>> __alloc_pages+0x510/0x1110
>>> alloc_pages+0x9c/0x130
>>> folio_alloc+0x20/0x78
>>> filemap_alloc_folio+0x8c/0x1b0
>>> page_cache_ra_order+0x174/0x308
>>> ondemand_readahead+0x1c8/0x2b8
>>
>> I'm thinking
>>
>> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
>> Cc: stable
>
> I think it goes back earlier than that.
> https://lore.kernel.org/linux-mm/20200128060304.GA6615@bombadil.infradead.org/
> details how it can happen with the old readpages code. It's just easier
> to hit now.
>
page_cache_ra_order() was introduced by 793917d997df, but the previous
bugfix f2c817bed58d ("mm: use memalloc_nofs_save in readahead path")
did not Cc stable, so should the previous patch be posted to stable as well?