* [PATCH 0/2] Fix I/O high when memory almost met memcg limit
From: Liu Shixin @ 2024-02-01 10:08 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox, Andrew Morton
Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

Recently, when installing a package in a docker environment where memory
usage had almost reached the memcg limit, the program became unresponsive
for more than 15 minutes. During this period, I/O stayed high (~1 GB/s),
which caused other programs to fail to work properly.

The problem can be reproduced in the following way:

 1. Download the image:
    docker pull centos:7
 2. Create a container with a 4G memory limit and a 6G memsw limit (cgroup v1):
    docker create --name dockerhub_centos7 --cpu-period=100000
      --cpu-quota=400000 --memory 4G --memory-swap 6G --cap-add=SYS_PTRACE
      --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --cap-add=NET_RAW
      --pids-limit=20000 --ulimit nofile=1048576:1048576
      --ulimit memlock=-1:-1 dockerhub_centos7:latest /usr/sbin/init
 3. Start the container:
    docker start dockerhub_centos7
 4. Allocate 6094 MB of memory inside the container.
 5. Run 'yum install expect'.

We found that this problem is caused by a lot of meaningless readahead.
Since memory usage has almost reached the memcg limit, readahead pages are
reclaimed immediately, so the kernel reads ahead and reclaims again and
again. These two patches stop readahead early when the memcg charge fails
and skip readahead when there are too many active refaults.

[1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@huawei.com/T/

Liu Shixin (2):
  mm/readahead: stop readahead loop if memcg charge fails
  mm/readahead: limit sync readahead while too many active refault

 include/linux/fs.h      |  2 ++
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 16 ++++++++++++++++
 mm/readahead.c          | 12 ++++++++++--
 4 files changed, 29 insertions(+), 2 deletions(-)

-- 
2.25.1
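The I/O amplification in the cover letter can be illustrated with a toy userspace model (an assumption-laden sketch, not kernel code; `io_pages`, `RA_WINDOW`, and `surviving` are invented names): each fault reads a full readahead window, but only `surviving` pages of that window stay cached long enough to be used before reclaim takes them back.

```c
#include <assert.h>

/*
 * Toy model (NOT kernel code) of readahead under a memcg at its limit.
 * Assumption: a fixed readahead window of RA_WINDOW pages per fault, of
 * which only `surviving` pages remain cached until they are accessed.
 * When the memcg is full, readahead pages are reclaimed before use, so
 * only the faulted page itself (surviving == 1) does useful work and
 * every window is re-read from disk.
 */
#define RA_WINDOW 32

/* Pages of I/O issued to sequentially touch nr_pages pages of a file. */
static long io_pages(int nr_pages, int surviving)
{
	long io = 0;
	int i = 0;

	while (i < nr_pages) {
		io += RA_WINDOW;	/* one fault reads a whole window */
		i += surviving;		/* cached pages are hit without I/O */
	}
	return io;
}
```

With a healthy cache (`surviving == RA_WINDOW`) the I/O equals the file size; with immediate reclaim (`surviving == 1`) the same access pattern issues 32x the I/O, which is the kind of wasted traffic the report describes.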
* [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails
From: Liu Shixin @ 2024-02-01 10:08 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox, Andrew Morton
Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

When a task in a memcg reads ahead file pages, page_cache_ra_unbounded()
tries to read ahead nr_to_read pages. Even if a newly allocated page fails
to be charged, page_cache_ra_unbounded() still tries to read ahead the
next page. This leads to too much memory reclaim.

Stop readahead if mem_cgroup_charge() fails, i.e. when
add_to_page_cache_lru() returns -ENOMEM.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 mm/readahead.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 23620c57c1225..cc4abb67eb223 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	 */
 	for (i = 0; i < nr_to_read; i++) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
+		int ret;
 
 		if (folio && !xa_is_value(folio)) {
 			/*
@@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		folio = filemap_alloc_folio(gfp_mask, 0);
 		if (!folio)
 			break;
-		if (filemap_add_folio(mapping, folio, index + i,
-				gfp_mask) < 0) {
+
+		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
+		if (ret < 0) {
 			folio_put(folio);
+			if (ret == -ENOMEM)
+				break;
 			read_pages(ractl);
 			ractl->_index++;
 			i = ractl->_index + ractl->_nr_pages - index - 1;
-- 
2.25.1
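The control-flow change in this patch can be sketched in userspace (all names here -- `ra_loop`, `add_folio`, `charge_budget` -- are invented stand-ins, not the kernel API): `add_folio()` models filemap_add_folio(), which starts returning -ENOMEM once the memcg can charge only a few more pages.

```c
#include <assert.h>
#include <errno.h>

/* Remaining pages the simulated memcg can still charge. */
static int budget;

/* Stand-in for filemap_add_folio(): fails once the charge budget is gone. */
static int add_folio(void)
{
	if (budget <= 0)
		return -ENOMEM;	/* mem_cgroup_charge() failed */
	budget--;
	return 0;
}

/* Number of charge attempts made while reading ahead nr_to_read pages. */
static int ra_loop(int nr_to_read, int charge_budget, int stop_on_enomem)
{
	int attempts = 0;

	budget = charge_budget;
	for (int i = 0; i < nr_to_read; i++) {
		attempts++;
		if (add_folio() == -ENOMEM && stop_on_enomem)
			break;	/* the fix: give up on the rest of the window */
		/* before the fix: keep trying (and reclaiming) per page */
	}
	return attempts;
}
```

With a 32-page window and room to charge only 4 pages, the unpatched loop makes 32 charge attempts, each one potentially triggering reclaim; the patched loop stops after the fifth.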
* Re: [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails
From: Jan Kara @ 2024-02-01 9:28 UTC (permalink / raw)
To: Liu Shixin
Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
    Andrew Morton, linux-fsdevel, linux-kernel, linux-mm

On Thu 01-02-24 18:08:34, Liu Shixin wrote:
> When a task in memcg readaheads file pages, page_cache_ra_unbounded()
> will try to readahead nr_to_read pages. Even if the new allocated page
> fails to charge, page_cache_ra_unbounded() still tries to readahead
> next page. This leads to too much memory reclaim.
>
> Stop readahead if mem_cgroup_charge() fails, i.e. add_to_page_cache_lru()
> returns -ENOMEM.
>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>

Makes sense. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/readahead.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 23620c57c1225..cc4abb67eb223 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  	 */
>  	for (i = 0; i < nr_to_read; i++) {
>  		struct folio *folio = xa_load(&mapping->i_pages, index + i);
> +		int ret;
>
>  		if (folio && !xa_is_value(folio)) {
>  			/*
> @@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		folio = filemap_alloc_folio(gfp_mask, 0);
>  		if (!folio)
>  			break;
> -		if (filemap_add_folio(mapping, folio, index + i,
> -				gfp_mask) < 0) {
> +
> +		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> +		if (ret < 0) {
>  			folio_put(folio);
> +			if (ret == -ENOMEM)
> +				break;
>  			read_pages(ractl);
>  			ractl->_index++;
>  			i = ractl->_index + ractl->_nr_pages - index - 1;
> --
> 2.25.1

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails
From: Matthew Wilcox @ 2024-02-01 13:47 UTC (permalink / raw)
To: Liu Shixin
Cc: Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
    linux-fsdevel, linux-kernel, linux-mm

On Thu, Feb 01, 2024 at 06:08:34PM +0800, Liu Shixin wrote:
> @@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		folio = filemap_alloc_folio(gfp_mask, 0);
>  		if (!folio)
>  			break;
> -		if (filemap_add_folio(mapping, folio, index + i,
> -				gfp_mask) < 0) {
> +
> +		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> +		if (ret < 0) {
>  			folio_put(folio);
> +			if (ret == -ENOMEM)
> +				break;

No, that's too early.  You've still got a batch of pages which were
successfully added; you have to read them.  You were only off by one
line though ;-)

>  			read_pages(ractl);
>  			ractl->_index++;
>  			i = ractl->_index + ractl->_nr_pages - index - 1;
> --
> 2.25.1
* Re: [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails
From: Jan Kara @ 2024-02-01 13:52 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Liu Shixin, Alexander Viro, Christian Brauner, Jan Kara,
    Andrew Morton, linux-fsdevel, linux-kernel, linux-mm

On Thu 01-02-24 13:47:03, Matthew Wilcox wrote:
> On Thu, Feb 01, 2024 at 06:08:34PM +0800, Liu Shixin wrote:
> > @@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  		folio = filemap_alloc_folio(gfp_mask, 0);
> >  		if (!folio)
> >  			break;
> > -		if (filemap_add_folio(mapping, folio, index + i,
> > -				gfp_mask) < 0) {
> > +
> > +		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> > +		if (ret < 0) {
> >  			folio_put(folio);
> > +			if (ret == -ENOMEM)
> > +				break;
>
> No, that's too early.  You've still got a batch of pages which were
> successfully added; you have to read them.  You were only off by one
> line though ;-)

There's a read_pages() call just outside of the loop so this break is
actually fine AFAICT.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails
From: Matthew Wilcox @ 2024-02-01 13:53 UTC (permalink / raw)
To: Jan Kara
Cc: Liu Shixin, Alexander Viro, Christian Brauner, Andrew Morton,
    linux-fsdevel, linux-kernel, linux-mm

On Thu, Feb 01, 2024 at 02:52:31PM +0100, Jan Kara wrote:
> On Thu 01-02-24 13:47:03, Matthew Wilcox wrote:
> > On Thu, Feb 01, 2024 at 06:08:34PM +0800, Liu Shixin wrote:
> > > @@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  		folio = filemap_alloc_folio(gfp_mask, 0);
> > >  		if (!folio)
> > >  			break;
> > > -		if (filemap_add_folio(mapping, folio, index + i,
> > > -				gfp_mask) < 0) {
> > > +
> > > +		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> > > +		if (ret < 0) {
> > >  			folio_put(folio);
> > > +			if (ret == -ENOMEM)
> > > +				break;
> >
> > No, that's too early.  You've still got a batch of pages which were
> > successfully added; you have to read them.  You were only off by one
> > line though ;-)
>
> There's a read_pages() call just outside of the loop so this break is
> actually fine AFAICT.

Oh, good point!  I withdraw my criticism.
* [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
From: Liu Shixin @ 2024-02-01 10:08 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox, Andrew Morton
Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

When the page fault is not for write and the refault distance is close,
the page will be activated directly. If there are too many such pages in
a file, it means these pages may be reclaimed again immediately. In that
situation, read-ahead has no positive effect and only wastes IO. So count
such pages, and when the count is too large, stop bothering with
read-ahead for a while until the count decreases automatically.

Define 'too large' as 10000 empirically; this solves the problem and is
not affected by occasional active refaults.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 include/linux/fs.h      |  2 ++
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 16 ++++++++++++++++
 mm/readahead.c          |  4 ++++
 4 files changed, 23 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed5966a704951..f2a1825442f5a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -960,6 +960,7 @@ struct fown_struct {
  *         the first of these pages is accessed.
  * @ra_pages: Maximum size of a readahead request, copied from the bdi.
  * @mmap_miss: How many mmap accesses missed in the page cache.
+ * @active_refault: Number of active page refault.
  * @prev_pos: The last byte in the most recent read request.
  *
  * When this structure is passed to ->readahead(), the "most recent"
@@ -971,6 +972,7 @@ struct file_ra_state {
 	unsigned int async_size;
 	unsigned int ra_pages;
 	unsigned int mmap_miss;
+	unsigned int active_refault;
 	loff_t prev_pos;
 };
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d2..da9eaf985dec4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1256,6 +1256,7 @@ struct readahead_control {
 	pgoff_t _index;
 	unsigned int _nr_pages;
 	unsigned int _batch_count;
+	unsigned int _active_refault;
 	bool _workingset;
 	unsigned long _pflags;
 };
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db7..4de80592ab270 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3037,6 +3037,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 
 #ifdef CONFIG_MMU
 #define MMAP_LOTSAMISS  (100)
+#define ACTIVE_REFAULT_LIMIT	(10000)
 /*
  * lock_folio_maybe_drop_mmap - lock the page, possibly dropping the mmap_lock
  * @vmf - the vm_fault for this fault.
@@ -3142,6 +3143,18 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	if (mmap_miss > MMAP_LOTSAMISS)
 		return fpin;
 
+	ractl._active_refault = READ_ONCE(ra->active_refault);
+	if (ractl._active_refault)
+		WRITE_ONCE(ra->active_refault, --ractl._active_refault);
+
+	/*
+	 * If there are a lot of refault of active pages in this file,
+	 * that means the memory reclaim is ongoing. Stop bothering with
+	 * read-ahead since it will only waste IO.
+	 */
+	if (ractl._active_refault >= ACTIVE_REFAULT_LIMIT)
+		return fpin;
+
 	/*
 	 * mmap read-around
 	 */
@@ -3151,6 +3164,9 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	ra->async_size = ra->ra_pages / 4;
 	ractl._index = ra->start;
 	page_cache_ra_order(&ractl, ra, 0);
+
+	WRITE_ONCE(ra->active_refault, ractl._active_refault);
+
 	return fpin;
 }
 
diff --git a/mm/readahead.c b/mm/readahead.c
index cc4abb67eb223..d79bb70a232c4 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -263,6 +263,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
 		ractl->_nr_pages++;
+		if (unlikely(folio_test_workingset(folio)))
+			ractl->_active_refault++;
+		else if (unlikely(ractl->_active_refault))
+			ractl->_active_refault--;
 	}
 
 	/*
-- 
2.25.1
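The counter this patch introduces can be sketched in userspace (a simplified model: the `ractl._active_refault` / `ra->active_refault` round-trip is collapsed into one variable, and `ra_account_folio`, `skip_readahead`, and `seed_refaults` are invented names, not kernel functions):

```c
#include <assert.h>

#define ACTIVE_REFAULT_LIMIT 10000

static unsigned int active_refault;

/* page_cache_ra_unbounded(): called for each folio added by readahead */
static void ra_account_folio(int is_workingset_refault)
{
	if (is_workingset_refault)
		active_refault++;	/* active refault: reclaim is ongoing */
	else if (active_refault)
		active_refault--;	/* ordinary page: let the counter decay */
}

/* do_sync_mmap_readahead(): decay one step, then decide whether to skip */
static int skip_readahead(void)
{
	if (active_refault)
		active_refault--;	/* automatic decay, one step per fault */
	return active_refault >= ACTIVE_REFAULT_LIMIT;
}

/* Test helper: a burst of n workingset refaults seen by readahead. */
static void seed_refaults(int n)
{
	while (n--)
		ra_account_folio(1);
}
```

The decay is what makes the limit self-correcting: once readahead is skipped, no new active refaults are recorded, and the per-fault decrement eventually drops the counter back under the limit so read-around resumes.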
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
From: Jan Kara @ 2024-02-01 9:37 UTC (permalink / raw)
To: Liu Shixin
Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
    Andrew Morton, linux-fsdevel, linux-kernel, linux-mm

On Thu 01-02-24 18:08:35, Liu Shixin wrote:
> When the pagefault is not for write and the refault distance is close,
> the page will be activated directly. If there are too many such pages in
> a file, that means the pages may be reclaimed immediately.
> In such situation, there is no positive effect to read-ahead since it will
> only waste IO. So collect the number of such pages and when the number is
> too large, stop bothering with read-ahead for a while until it decreased
> automatically.
>
> Define 'too large' as 10000 experientially, which can solves the problem
> and does not affect by the occasional active refault.
>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>

So I'm not convinced this new logic is needed. We already have
ra->mmap_miss which gets incremented when a page fault has to read the page
(and decremented when a page fault found the page already in cache). This
should already work to detect trashing as well, shouldn't it? If it does
not, why?

								Honza

> ---
>  include/linux/fs.h      |  2 ++
>  include/linux/pagemap.h |  1 +
>  mm/filemap.c            | 16 ++++++++++++++++
>  mm/readahead.c          |  4 ++++
>  4 files changed, 23 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ed5966a704951..f2a1825442f5a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -960,6 +960,7 @@ struct fown_struct {
>   *         the first of these pages is accessed.
>   * @ra_pages: Maximum size of a readahead request, copied from the bdi.
>   * @mmap_miss: How many mmap accesses missed in the page cache.
> + * @active_refault: Number of active page refault.
>   * @prev_pos: The last byte in the most recent read request.
>   *
>   * When this structure is passed to ->readahead(), the "most recent"
> @@ -971,6 +972,7 @@ struct file_ra_state {
>  	unsigned int async_size;
>  	unsigned int ra_pages;
>  	unsigned int mmap_miss;
> +	unsigned int active_refault;
>  	loff_t prev_pos;
>  };
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2df35e65557d2..da9eaf985dec4 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -1256,6 +1256,7 @@ struct readahead_control {
>  	pgoff_t _index;
>  	unsigned int _nr_pages;
>  	unsigned int _batch_count;
> +	unsigned int _active_refault;
>  	bool _workingset;
>  	unsigned long _pflags;
>  };
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 750e779c23db7..4de80592ab270 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3037,6 +3037,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
>
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +#define ACTIVE_REFAULT_LIMIT	(10000)
>  /*
>   * lock_folio_maybe_drop_mmap - lock the page, possibly dropping the mmap_lock
>   * @vmf - the vm_fault for this fault.
> @@ -3142,6 +3143,18 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	if (mmap_miss > MMAP_LOTSAMISS)
>  		return fpin;
>
> +	ractl._active_refault = READ_ONCE(ra->active_refault);
> +	if (ractl._active_refault)
> +		WRITE_ONCE(ra->active_refault, --ractl._active_refault);
> +
> +	/*
> +	 * If there are a lot of refault of active pages in this file,
> +	 * that means the memory reclaim is ongoing. Stop bothering with
> +	 * read-ahead since it will only waste IO.
> +	 */
> +	if (ractl._active_refault >= ACTIVE_REFAULT_LIMIT)
> +		return fpin;
> +
>  	/*
>  	 * mmap read-around
>  	 */
> @@ -3151,6 +3164,9 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	ra->async_size = ra->ra_pages / 4;
>  	ractl._index = ra->start;
>  	page_cache_ra_order(&ractl, ra, 0);
> +
> +	WRITE_ONCE(ra->active_refault, ractl._active_refault);
> +
>  	return fpin;
>  }
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index cc4abb67eb223..d79bb70a232c4 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -263,6 +263,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		folio_set_readahead(folio);
>  		ractl->_workingset |= folio_test_workingset(folio);
>  		ractl->_nr_pages++;
> +		if (unlikely(folio_test_workingset(folio)))
> +			ractl->_active_refault++;
> +		else if (unlikely(ractl->_active_refault))
> +			ractl->_active_refault--;
>  	}
>
>  	/*
> --
> 2.25.1

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
From: Liu Shixin @ 2024-02-01 10:41 UTC (permalink / raw)
To: Jan Kara
Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Andrew Morton,
    linux-fsdevel, linux-kernel, linux-mm

On 2024/2/1 17:37, Jan Kara wrote:
> On Thu 01-02-24 18:08:35, Liu Shixin wrote:
>> When the pagefault is not for write and the refault distance is close,
>> the page will be activated directly. If there are too many such pages in
>> a file, that means the pages may be reclaimed immediately.
>> In such situation, there is no positive effect to read-ahead since it will
>> only waste IO. So collect the number of such pages and when the number is
>> too large, stop bothering with read-ahead for a while until it decreased
>> automatically.
>>
>> Define 'too large' as 10000 experientially, which can solves the problem
>> and does not affect by the occasional active refault.
>>
>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> So I'm not convinced this new logic is needed. We already have
> ra->mmap_miss which gets incremented when a page fault has to read the page
> (and decremented when a page fault found the page already in cache). This
> should already work to detect trashing as well, shouldn't it? If it does
> not, why?
>
> Honza
ra->mmap_miss doesn't help: it is increased only by one in
do_sync_mmap_readahead() and then decreased by one for every page mapped
in filemap_map_pages(). So in this scenario, it can't exceed
MMAP_LOTSAMISS.

Thanks,
>
>> ---
>>  include/linux/fs.h      |  2 ++
>>  include/linux/pagemap.h |  1 +
>>  mm/filemap.c            | 16 ++++++++++++++++
>>  mm/readahead.c          |  4 ++++
>>  4 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index ed5966a704951..f2a1825442f5a 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -960,6 +960,7 @@ struct fown_struct {
>>   *         the first of these pages is accessed.
>>   * @ra_pages: Maximum size of a readahead request, copied from the bdi.
>>   * @mmap_miss: How many mmap accesses missed in the page cache.
>> + * @active_refault: Number of active page refault.
>>   * @prev_pos: The last byte in the most recent read request.
>>   *
>>   * When this structure is passed to ->readahead(), the "most recent"
>> @@ -971,6 +972,7 @@ struct file_ra_state {
>>  	unsigned int async_size;
>>  	unsigned int ra_pages;
>>  	unsigned int mmap_miss;
>> +	unsigned int active_refault;
>>  	loff_t prev_pos;
>>  };
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 2df35e65557d2..da9eaf985dec4 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -1256,6 +1256,7 @@ struct readahead_control {
>>  	pgoff_t _index;
>>  	unsigned int _nr_pages;
>>  	unsigned int _batch_count;
>> +	unsigned int _active_refault;
>>  	bool _workingset;
>>  	unsigned long _pflags;
>>  };
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 750e779c23db7..4de80592ab270 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3037,6 +3037,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
>>
>>  #ifdef CONFIG_MMU
>>  #define MMAP_LOTSAMISS  (100)
>> +#define ACTIVE_REFAULT_LIMIT	(10000)
>>  /*
>>   * lock_folio_maybe_drop_mmap - lock the page, possibly dropping the mmap_lock
>>   * @vmf - the vm_fault for this fault.
>> @@ -3142,6 +3143,18 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>>  	if (mmap_miss > MMAP_LOTSAMISS)
>>  		return fpin;
>>
>> +	ractl._active_refault = READ_ONCE(ra->active_refault);
>> +	if (ractl._active_refault)
>> +		WRITE_ONCE(ra->active_refault, --ractl._active_refault);
>> +
>> +	/*
>> +	 * If there are a lot of refault of active pages in this file,
>> +	 * that means the memory reclaim is ongoing. Stop bothering with
>> +	 * read-ahead since it will only waste IO.
>> +	 */
>> +	if (ractl._active_refault >= ACTIVE_REFAULT_LIMIT)
>> +		return fpin;
>> +
>>  	/*
>>  	 * mmap read-around
>>  	 */
>> @@ -3151,6 +3164,9 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>>  	ra->async_size = ra->ra_pages / 4;
>>  	ractl._index = ra->start;
>>  	page_cache_ra_order(&ractl, ra, 0);
>> +
>> +	WRITE_ONCE(ra->active_refault, ractl._active_refault);
>> +
>>  	return fpin;
>>  }
>>
>> diff --git a/mm/readahead.c b/mm/readahead.c
>> index cc4abb67eb223..d79bb70a232c4 100644
>> --- a/mm/readahead.c
>> +++ b/mm/readahead.c
>> @@ -263,6 +263,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>>  		folio_set_readahead(folio);
>>  		ractl->_workingset |= folio_test_workingset(folio);
>>  		ractl->_nr_pages++;
>> +		if (unlikely(folio_test_workingset(folio)))
>> +			ractl->_active_refault++;
>> +		else if (unlikely(ractl->_active_refault))
>> +			ractl->_active_refault--;
>>  	}
>>
>>  	/*
>> --
>> 2.25.1
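Liu's point about the mmap_miss accounting can be sketched in userspace (the function name and parameters are invented for illustration): each faulting miss adds one, but filemap_map_pages() then subtracts one for every page it maps around the fault, so as long as read-ahead maps at least one page per fault the counter is wiped out and can never climb towards MMAP_LOTSAMISS, even while the workload is thrashing.

```c
#include <assert.h>

/*
 * Simplified model of the pre-fix accounting: per fault, mmap_miss is
 * incremented once by do_sync_mmap_readahead() and then decremented by
 * the number of pages filemap_map_pages() maps (clamped at zero).
 */
static unsigned int mmap_miss_after_faults(int nr_faults, int mapped_per_fault)
{
	unsigned int mmap_miss = 0;

	for (int i = 0; i < nr_faults; i++) {
		mmap_miss++;				/* do_sync_mmap_readahead() */
		if (mmap_miss > (unsigned int)mapped_per_fault)
			mmap_miss -= mapped_per_fault;	/* filemap_map_pages() */
		else
			mmap_miss = 0;
	}
	return mmap_miss;
}
```

Even mapping a single page per fault is enough to cancel the miss, so the counter only accumulates in the (unrealistic) case where nothing gets mapped at all.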
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
From: Jan Kara @ 2024-02-01 17:31 UTC (permalink / raw)
To: Liu Shixin
Cc: Jan Kara, Alexander Viro, Christian Brauner, Matthew Wilcox,
    Andrew Morton, linux-fsdevel, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

On Thu 01-02-24 18:41:30, Liu Shixin wrote:
> On 2024/2/1 17:37, Jan Kara wrote:
> > On Thu 01-02-24 18:08:35, Liu Shixin wrote:
> >> When the pagefault is not for write and the refault distance is close,
> >> the page will be activated directly. If there are too many such pages in
> >> a file, that means the pages may be reclaimed immediately.
> >> In such situation, there is no positive effect to read-ahead since it will
> >> only waste IO. So collect the number of such pages and when the number is
> >> too large, stop bothering with read-ahead for a while until it decreased
> >> automatically.
> >>
> >> Define 'too large' as 10000 experientially, which can solves the problem
> >> and does not affect by the occasional active refault.
> >>
> >> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> > So I'm not convinced this new logic is needed. We already have
> > ra->mmap_miss which gets incremented when a page fault has to read the page
> > (and decremented when a page fault found the page already in cache). This
> > should already work to detect trashing as well, shouldn't it? If it does
> > not, why?
> >
> > Honza
> ra->mmap_miss doesn't help, it increased only one in do_sync_mmap_readahead()
> and then decreased one for every page in filemap_map_pages(). So in this scenario,
> it can't exceed MMAP_LOTSAMISS.

I see, OK. But that's a (longstanding) bug in how mmap_miss is handled.
Can you please test whether attached patches fix the trashing for you? At
least now I can see mmap_miss properly increments when we are hitting
uncached pages... Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-mm-readahead-Improve-page-readaround-miss-detection.patch --]
[-- Type: text/x-patch, Size: 4487 bytes --]

From f7879373941b7a90b0d30967ec798c000a5ef7b1 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 1 Feb 2024 14:56:33 +0100
Subject: [PATCH 1/2] mm/readahead: Improve page readaround miss detection

filemap_map_pages() decreases ra->mmap_miss for every page it maps. This
however overestimates number of real cache hits because we have no idea
whether the application will use the pages we map or not. This is
problematic in particular in memory constrained situations where we think
we have great readahead success rate although in fact we are just trashing
page cache & disk.

Change filemap_map_pages() to count only success of mapping the page we
are faulting in. This should be actually enough to keep mmap_miss close
to 0 for workloads doing sequential reads because filemap_map_pages()
does not map page with readahead flag and thus these are going to
contribute to decreasing the mmap_miss counter.

Reported-by: Liu Shixin <liushixin2@huawei.com>
Fixes: f1820361f83d ("mm: implement ->map_pages for page cache")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..0b843f99407c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3436,8 +3436,7 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
  */
 static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 			struct folio *folio, unsigned long start,
-			unsigned long addr, unsigned int nr_pages,
-			unsigned int *mmap_miss)
+			unsigned long addr, unsigned int nr_pages)
 {
 	vm_fault_t ret = 0;
 	struct page *page = folio_page(folio, start);
@@ -3448,8 +3447,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (PageHWPoison(page + count))
 			goto skip;
 
-		(*mmap_miss)++;
-
 		/*
 		 * NOTE: If there're PTE markers, we'll leave them to be
 		 * handled in the specific fault path, and it'll prohibit the
@@ -3488,8 +3485,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 }
 
 static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
-		struct folio *folio, unsigned long addr,
-		unsigned int *mmap_miss)
+		struct folio *folio, unsigned long addr)
 {
 	vm_fault_t ret = 0;
 	struct page *page = &folio->page;
@@ -3497,8 +3493,6 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
 	if (PageHWPoison(page))
 		return ret;
 
-	(*mmap_miss)++;
-
 	/*
 	 * NOTE: If there're PTE markers, we'll leave them to be
 	 * handled in the specific fault path, and it'll prohibit
@@ -3527,7 +3521,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
 	vm_fault_t ret = 0;
-	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved;
+	unsigned int nr_pages = 0;
 
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
@@ -3556,12 +3550,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
 
 		if (!folio_test_large(folio))
-			ret |= filemap_map_order0_folio(vmf,
-					folio, addr, &mmap_miss);
+			ret |= filemap_map_order0_folio(vmf, folio, addr);
 		else
 			ret |= filemap_map_folio_range(vmf, folio,
 					xas.xa_index - folio->index, addr,
-					nr_pages, &mmap_miss);
+					nr_pages);
 
 		folio_unlock(folio);
 		folio_put(folio);
@@ -3570,11 +3563,23 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 out:
 	rcu_read_unlock();
 
-	mmap_miss_saved = READ_ONCE(file->f_ra.mmap_miss);
-	if (mmap_miss >= mmap_miss_saved)
-		WRITE_ONCE(file->f_ra.mmap_miss, 0);
-	else
-		WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss_saved - mmap_miss);
+	/* VM_FAULT_NOPAGE means we succeeded in mapping desired page */
+	if (ret == VM_FAULT_NOPAGE) {
+		unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+
+		/*
+		 * We've found the page we needed in the page cache, decrease
+		 * mmap_miss. Note that we don't decrease mmap_miss for every
+		 * page we've mapped because we don't know whether the process
+		 * will actually use them. We will thus underestimate number of
+		 * page cache hits but the least the page marked with readahead
+		 * flag will not be mapped by filemap_map_pages() and this will
+		 * contribute to decreasing mmap_miss to make up for occasional
+		 * fault miss.
+		 */
+		if (mmap_miss)
+			WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss - 1);
+	}
 	return ret;
 }
-- 
2.35.3

[-- Attachment #3: 0002-mm-readahead-Fix-readahead-miss-detection-with-FAULT.patch --]
[-- Type: text/x-patch, Size: 1817 bytes --]

From b1620b02a1ba81f5c6848eca4fd148e9520eafc6 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 1 Feb 2024 17:28:23 +0100
Subject: [PATCH 2/2] mm/readahead: Fix readahead miss detection with
 FAULT_FLAG_RETRY_NOWAIT

When the page fault happens with FAULT_FLAG_RETRY_NOWAIT (which is common)
we will bail out of the page fault after issuing reads and retry the
fault. That will then find the created pages in filemap_map_pages() and
hence will be treated as cache hit canceling out the cache miss in
do_sync_mmap_readahead().

Increment mmap_miss by two in do_sync_mmap_readahead() in case
FAULT_FLAG_RETRY_NOWAIT is set to account for the following expected hit.
If the page gets evicted even before we manage to retry the fault, we are
under so heavy memory pressure that increasing mmap_miss by two is fine.

Reported-by: Liu Shixin <liushixin2@huawei.com>
Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 0b843f99407c..2dda5dc04517 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3132,8 +3132,14 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 
 	/* Avoid banging the cache line if not needed */
 	mmap_miss = READ_ONCE(ra->mmap_miss);
+	/*
+	 * Increment mmap_miss by 2 if we are going to bail out of fault after
+	 * issuing IO as we will then go back and map the cached page which is
+	 * accounted as a cache hit.
+	 */
+	mmap_miss += 1 + !!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
 	if (mmap_miss < MMAP_LOTSAMISS * 10)
-		WRITE_ONCE(ra->mmap_miss, ++mmap_miss);
+		WRITE_ONCE(ra->mmap_miss, mmap_miss);
 
 	/*
 	 * Do we miss much more than hit in this file?  If so,
-- 
2.35.3
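The accounting after Jan's two attached patches can be sketched in userspace (simplified: `fault()` is an invented driver, and the store-cap check is approximated): a fault that must issue I/O bumps mmap_miss by 2 when it will bail out and retry (FAULT_FLAG_RETRY_NOWAIT), while filemap_map_pages() now subtracts only 1, for the page actually faulted in, instead of one per mapped page. Under thrashing, each miss/retry pair therefore nets +1.

```c
#include <assert.h>

#define MMAP_LOTSAMISS 100

static unsigned int mmap_miss;

static void fault(int page_was_cached, int retry_nowait)
{
	if (!page_was_cached) {
		/* do_sync_mmap_readahead() after patch 2/2 */
		if (mmap_miss < MMAP_LOTSAMISS * 10)
			mmap_miss += 1 + !!retry_nowait;
	} else if (mmap_miss) {
		/* filemap_map_pages() after patch 1/2: one hit, one step */
		mmap_miss--;
	}
}
```

Driving this with a thrashing workload (every fault misses, bails out with RETRY_NOWAIT, then retries and finds the just-read page cached) makes the counter grow by one per page, so it now crosses MMAP_LOTSAMISS and readahead gets throttled, which is exactly the detection the original mmap_miss logic was missing.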
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
  2024-02-01 17:31       ` Jan Kara
@ 2024-02-02  1:25         ` Liu Shixin
  2024-02-02  9:02         ` Liu Shixin
  1 sibling, 0 replies; 14+ messages in thread
From: Liu Shixin @ 2024-02-02  1:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm

On 2024/2/2 1:31, Jan Kara wrote:
> On Thu 01-02-24 18:41:30, Liu Shixin wrote:
>> On 2024/2/1 17:37, Jan Kara wrote:
>>> On Thu 01-02-24 18:08:35, Liu Shixin wrote:
>>>> When the pagefault is not for write and the refault distance is close,
>>>> the page will be activated directly. If there are too many such pages in
>>>> a file, that means the pages may be reclaimed immediately.
>>>> In such a situation, there is no positive effect to read-ahead since it
>>>> will only waste IO. So collect the number of such pages and when the
>>>> number is too large, stop bothering with read-ahead for a while until it
>>>> decreases automatically.
>>>>
>>>> Define 'too large' as 10000 experientially, which can solve the problem
>>>> and is not affected by the occasional active refault.
>>>>
>>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>>> So I'm not convinced this new logic is needed. We already have
>>> ra->mmap_miss which gets incremented when a page fault has to read the
>>> page (and decremented when a page fault found the page already in cache).
>>> This should already work to detect thrashing as well, shouldn't it? If it
>>> does not, why?
>>>
>>> 								Honza
>> ra->mmap_miss doesn't help: it is increased only by one in
>> do_sync_mmap_readahead() and then decreased by one for every page in
>> filemap_map_pages(). So in this scenario it can't exceed MMAP_LOTSAMISS.
> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled. Can
> you please test whether the attached patches fix the thrashing for you? At
> least now I can see mmap_miss properly increments when we are hitting
> uncached pages... Thanks!
>
> 								Honza

Thanks for the patch, I will test it.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
  2024-02-01 17:31       ` Jan Kara
  2024-02-02  1:25         ` Liu Shixin
@ 2024-02-02  9:02         ` Liu Shixin
  2024-02-29  9:01           ` Liu Shixin
  1 sibling, 1 reply; 14+ messages in thread
From: Liu Shixin @ 2024-02-02  9:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]

On 2024/2/2 1:31, Jan Kara wrote:
> On Thu 01-02-24 18:41:30, Liu Shixin wrote:
>> On 2024/2/1 17:37, Jan Kara wrote:
>>> On Thu 01-02-24 18:08:35, Liu Shixin wrote:
>>>> When the pagefault is not for write and the refault distance is close,
>>>> the page will be activated directly. If there are too many such pages in
>>>> a file, that means the pages may be reclaimed immediately.
>>>> In such a situation, there is no positive effect to read-ahead since it
>>>> will only waste IO. So collect the number of such pages and when the
>>>> number is too large, stop bothering with read-ahead for a while until it
>>>> decreases automatically.
>>>>
>>>> Define 'too large' as 10000 experientially, which can solve the problem
>>>> and is not affected by the occasional active refault.
>>>>
>>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>>> So I'm not convinced this new logic is needed. We already have
>>> ra->mmap_miss which gets incremented when a page fault has to read the
>>> page (and decremented when a page fault found the page already in cache).
>>> This should already work to detect thrashing as well, shouldn't it? If it
>>> does not, why?
>>>
>>> 								Honza
>> ra->mmap_miss doesn't help: it is increased only by one in
>> do_sync_mmap_readahead() and then decreased by one for every page in
>> filemap_map_pages(). So in this scenario it can't exceed MMAP_LOTSAMISS.
> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled. Can
> you please test whether the attached patches fix the thrashing for you? At
> least now I can see mmap_miss properly increments when we are hitting
> uncached pages... Thanks!
>
> 								Honza

The patches don't seem to have much effect. I will try to analyze why they
don't work. The attached files are my test case.

Thanks,

[-- Attachment #2: test.sh --]
[-- Type: text/plain, Size: 385 bytes --]

#!/bin/bash

while true; do
	flag=$(ps -ef | grep -v grep | grep alloc_page | wc -l)
	if [ "$flag" -eq 0 ]; then
		/alloc_page &
	fi
	sleep 30
	start_time=$(date +%s)
	yum install -y expect > /dev/null 2>&1
	end_time=$(date +%s)
	elapsed_time=$((end_time - start_time))
	echo "$elapsed_time seconds"
	yum remove -y expect > /dev/null 2>&1
done

[-- Attachment #3: alloc_page.c --]
[-- Type: text/plain, Size: 418 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define SIZE 1*1024*1024	/* 1M */

int main()
{
	void *ptr = NULL;
	int i;

	for (i = 0; i < 1024 * 6 - 50; i++) {
		ptr = (void *)malloc(SIZE);
		if (ptr == NULL) {
			printf("malloc err!");
			return -1;
		}
		memset(ptr, 0, SIZE);
	}
	sleep(99999);
	free(ptr);
	return 0;
}

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
  2024-02-02  9:02         ` Liu Shixin
@ 2024-02-29  9:01           ` Liu Shixin
  0 siblings, 0 replies; 14+ messages in thread
From: Liu Shixin @ 2024-02-29  9:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Matthew Wilcox, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm

On 2024/2/2 17:02, Liu Shixin wrote:
>
> On 2024/2/2 1:31, Jan Kara wrote:
>> On Thu 01-02-24 18:41:30, Liu Shixin wrote:
>>> On 2024/2/1 17:37, Jan Kara wrote:
>>>> On Thu 01-02-24 18:08:35, Liu Shixin wrote:
>>>>> When the pagefault is not for write and the refault distance is close,
>>>>> the page will be activated directly. If there are too many such pages
>>>>> in a file, that means the pages may be reclaimed immediately.
>>>>> In such a situation, there is no positive effect to read-ahead since it
>>>>> will only waste IO. So collect the number of such pages and when the
>>>>> number is too large, stop bothering with read-ahead for a while until
>>>>> it decreases automatically.
>>>>>
>>>>> Define 'too large' as 10000 experientially, which can solve the problem
>>>>> and is not affected by the occasional active refault.
>>>>>
>>>>> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
>>>> So I'm not convinced this new logic is needed. We already have
>>>> ra->mmap_miss which gets incremented when a page fault has to read the
>>>> page (and decremented when a page fault found the page already in
>>>> cache). This should already work to detect thrashing as well, shouldn't
>>>> it? If it does not, why?
>>>>
>>>> 								Honza
>>> ra->mmap_miss doesn't help: it is increased only by one in
>>> do_sync_mmap_readahead() and then decreased by one for every page in
>>> filemap_map_pages(). So in this scenario it can't exceed MMAP_LOTSAMISS.
>> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled.
>> Can you please test whether the attached patches fix the thrashing for
>> you? At least now I can see mmap_miss properly increments when we are
>> hitting uncached pages... Thanks!
>>
>> 								Honza
> The patches don't seem to have much effect. I will try to analyze why they
> don't work. The attached files are my test case.
>
> Thanks,

I think I figured out why mmap_miss doesn't work. After
do_sync_mmap_readahead(), there is a __filemap_get_folio() to make sure the
page is ready. It is therefore also ready in filemap_map_pages(), so
mmap_miss will be decreased once. mmap_miss goes back to 0 and can't stop
read-ahead.

Overall, I don't think mmap_miss can solve this problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
  2024-02-01 10:08 ` [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault Liu Shixin
  2024-02-01  9:37   ` Jan Kara
@ 2024-03-05  7:07   ` Liu Shixin
  1 sibling, 0 replies; 14+ messages in thread
From: Liu Shixin @ 2024-03-05  7:07 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	Andrew Morton
  Cc: linux-fsdevel, linux-kernel, linux-mm

Hi Jan, all,

Please take a look at this patch again. Although this may not be a graceful
approach, I can't think of any other way to fix the problem except using the
workingset information.

Thanks,

On 2024/2/1 18:08, Liu Shixin wrote:
> When the pagefault is not for write and the refault distance is close,
> the page will be activated directly. If there are too many such pages in
> a file, that means the pages may be reclaimed immediately.
> In such a situation, there is no positive effect to read-ahead since it
> will only waste IO. So collect the number of such pages and when the
> number is too large, stop bothering with read-ahead for a while until it
> decreases automatically.
>
> Define 'too large' as 10000 experientially, which can solve the problem
> and is not affected by the occasional active refault.
>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  include/linux/fs.h      |  2 ++
>  include/linux/pagemap.h |  1 +
>  mm/filemap.c            | 16 ++++++++++++++++
>  mm/readahead.c          |  4 ++++
>  4 files changed, 23 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ed5966a704951..f2a1825442f5a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -960,6 +960,7 @@ struct fown_struct {
>   *      the first of these pages is accessed.
>   * @ra_pages: Maximum size of a readahead request, copied from the bdi.
>   * @mmap_miss: How many mmap accesses missed in the page cache.
> + * @active_refault: Number of active page refault.
>   * @prev_pos: The last byte in the most recent read request.
>   *
>   * When this structure is passed to ->readahead(), the "most recent"
> @@ -971,6 +972,7 @@ struct file_ra_state {
>  	unsigned int async_size;
>  	unsigned int ra_pages;
>  	unsigned int mmap_miss;
> +	unsigned int active_refault;
>  	loff_t prev_pos;
>  };
>  
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2df35e65557d2..da9eaf985dec4 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -1256,6 +1256,7 @@ struct readahead_control {
>  	pgoff_t _index;
>  	unsigned int _nr_pages;
>  	unsigned int _batch_count;
> +	unsigned int _active_refault;
>  	bool _workingset;
>  	unsigned long _pflags;
>  };
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 750e779c23db7..4de80592ab270 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3037,6 +3037,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
>  
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +#define ACTIVE_REFAULT_LIMIT	(10000)
>  /*
>   * lock_folio_maybe_drop_mmap - lock the page, possibly dropping the mmap_lock
>   * @vmf - the vm_fault for this fault.
> @@ -3142,6 +3143,18 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	if (mmap_miss > MMAP_LOTSAMISS)
>  		return fpin;
>  
> +	ractl._active_refault = READ_ONCE(ra->active_refault);
> +	if (ractl._active_refault)
> +		WRITE_ONCE(ra->active_refault, --ractl._active_refault);
> +
> +	/*
> +	 * If there are a lot of refault of active pages in this file,
> +	 * that means the memory reclaim is ongoing. Stop bothering with
> +	 * read-ahead since it will only waste IO.
> +	 */
> +	if (ractl._active_refault >= ACTIVE_REFAULT_LIMIT)
> +		return fpin;
> +
>  	/*
>  	 * mmap read-around
>  	 */
> @@ -3151,6 +3164,9 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	ra->async_size = ra->ra_pages / 4;
>  	ractl._index = ra->start;
>  	page_cache_ra_order(&ractl, ra, 0);
> +
> +	WRITE_ONCE(ra->active_refault, ractl._active_refault);
> +
>  	return fpin;
>  }
>  
> diff --git a/mm/readahead.c b/mm/readahead.c
> index cc4abb67eb223..d79bb70a232c4 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -263,6 +263,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		folio_set_readahead(folio);
>  		ractl->_workingset |= folio_test_workingset(folio);
>  		ractl->_nr_pages++;
> +		if (unlikely(folio_test_workingset(folio)))
> +			ractl->_active_refault++;
> +		else if (unlikely(ractl->_active_refault))
> +			ractl->_active_refault--;
>  	}
>  
>  	/*

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-03-05  7:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-01 10:08 [PATCH 0/2] Fix I/O high when memory almost met memcg limit Liu Shixin
2024-02-01 10:08 ` [PATCH 1/2] mm/readahead: stop readahead loop if memcg charge fails Liu Shixin
2024-02-01  9:28   ` Jan Kara
2024-02-01 13:47     ` Matthew Wilcox
2024-02-01 13:52       ` Jan Kara
2024-02-01 13:53       ` Matthew Wilcox
2024-02-01 10:08 ` [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault Liu Shixin
2024-02-01  9:37   ` Jan Kara
2024-02-01 10:41     ` Liu Shixin
2024-02-01 17:31       ` Jan Kara
2024-02-02  1:25         ` Liu Shixin
2024-02-02  9:02         ` Liu Shixin
2024-02-29  9:01           ` Liu Shixin
2024-03-05  7:07   ` Liu Shixin