linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] Fix I/O high when memory almost met memcg limit
@ 2024-03-22  9:35 Liu Shixin
  2024-03-22  9:35 ` [PATCH v2 1/2] mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM Liu Shixin
  2024-03-22  9:35 ` [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags Liu Shixin
  0 siblings, 2 replies; 5+ messages in thread
From: Liu Shixin @ 2024-03-22  9:35 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox, Andrew Morton, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

v1->v2:
  1. Replace the variable active_refault with mmap_miss. Now mmap_miss will
     not be decreased if the folio was active prior to eviction.
  2. Jan has sent two other patches which aim to let mmap_miss increase
     properly when the page is not ready. But in my scenario, the problem
     is that the page is reclaimed immediately. These two patches do not
     conflict logically with Jan's patches[3].

Recently, when installing a package in a docker container that had almost
reached its memory limit, the installer stopped responding for more than
15 minutes. During this period, I/O stayed high (~1G/s) and affected the
whole machine. I've constructed a reproducer as follows:

  1. Create a docker container:

	$ cat test.sh
	#!/bin/bash
  
	docker rm centos7 --force

	docker create --name centos7 --memory 4G --memory-swap 6G centos:7 /usr/sbin/init
	docker start centos7
	sleep 1

	docker cp ./alloc_page centos7:/
	docker cp ./reproduce.sh centos7:/

	docker exec -it centos7 /bin/bash

  2. Try to reproduce the problem in the container:

	$ cat reproduce.sh
	#!/bin/bash
  
	while true; do
		flag=$(ps -ef | grep -v grep | grep alloc_page| wc -l)
		if [ "$flag" -eq 0 ]; then
			/alloc_page &
		fi

		sleep 30

		start_time=$(date +%s)
		yum install -y expect > /dev/null 2>&1

		end_time=$(date +%s)

		elapsed_time=$((end_time - start_time))

		echo "$elapsed_time seconds"
		yum remove -y expect > /dev/null 2>&1
	done

	$ cat alloc_page.c
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <string.h>

	#define SIZE (1 * 1024 * 1024)	/* 1 MiB per allocation */

	int main(void)
	{
		void *addr = NULL;
		int i;

		/* Touch ~6 GiB minus 50 MiB; the memory is intentionally never freed. */
		for (i = 0; i < 1024 * 6 - 50; i++) {
			addr = malloc(SIZE);
			if (!addr)
				return -1;

			memset(addr, 0, SIZE);
		}

		sleep(99999);
		return 0;
	}
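
A note on the numbers used above: docker's --memory-swap is the combined
limit for memory plus swap, so the container may use at most 6G in total,
of which 4G can be resident. The following is only a minimal sketch of
that arithmetic; the helper names are hypothetical, and it ignores malloc
overhead and every other process in the container.

```c
#include <assert.h>

#define MIB_PER_GIB 1024UL

/* Hypothetical helper: MiB left under a combined memory+swap limit. */
unsigned long headroom_mib(unsigned long memsw_limit_gib, unsigned long alloc_mib)
{
	unsigned long limit_mib = memsw_limit_gib * MIB_PER_GIB;

	assert(alloc_mib <= limit_mib);
	return limit_mib - alloc_mib;
}

/* Hypothetical helper: MiB that cannot stay resident and must go to swap. */
unsigned long min_swapped_mib(unsigned long mem_limit_gib, unsigned long alloc_mib)
{
	unsigned long mem_mib = mem_limit_gib * MIB_PER_GIB;

	return alloc_mib > mem_mib ? alloc_mib - mem_mib : 0;
}
```

With the values from test.sh and alloc_page.c, headroom_mib(6, 1024 * 6 - 50)
is 50 and min_swapped_mib(4, 1024 * 6 - 50) is 1998. In other words,
alloc_page alone leaves roughly 50 MiB of the combined limit for
everything else, including the page cache that yum needs.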


We found that this problem is caused by a lot of meaningless read-ahead.
Since the container has almost reached its memory limit, pages are
reclaimed immediately after read-ahead and are then read ahead again
right away. The program runs slowly and wastes a lot of I/O resources.

These two patches aim to stop the read-ahead in the above scenario.

[1] https://lore.kernel.org/linux-mm/c2f4a2fa-3bde-72ce-66f5-db81a373fdbc@huawei.com/T/
[2] https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/
[3] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/

Liu Shixin (2):
  mm/readahead: break read-ahead loop if filemap_add_folio return
    -ENOMEM
  mm/readahead: don't decrease mmap_miss when folio has workingset
    flags

 include/linux/pagemap.h |  2 ++
 mm/filemap.c            |  7 ++++---
 mm/readahead.c          | 15 +++++++++++++--
 3 files changed, 19 insertions(+), 5 deletions(-)

-- 
2.25.1




* [PATCH v2 1/2] mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM
  2024-03-22  9:35 [PATCH v2 0/2] Fix I/O high when memory almost met memcg limit Liu Shixin
@ 2024-03-22  9:35 ` Liu Shixin
  2024-03-22  9:35 ` [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags Liu Shixin
  1 sibling, 0 replies; 5+ messages in thread
From: Liu Shixin @ 2024-03-22  9:35 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox, Andrew Morton, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

When filemap_add_folio() returns -ENOMEM, break out of the read-ahead
loop, as is already done when filemap_alloc_folio() fails.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 mm/readahead.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 2648ec4f0494..a8513ffbb17f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -228,6 +228,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	 */
 	for (i = 0; i < nr_to_read; i++) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
+		int ret;
 
 		if (folio && !xa_is_value(folio)) {
 			/*
@@ -247,9 +248,12 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		folio = filemap_alloc_folio(gfp_mask, 0);
 		if (!folio)
 			break;
-		if (filemap_add_folio(mapping, folio, index + i,
-					gfp_mask) < 0) {
+
+		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
+		if (ret < 0) {
 			folio_put(folio);
+			if (ret == -ENOMEM)
+				break;
 			read_pages(ractl);
 			ractl->_index++;
 			i = ractl->_index + ractl->_nr_pages - index - 1;
-- 
2.25.1
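
For readers who want to see the effect of the early break in isolation,
here is a minimal user-space sketch. It is a toy model, not the kernel
code: toy_add_folio() and toy_ra_attempts() are hypothetical names, and
the scripted results stand in for what filemap_add_folio() might return
when the memcg is at its limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for filemap_add_folio(): returns the scripted
 * result for slot i (0 on success, negative errno on failure).
 */
static int toy_add_folio(const int *script, size_t i)
{
	return script[i];
}

/* Returns how many insertions were attempted before the loop ended. */
size_t toy_ra_attempts(const int *script, size_t nr_to_read, int break_on_enomem)
{
	size_t i, attempts = 0;

	assert(script != NULL);
	for (i = 0; i < nr_to_read; i++) {
		int ret = toy_add_folio(script, i);

		attempts++;
		if (ret < 0) {
			if (break_on_enomem && ret == -ENOMEM)
				break;		/* patched behaviour: give up */
			continue;		/* old behaviour: skip this index */
		}
	}
	return attempts;
}
```

In this model, a window of 8 slots where every slot from the third one
onwards fails with -ENOMEM costs 8 attempts without the break and 3 with
it. Other errors such as -EEXIST still only skip the affected index.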




* [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags
  2024-03-22  9:35 [PATCH v2 0/2] Fix I/O high when memory almost met memcg limit Liu Shixin
  2024-03-22  9:35 ` [PATCH v2 1/2] mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM Liu Shixin
@ 2024-03-22  9:35 ` Liu Shixin
  2024-03-25 15:30   ` Jan Kara
  2024-03-26  6:50   ` [PATCH v3] mm/filemap: don't decrease mmap_miss when folio has workingset flag Liu Shixin
  1 sibling, 2 replies; 5+ messages in thread
From: Liu Shixin @ 2024-03-22  9:35 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox, Andrew Morton, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

If too many folios in a file have been recently evicted, they will
probably continue to be evicted. In such a situation, read-ahead on this
file has no positive effect; it is only a waste of I/O.

mmap_miss is increased in do_sync_mmap_readahead() and decreased in
both do_async_mmap_readahead() and filemap_map_pages(). In order to skip
read-ahead in the above scenario, mmap_miss has to grow beyond
MMAP_LOTSAMISS. This can be achieved by not decreasing mmap_miss when the
folio has the workingset flag set. The async path does not need to be
handled because, in the above scenario, it is rarely reached.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
---
 mm/filemap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8df4797c5287..753771310127 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3439,7 +3439,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (PageHWPoison(page + count))
 			goto skip;
 
-		(*mmap_miss)++;
+		if (!folio_test_workingset(folio))
+			(*mmap_miss)++;
 
 		/*
 		 * NOTE: If there're PTE markers, we'll leave them to be
@@ -3488,7 +3489,8 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
 	if (PageHWPoison(page))
 		return ret;
 
-	(*mmap_miss)++;
+	if (!folio_test_workingset(folio))
+		(*mmap_miss)++;
 
 	/*
 	 * NOTE: If there're PTE markers, we'll leave them to be
-- 
2.25.1




* Re: [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags
  2024-03-22  9:35 ` [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags Liu Shixin
@ 2024-03-25 15:30   ` Jan Kara
  2024-03-26  6:50   ` [PATCH v3] mm/filemap: don't decrease mmap_miss when folio has workingset flag Liu Shixin
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2024-03-25 15:30 UTC (permalink / raw)
  To: Liu Shixin
  Cc: Jan Kara, Matthew Wilcox, Andrew Morton, Alexander Viro,
	Christian Brauner, linux-fsdevel, linux-kernel, linux-mm

On Fri 22-03-24 17:35:55, Liu Shixin wrote:
> If there are too many folios that are recently evicted in a file, then
> they will probably continue to be evicted. In such situation, there is
> no positive effect to read-ahead this file since it is only a waste of IO.
> 
> The mmap_miss is increased in do_sync_mmap_readahead() and decreased in
> both do_async_mmap_readahead() and filemap_map_pages(). In order to skip
> read-ahead in above scenario, the mmap_miss have to increased exceed
> MMAP_LOTSAMISS. This can be done by stop decreased mmap_miss when folio
> has workingset flags. The async path is not to care because in above
> scenario, it's hard to run into the async path.
> 
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
...
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8df4797c5287..753771310127 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3439,7 +3439,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		if (PageHWPoison(page + count))
>  			goto skip;
>  
> -		(*mmap_miss)++;
> +		if (!folio_test_workingset(folio))
> +			(*mmap_miss)++;

Hum, so this means we consider this a 'hit' if the page is completely new
in the page cache or was evicted a long time ago. OK, makes sense. It would be
nice to add a comment in this direction to explain the condition. Frankly
the whole mmap_miss accounting is broken as I've outlined in my patch
series. But I guess this works as a fixup for your immediate problem and
we can make mmap_miss accounting sensible later. So for now feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* [PATCH v3] mm/filemap: don't decrease mmap_miss when folio has workingset flag
  2024-03-22  9:35 ` [PATCH v2 2/2] mm/readahead: don't decrease mmap_miss when folio has workingset flags Liu Shixin
  2024-03-25 15:30   ` Jan Kara
@ 2024-03-26  6:50   ` Liu Shixin
  1 sibling, 0 replies; 5+ messages in thread
From: Liu Shixin @ 2024-03-26  6:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, Matthew Wilcox, Alexander Viro, Christian Brauner,
	linux-fsdevel, linux-kernel, linux-mm, Liu Shixin

If too many folios in a file have been recently evicted, they will
probably continue to be evicted. In such a situation, read-ahead on this
file has no positive effect; it is only a waste of I/O.

mmap_miss is increased in do_sync_mmap_readahead() and decreased in
both do_async_mmap_readahead() and filemap_map_pages(). In order to skip
read-ahead in the above scenario, mmap_miss has to grow beyond
MMAP_LOTSAMISS. This can be achieved by not decreasing mmap_miss when the
folio has the workingset flag set. The async path does not need to be
handled because, in the above scenario, it is rarely reached.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
v2->v3: Update the title and comment, and add Reviewed-by from Jan.

Andrew, please replace patch [2/2] with this new patch, thanks.

 mm/filemap.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8df4797c5287..780aad026b26 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3439,7 +3439,15 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (PageHWPoison(page + count))
 			goto skip;
 
-		(*mmap_miss)++;
+		/*
+		 * If there are too many folios that are recently evicted
+		 * in a file, they will probably continue to be evicted.
+		 * In such situation, read-ahead is only a waste of IO.
+		 * Don't decrease mmap_miss in this scenario to make sure
+		 * we can stop read-ahead.
+		 */
+		if (!folio_test_workingset(folio))
+			(*mmap_miss)++;
 
 		/*
 		 * NOTE: If there're PTE markers, we'll leave them to be
@@ -3488,7 +3496,9 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
 	if (PageHWPoison(page))
 		return ret;
 
-	(*mmap_miss)++;
+	/* See comment of filemap_map_folio_range() */
+	if (!folio_test_workingset(folio))
+		(*mmap_miss)++;
 
 	/*
 	 * NOTE: If there're PTE markers, we'll leave them to be
-- 
2.25.1
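
The accounting described in the changelog can be illustrated with a small
user-space model. This is a simplified sketch under stated assumptions,
not the kernel implementation: struct toy_ra and the toy_* functions are
hypothetical, MMAP_LOTSAMISS is assumed to be 100 as in mm/filemap.c, each
simulated fault is a sync miss followed by fault-around over 16 folios,
and every folio is assumed to carry the workingset flag (that is, it was
recently evicted and refaulted).

```c
#include <assert.h>
#include <stdbool.h>

#define MMAP_LOTSAMISS 100	/* assumed to match mm/filemap.c */

struct toy_ra {
	unsigned int mmap_miss;
};

/* Models the sync fault path: count a miss, report whether read-ahead runs. */
static bool toy_sync_fault(struct toy_ra *ra)
{
	if (ra->mmap_miss < MMAP_LOTSAMISS * 10)
		ra->mmap_miss++;
	return ra->mmap_miss <= MMAP_LOTSAMISS;
}

/* Models fault-around over nr folios that all share one workingset state. */
static void toy_map_pages(struct toy_ra *ra, unsigned int nr,
			  bool workingset, bool patched)
{
	unsigned int hits = 0, i;

	for (i = 0; i < nr; i++)
		if (!(patched && workingset))
			hits++;		/* counted as a hit, lowers mmap_miss */
	ra->mmap_miss = hits >= ra->mmap_miss ? 0 : ra->mmap_miss - hits;
}

/* Returns how many of the simulated faults still trigger read-ahead. */
unsigned int toy_count_readaheads(unsigned int faults, bool patched)
{
	struct toy_ra ra = { 0 };
	unsigned int issued = 0, f;

	assert(faults > 0);
	for (f = 0; f < faults; f++) {
		if (toy_sync_fault(&ra))
			issued++;
		toy_map_pages(&ra, 16, true, patched);
	}
	return issued;
}
```

In this model, toy_count_readaheads(1000, false) returns 1000 because the
fault-around hits reset mmap_miss after every miss, so read-ahead never
stops. toy_count_readaheads(1000, true) returns 100: once workingset
folios no longer count as hits, mmap_miss climbs past MMAP_LOTSAMISS and
read-ahead is skipped from then on.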


