* [PATCH] mm: consider all swapped back pages in used-once logic
@ 2012-05-17  9:13 ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-05-17  9:13 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Johannes Weiner, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

Commit 64574746 ("vmscan: detect mapped file pages used only once") gave
mapped file pages another round on the inactive list because they might
be short lived, so we can consider them again next time. This heuristic
helps to reduce pressure on the active list with streaming IO workloads.

This patch fixes a regression introduced by that commit for heavy
shmem-based workloads: unlike anon pages, which are excluded from the
heuristic because they are usually long lived, shmem pages are handled
as regular page cache.

Unfortunately, this doesn't work well if the workload is mostly backed
by shmem (an in-memory database occupying 80% of memory) with streaming
IO in the background (a backup taking up to 20% of memory). The anon
inactive list is full of (dirty) shmem pages when watermarks are hit.
Shmem pages are kept on the inactive list (they are referenced) in the
first round, and it is hard to reclaim anything else, so we reach lower
scanning priorities very quickly, which leads to excessive swap out.

Let's fix this by excluding all swap backed pages (they tend to be long
lived compared with the regular page cache anyway) from the used-once
heuristic, and instead activate them if they are referenced.
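
The decision this patch changes can be sketched as a small userspace
model (a hypothetical simplification, not the actual kernel code: the
real page_check_references() in mm/vmscan.c also handles VM_EXEC
mappings, the reclaim-if-dirty cases and the surrounding locking):

```c
#include <assert.h>
#include <stdbool.h>

enum page_references { PAGEREF_RECLAIM, PAGEREF_KEEP, PAGEREF_ACTIVATE };

/* Model of the page state consulted by the used-once heuristic. */
struct page_model {
	bool swap_backed;     /* PageSwapBacked(): anon or shmem/tmpfs */
	bool referenced_pte;  /* young bit seen in a mapping pte */
	bool referenced_page; /* PG_referenced already set (second use) */
};

static enum page_references check_references(const struct page_model *p)
{
	if (p->referenced_pte) {
		/* After the patch: all swap backed pages (anon and now
		 * shmem too) are activated on the first reference. */
		if (p->swap_backed)
			return PAGEREF_ACTIVATE;
		/* Regular file cache keeps the used-once treatment: a
		 * single reference only holds it on the inactive list. */
		if (p->referenced_page)
			return PAGEREF_ACTIVATE;
		return PAGEREF_KEEP;
	}
	return PAGEREF_RECLAIM;
}
```

With the pre-patch PageAnon() test, a referenced shmem page would have
taken the file-cache path and been kept on the inactive list instead of
being activated, which is the behaviour described above.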

CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Minchan Kim <minchan@kernel.org>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Rik van Riel <riel@redhat.com>
CC: stable [2.6.34+]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33dc256..0932dc2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -722,7 +722,7 @@ static enum page_references page_check_references(struct page *page,
 		return PAGEREF_RECLAIM;
 
 	if (referenced_ptes) {
-		if (PageAnon(page))
+		if (PageSwapBacked(page))
 			return PAGEREF_ACTIVATE;
 		/*
 		 * All mapped pages start out with page table
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17  9:13 ` Michal Hocko
@ 2012-05-17  9:24   ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2012-05-17  9:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Johannes Weiner, Mel Gorman, Minchan Kim,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 17 May 2012 11:13:53 +0200 Michal Hocko <mhocko@suse.cz> wrote:

> [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> have another round in inactive list because they might be just short
> lived and so we could consider them again next time. This heuristic
> helps to reduce pressure on the active list with streaming IO
> workloads.
> This patch fixes a regression introduced by this commit for heavy shmem

A performance regression, specifically.

Are you able to quantify it?

> based workloads because unlike Anon pages, which are excluded from this
> heuristic because they are usually long lived, shmem pages are handled
> as a regular page cache.
> This doesn't work quite well, unfortunately, if the workload is mostly
> backed by shmem (in memory database sitting on 80% of memory) with a
> streaming IO in the background (backup - up to 20% of memory). Anon
> inactive list is full of (dirty) shmem pages when watermarks are
> hit. Shmem pages are kept in the inactive list (they are referenced)
> in the first round and it is hard to reclaim anything else so we reach
> lower scanning priorities very quickly which leads to an excessive swap
> out.
> 
> Let's fix this by excluding all swap backed pages (they tend to be long
> lived wrt. the regular page cache anyway) from used-once heuristic and
> rather activate them if they are referenced.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17  9:24   ` Andrew Morton
@ 2012-05-17 12:10     ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-05-17 12:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Johannes Weiner, Mel Gorman, Minchan Kim,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu 17-05-12 02:24:12, Andrew Morton wrote:
> On Thu, 17 May 2012 11:13:53 +0200 Michal Hocko <mhocko@suse.cz> wrote:
> 
> > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > have another round in inactive list because they might be just short
> > lived and so we could consider them again next time. This heuristic
> > helps to reduce pressure on the active list with streaming IO
> > workloads.
> > This patch fixes a regression introduced by this commit for heavy shmem
> 
> A performance regression, specifically.
> 
> Are you able to quantify it?

The customer's workload is a shmem-backed database (80% of RAM), and
they are measuring transactions/s with IO in the background (20%).
Transactions touch more or less random rows in the table.
The rate drops drastically when we start swapping out memory.

The numbers are more descriptive (the rate without the patch is taken
as 100%; measured over 5 representative runs with the patch):
Average rate	315.83%
Best rate	131.76%
Worst rate	641.25%

The standard deviation (calibrated to the average) is ~4% with the
patch, while without it we are at 62.82%.
The big variance without the patch is caused by the excessive swapping,
which doesn't occur with the patch applied.

* Worst run (100%) compared to a random run with the patch
pgpgin	pswpin	pswpout	pgmajfault
1.58%	0.00%	0.01%	0.22%

Average size of the LRU lists:
nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
52.91%           7234.72%       249.39%          126.64%

* Best run
pgpgin	pswpin	pswpout	pgmajfault
3.37%	0.00%	0.11%	0.39%

nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
49.85%           3868.74%       175.03%          121.27%

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17  9:13 ` Michal Hocko
@ 2012-05-17 13:14   ` Rik van Riel
  -1 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2012-05-17 13:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Johannes Weiner, Andrew Morton,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki

On 05/17/2012 05:13 AM, Michal Hocko wrote:
> [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> have another round in inactive list because they might be just short
> lived and so we could consider them again next time. This heuristic
> helps to reduce pressure on the active list with streaming IO
> workloads.
> This patch fixes a regression introduced by this commit for heavy shmem
> based workloads because unlike Anon pages, which are excluded from this
> heuristic because they are usually long lived, shmem pages are handled
> as a regular page cache.
> This doesn't work quite well, unfortunately, if the workload is mostly
> backed by shmem (in memory database sitting on 80% of memory) with a
> streaming IO in the background (backup - up to 20% of memory). Anon
> inactive list is full of (dirty) shmem pages when watermarks are
> hit. Shmem pages are kept in the inactive list (they are referenced)
> in the first round and it is hard to reclaim anything else so we reach
> lower scanning priorities very quickly which leads to an excessive swap
> out.
>
> Let's fix this by excluding all swap backed pages (they tend to be long
> lived wrt. the regular page cache anyway) from used-once heuristic and
> rather activate them if they are referenced.
>
> CC: Johannes Weiner<hannes@cmpxchg.org>
> CC: Andrew Morton<akpm@linux-foundation.org>
> CC: Mel Gorman<mel@csn.ul.ie>
> CC: Minchan Kim<minchan@kernel.org>
> CC: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> CC: Rik van Riel<riel@redhat.com>
> CC: stable [2.6.34+]
> Signed-off-by: Michal Hocko<mhocko@suse.cz>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17  9:13 ` Michal Hocko
@ 2012-05-17 19:54   ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-05-17 19:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman, Minchan Kim,
	KAMEZAWA Hiroyuki, Rik van Riel

Hi Michal,

On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> have another round in inactive list because they might be just short
> lived and so we could consider them again next time. This heuristic
> helps to reduce pressure on the active list with streaming IO
> workloads.
> This patch fixes a regression introduced by this commit for heavy shmem
> based workloads because unlike Anon pages, which are excluded from this
> heuristic because they are usually long lived, shmem pages are handled
> as a regular page cache.
> This doesn't work quite well, unfortunately, if the workload is mostly
> backed by shmem (in memory database sitting on 80% of memory) with a
> streaming IO in the background (backup - up to 20% of memory). Anon
> inactive list is full of (dirty) shmem pages when watermarks are
> hit. Shmem pages are kept in the inactive list (they are referenced)
> in the first round and it is hard to reclaim anything else so we reach
> lower scanning priorities very quickly which leads to an excessive swap
> out.
> 
> Let's fix this by excluding all swap backed pages (they tend to be long
> lived wrt. the regular page cache anyway) from used-once heuristic and
> rather activate them if they are referenced.

Yes, the algorithm only makes sense for file cache, which is easy to
reclaim.  Thanks for the fix!

> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Minchan Kim <minchan@kernel.org>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: stable [2.6.34+]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17 12:10     ` Michal Hocko
@ 2012-05-17 20:23       ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2012-05-17 20:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Johannes Weiner, Mel Gorman, Minchan Kim,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu, 17 May 2012 14:10:49 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> > > This patch fixes a regression introduced by this commit for heavy shmem
> > 
> > A performance regression, specifically.
> > 
> > Are you able to quantify it?
> 
> The customer's workload is shmem backed database (80% of RAM) and
> they are measuring transactions/s with an IO in the background (20%).
> Transactions touch more or less random rows in the table.
> The rate goes down drastically when we start swapping out memory.
> 
> Numbers are more descriptive (without the patch is 100%, with 5
> representative runs)
> Average rate	315.83%
> Best rate	131.76%
> Worst rate	641.25%
> 
> Standard deviation (calibrated to average) is ~4% while without the
> patch we are at 62.82%. 
> The big variance without the patch is caused by the excessive swapping
> which doesn't occur with the patch applied.
> 
> * Worst run (100%) compared to a random run with the patch
> pgpgin	pswpin	pswpout	pgmajfault
> 1.58%	0.00%	0.01%	0.22%
> 
> Average size of the LRU lists:
> nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
> 52.91%           7234.72%       249.39%          126.64%
> 
> * Best run
> pgpgin	pswpin	pswpout	pgmajfault
> 3.37%	0.00%	0.11%	0.39%
> 
> nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
> 49.85%           3868.74%       175.03%          121.27%

I turned the above into this soundbite:

: The customer's workload is shmem backed database (80% of RAM) and they are
: measuring transactions/s with an IO in the background (20%).  Transactions
: touch more or less random rows in the table.  Total runtime was
: approximately tripled by commit 64574746 and this patch restores the
: previous throughput levels.

Was that truthful?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17  9:13 ` Michal Hocko
@ 2012-05-18  0:40   ` Minchan Kim
  -1 siblings, 0 replies; 40+ messages in thread
From: Minchan Kim @ 2012-05-18  0:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Johannes Weiner, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On 05/17/2012 06:13 PM, Michal Hocko wrote:

> [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> have another round in inactive list because they might be just short
> lived and so we could consider them again next time. This heuristic
> helps to reduce pressure on the active list with a streaming IO
> worklods.
> This patch fixes a regression introduced by this commit for heavy shmem
> based workloads because unlike Anon pages, which are excluded from this
> heuristic because they are usually long lived, shmem pages are handled
> as a regular page cache.
> This doesn't work quite well, unfortunately, if the workload is mostly
> backed by shmem (in memory database sitting on 80% of memory) with a
> streaming IO in the background (backup - up to 20% of memory). Anon
> inactive list is full of (dirty) shmem pages when watermarks are
> hit. Shmem pages are kept in the inactive list (they are referenced)
> in the first round and it is hard to reclaim anything else so we reach
> lower scanning priorities very quickly which leads to an excessive swap
> out.
> 
> Let's fix this by excluding all swap backed pages (they tend to be long
> lived wrt. the regular page cache anyway) from used-once heuristic and
> rather activate them if they are referenced.
> 
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Minchan Kim <minchan@kernel.org>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: stable [2.6.34+]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>


Reviewed-by: Minchan Kim <minchan@kernel.org>

Good spot!
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
@ 2012-05-18  0:40   ` Minchan Kim
  0 siblings, 0 replies; 40+ messages in thread
From: Minchan Kim @ 2012-05-18  0:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Johannes Weiner, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On 05/17/2012 06:13 PM, Michal Hocko wrote:

> [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> have another round in inactive list because they might be just short
> lived and so we could consider them again next time. This heuristic
> helps to reduce pressure on the active list with a streaming IO
> worklods.
> This patch fixes a regression introduced by this commit for heavy shmem
> based workloads because unlike Anon pages, which are excluded from this
> heuristic because they are usually long lived, shmem pages are handled
> as a regular page cache.
> This doesn't work quite well, unfortunately, if the workload is mostly
> backed by shmem (in memory database sitting on 80% of memory) with a
> streaming IO in the background (backup - up to 20% of memory). Anon
> inactive list is full of (dirty) shmem pages when watermarks are
> hit. Shmem pages are kept in the inactive list (they are referenced)
> in the first round and it is hard to reclaim anything else so we reach
> lower scanning priorities very quickly which leads to an excessive swap
> out.
> 
> Let's fix this by excluding all swap backed pages (they tend to be long
> lived wrt. the regular page cache anyway) from used-once heuristic and
> rather activate them if they are referenced.
> 
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Mel Gorman <mel@csn.ul.ie>
> CC: Minchan Kim <minchan@kernel.org>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Rik van Riel <riel@redhat.com>
> CC: stable [2.6.34+]
> Signed-off-by: Michal Hocko <mhocko@suse.cz>


Reviewed-by: Minchan Kim <minchan@kernel.org>

Good spot!
-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17 20:23       ` Andrew Morton
@ 2012-05-18  6:50         ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-05-18  6:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Johannes Weiner, Mel Gorman, Minchan Kim,
	KAMEZAWA Hiroyuki, Rik van Riel

On Thu 17-05-12 13:23:24, Andrew Morton wrote:
> On Thu, 17 May 2012 14:10:49 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > > > This patch fixes a regression introduced by this commit for heavy shmem
> > > 
> > > A performance regression, specifically.
> > > 
> > > Are you able to quantify it?
> > 
> > The customer's workload is shmem backed database (80% of RAM) and
> > they are measuring transactions/s with an IO in the background (20%).
> > Transactions touch more or less random rows in the table.
> > The rate goes down drastically when we start swapping out memory.
> > 
> > Numbers are more descriptive (without the patch is 100%, with 5
> > representative runs)
> > Average rate	315.83%
> > Best rate	131.76%
> > Worst rate	641.25%
> > 
> > Standard deviation (calibrated to average) is ~4% while without the
> > patch we are at 62.82%. 
> > The big variance without the patch is caused by the excessive swapping
> > which doesn't occur with the patch applied.
> > 
> > * Worst run (100%) compared to a random run with the patch
> > pgpgin	pswpin	pswpout	pgmajfault
> > 1.58%	0.00%	0.01%	0.22%
> > 
> > Average size of the LRU lists:
> > nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
> > 52.91%           7234.72%       249.39%          126.64%
> > 
> > * Best run
> > pgpgin	pswpin	pswpout	pgmajfault
> > 3.37%	0.00%	0.11%	0.39%
> > 
> > nr_inactive_anon nr_active_anon nr_inactive_file nr_active_file
> > 49.85%           3868.74%       175.03%          121.27%
> 
> I turned the above into this soundbite:
> 
> : The customer's workload is shmem backed database (80% of RAM) and they are
> : measuring transactions/s with an IO in the background (20%).  Transactions
> : touch more or less random rows in the table.  Total runtime was
> : approximately tripled by commit 64574746 and this patch restores the
> : previous throughput levels.
> 
> Was that truthful?

Total runtime was the same for all the runs; it is the number of executed
transactions that was measured. I guess what you wrote should be more or
less equivalent, but it is not what I have numbers for.
How about:
"
The total number of transactions went down threefold (in the worst case)
because of commit 64574746. This patch restores the previous numbers.
"

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-17 19:54   ` Johannes Weiner
@ 2012-05-21  2:51     ` Zheng Liu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zheng Liu @ 2012-05-21  2:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Thu, May 17, 2012 at 09:54:25PM +0200, Johannes Weiner wrote:
> On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > have another round in inactive list because they might be just short
> > lived and so we could consider them again next time. This heuristic
> > helps to reduce pressure on the active list with a streaming IO
> > worklods.
> > This patch fixes a regression introduced by this commit for heavy shmem
> > based workloads because unlike Anon pages, which are excluded from this
> > heuristic because they are usually long lived, shmem pages are handled
> > as a regular page cache.
> > This doesn't work quite well, unfortunately, if the workload is mostly
> > backed by shmem (in memory database sitting on 80% of memory) with a
> > streaming IO in the background (backup - up to 20% of memory). Anon
> > inactive list is full of (dirty) shmem pages when watermarks are
> > hit. Shmem pages are kept in the inactive list (they are referenced)
> > in the first round and it is hard to reclaim anything else so we reach
> > lower scanning priorities very quickly which leads to an excessive swap
> > out.
> > 
> > Let's fix this by excluding all swap backed pages (they tend to be long
> > lived wrt. the regular page cache anyway) from used-once heuristic and
> > rather activate them if they are referenced.
> 
> Yes, the algorithm only makes sense for file cache, which is easy to
> reclaim.  Thanks for the fix!

Hi Johannes,

Out of curiosity, I notice that, in this patch (64574746), the commit log
said that this patch aims to reduce the impact of pages used only once.
Could you please tell why you think these pages will flood the active
list?  How do you find this problem?

Actually, we met a huge regression in our product system.  This
application uses mmap/munmap and read/write simultaneously.  Meanwhile
it wants to keep mapped file pages in memory as much as possible.  But
this patch causes that mapped file pages are reclaimed frequently.  So I
want to know whether or not this patch consider this situation.  Thank
you.

Regards,
Zheng

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-21  2:51     ` Zheng Liu
@ 2012-05-21  7:36       ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-05-21  7:36 UTC (permalink / raw)
  To: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, May 21, 2012 at 10:51:49AM +0800, Zheng Liu wrote:
> On Thu, May 17, 2012 at 09:54:25PM +0200, Johannes Weiner wrote:
> > On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> > > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > > have another round in inactive list because they might be just short
> > > lived and so we could consider them again next time. This heuristic
> > > helps to reduce pressure on the active list with a streaming IO
> > > worklods.
> > > This patch fixes a regression introduced by this commit for heavy shmem
> > > based workloads because unlike Anon pages, which are excluded from this
> > > heuristic because they are usually long lived, shmem pages are handled
> > > as a regular page cache.
> > > This doesn't work quite well, unfortunately, if the workload is mostly
> > > backed by shmem (in memory database sitting on 80% of memory) with a
> > > streaming IO in the background (backup - up to 20% of memory). Anon
> > > inactive list is full of (dirty) shmem pages when watermarks are
> > > hit. Shmem pages are kept in the inactive list (they are referenced)
> > > in the first round and it is hard to reclaim anything else so we reach
> > > lower scanning priorities very quickly which leads to an excessive swap
> > > out.
> > > 
> > > Let's fix this by excluding all swap backed pages (they tend to be long
> > > lived wrt. the regular page cache anyway) from used-once heuristic and
> > > rather activate them if they are referenced.
> > 
> > Yes, the algorithm only makes sense for file cache, which is easy to
> > reclaim.  Thanks for the fix!
> 
> Hi Johannes,
> 
> Out of curiosity, I notice that, in this patch (64574746), the commit log
> said that this patch aims to reduce the impact of pages used only once.
> Could you please tell why you think these pages will flood the active
> list?  How do you find this problem?

Applications that use mmap for large, linear, used-once IO.  Reclaim
used to just activate every mapped file page it encountered for the
first time (it only activates referenced ones, but they all start out
referenced).  This resulted in horrible reclaim latency, as most pages
in memory were active.

> Actually, we met a huge regression in our product system.  This
> application uses mmap/munmap and read/write simultaneously.  Meanwhile
> it wants to keep mapped file pages in memory as much as possible.  But
> this patch causes that mapped file pages are reclaimed frequently.  So I
> want to know whether or not this patch consider this situation.  Thank
> you.

Is it because the read()/write() IO is high throughput and pushes
pages through the LRU lists faster than the mmap pages are referenced?

Are the mmap pages executable or shared between tasks?  If so, does
the kernel you are using include '34dbc67 vmscan: promote shared file
mapped pages' and 'c909e99 vmscan: activate executable pages after
first usage'?

All of this is very lame.  I see no way to automatically detect when
you really want to keep mapped pages over unmapped ones.  And making
this assumption hurt some loads, while not making it now hurts others.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-21  7:36       ` Johannes Weiner
@ 2012-05-21  8:59         ` Zheng Liu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zheng Liu @ 2012-05-21  8:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, May 21, 2012 at 09:36:32AM +0200, Johannes Weiner wrote:
> On Mon, May 21, 2012 at 10:51:49AM +0800, Zheng Liu wrote:
> > On Thu, May 17, 2012 at 09:54:25PM +0200, Johannes Weiner wrote:
> > > On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> > > > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > > > have another round in inactive list because they might be just short
> > > > lived and so we could consider them again next time. This heuristic
> > > > helps to reduce pressure on the active list with a streaming IO
> > > > worklods.
> > > > This patch fixes a regression introduced by this commit for heavy shmem
> > > > based workloads because unlike Anon pages, which are excluded from this
> > > > heuristic because they are usually long lived, shmem pages are handled
> > > > as a regular page cache.
> > > > This doesn't work quite well, unfortunately, if the workload is mostly
> > > > backed by shmem (in memory database sitting on 80% of memory) with a
> > > > streaming IO in the background (backup - up to 20% of memory). Anon
> > > > inactive list is full of (dirty) shmem pages when watermarks are
> > > > hit. Shmem pages are kept in the inactive list (they are referenced)
> > > > in the first round and it is hard to reclaim anything else so we reach
> > > > lower scanning priorities very quickly which leads to an excessive swap
> > > > out.
> > > > 
> > > > Let's fix this by excluding all swap backed pages (they tend to be long
> > > > lived wrt. the regular page cache anyway) from used-once heuristic and
> > > > rather activate them if they are referenced.
> > > 
> > > Yes, the algorithm only makes sense for file cache, which is easy to
> > > reclaim.  Thanks for the fix!
> > 
> > Hi Johannes,
> > 
> > Out of curiosity, I notice that, in this patch (64574746), the commit log
> > said that this patch aims to reduce the impact of pages used only once.
> > Could you please tell why you think these pages will flood the active
> > list?  How do you find this problem?
> 
> Applications that use mmap for large, linear used-once IO.  Reclaim
> used to just activate every mapped file page it encountered for the
> first time (activate referenced ones, but they all start referenced) .
> This resulted in horrible reclaim latency as most pages in memory
> where active.

Thanks for your explanation. :-)

> 
> > Actually, we met a huge regression in our product system.  This
> > application uses mmap/munmap and read/write simultaneously.  Meanwhile
> > it wants to keep mapped file pages in memory as much as possible.  But
> > this patch causes that mapped file pages are reclaimed frequently.  So I
> > want to know whether or not this patch consider this situation.  Thank
> > you.
> 
> Is it because the read()/write() IO is high throughput and pushes
> pages through the LRU lists faster than the mmap pages are referenced?

Yes, in this application one query needs to access a mapped file page
twice and the file page cache twice; that is, one query needs to do 4
disk I/Os.  We have used fadvise(2) to cut the file page cache accesses
down to one.  The mapped file pages in fact hit disk only once, because
within one query the same data is accessed twice.  Thus one query now
causes 2 disk I/Os.  The volume of read/write IO is much larger than
that of mmap/munmap.  So, as you see, if we can keep the mmap'd file in
memory as much as possible, we will get better performance.

> 
> Are the mmap pages executable or shared between tasks?  If so, does
> the kernel you are using include '34dbc67 vmscan: promote shared file
> mapped pages' and 'c909e99 vmscan: activate executable pages after
> first usage'?

Thanks for your advice.  Our application has only one process, so I
think 34dbc67 is not useful for it.  We have tried mmapping the file
with the PROT_EXEC flag to take advantage of c909e99, but the result is
not as good as we expected.

In addition, another factor also has some impact on this application:
inactive_file_is_low_global() behaves differently between 2.6.18 and the
upstream kernel.  IMHO, this causes mapped file pages on the active list
to be moved to the inactive list frequently.

Currently we add a parameter in inactive_file_is_low_global() to adjust
this ratio, and we also activate every mapped file page on first use.
The performance then gets better, but it still doesn't reach that of
2.6.18.

> 
> All of this is very lame.  I see no way to automatically detect when
> you really want to keep mapped pages over unmapped ones.  And making
> this assumption hurt some loads, while not making it now hurts others.

Yeah, as you said, this kind of change always hurts some loads while
leaving others unaffected. ;-)

Regards,
Zheng

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-21  8:59         ` Zheng Liu
@ 2012-05-21  9:37           ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-05-21  9:37 UTC (permalink / raw)
  To: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, May 21, 2012 at 04:59:52PM +0800, Zheng Liu wrote:
> On Mon, May 21, 2012 at 09:36:32AM +0200, Johannes Weiner wrote:
> > On Mon, May 21, 2012 at 10:51:49AM +0800, Zheng Liu wrote:
> > > On Thu, May 17, 2012 at 09:54:25PM +0200, Johannes Weiner wrote:
> > > > On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> > > > > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > > > > have another round in inactive list because they might be just short
> > > > > lived and so we could consider them again next time. This heuristic
> > > > > helps to reduce pressure on the active list with a streaming IO
> > > > > worklods.
> > > > > This patch fixes a regression introduced by this commit for heavy shmem
> > > > > based workloads because unlike Anon pages, which are excluded from this
> > > > > heuristic because they are usually long lived, shmem pages are handled
> > > > > as a regular page cache.
> > > > > This doesn't work quite well, unfortunately, if the workload is mostly
> > > > > backed by shmem (in memory database sitting on 80% of memory) with a
> > > > > streaming IO in the background (backup - up to 20% of memory). Anon
> > > > > inactive list is full of (dirty) shmem pages when watermarks are
> > > > > hit. Shmem pages are kept in the inactive list (they are referenced)
> > > > > in the first round and it is hard to reclaim anything else so we reach
> > > > > lower scanning priorities very quickly which leads to an excessive swap
> > > > > out.
> > > > > 
> > > > > Let's fix this by excluding all swap backed pages (they tend to be long
> > > > > lived wrt. the regular page cache anyway) from used-once heuristic and
> > > > > rather activate them if they are referenced.
> > > > 
> > > > Yes, the algorithm only makes sense for file cache, which is easy to
> > > > reclaim.  Thanks for the fix!
> > > 
> > > Hi Johannes,
> > > 
> > > Out of curiosity, I notice that, in this patch (64574746), the commit log
> > > said that this patch aims to reduce the impact of pages used only once.
> > > Could you please explain why you think these pages will flood the active
> > > list?  How did you find this problem?
> > 
> > Applications that use mmap for large, linear used-once IO.  Reclaim
> > used to just activate every mapped file page it encountered for the
> > first time (activate referenced ones, but they all start referenced).
> > This resulted in horrible reclaim latency as most pages in memory
> > were active.
> 
> Thanks for your explanation. :-)
> 
> > 
> > > Actually, we met a huge regression in our product system.  This
> > > application uses mmap/munmap and read/write simultaneously.  Meanwhile
> > > it wants to keep mapped file pages in memory as much as possible.  But
> > > this patch causes mapped file pages to be reclaimed frequently.  So I
> > > want to know whether or not this patch considers this situation.  Thank
> > > you.
> > 
> > Is it because the read()/write() IO is high throughput and pushes
> > pages through the LRU lists faster than the mmap pages are referenced?
> 
> Yes, in this application, one query needs to access a mapped file page
> twice and the file page cache twice.  Namely, one query needs to do 4 disk
> I/Os.  We have used fadvise(2) to reduce file page cache accesses to
> only once.  For mapped file pages, in fact the disk is accessed only once
> because in one query the same data is accessed twice.  Thus, one query
> causes 2 disk I/Os now.  The size of read/write is much larger than
> mmap/munmap.  So, as you see, if we can keep the mmap/munmap file in memory
> as much as possible, we will gain better performance.

You access the same unmapped cache twice, i.e. repeated reads or
writes against the same file offset?

How do you use fadvise?

> > Are the mmap pages executable or shared between tasks?  If so, does
> > the kernel you are using include '34dbc67 vmscan: promote shared file
> > mapped pages' and 'c909e99 vmscan: activate executable pages after
> > first usage'?
> 
> Thanks for your advice.  Our application has only one process.  So I
> think that 34dbc67 is not useful for this application.  We have tried to
> mmap file with PROT_EXEC flag to use this patch (c909e99).  But it seems
> that the result is not as good as we expected.

Used-once detection should not apply to executably mapped pages at all;
they should just be activated as before.  So I think there must be more
going on.

> In addition, another factor also has some impact on this application:
> inactive_file_is_low_global() differs between 2.6.18 and the
> upstream kernel.  IMHO, it causes mapped file pages in the active list
> to be moved into the inactive list frequently.
>
> Currently, we add a parameter in inactive_file_is_low_global() to adjust
> this ratio.  Meanwhile we activate every mapped file page for the first
> time.  Then the performance gets better, but it still doesn't reach the
> performance of 2.6.18.

2.6.18 didn't have the active list protection at all and always
forcibly deactivated pages during reclaim.  Have you tried fully
reverting to this by making inactive_file_is_low_global() return true
unconditionally?

Could it be that by accessing your "used-once" unmapped cache twice in
short succession, you accidentally activate it all?

Thereby having ONLY mapped file pages on the inactive list, adding to
the pressure on them?

And, by having the wrong pages on the active list, actually benefit
from the active list not being protected from inactive list cycle
speed and instead pushed out quickly again?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-21  9:37           ` Johannes Weiner
@ 2012-05-21 11:07             ` Zheng Liu
  -1 siblings, 0 replies; 40+ messages in thread
From: Zheng Liu @ 2012-05-21 11:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
[snip]
> > > Is it because the read()/write() IO is high throughput and pushes
> > > pages through the LRU lists faster than the mmap pages are referenced?
> > 
> > Yes, in this application, one query needs to access a mapped file page
> > twice and the file page cache twice.  Namely, one query needs to do 4 disk
> > I/Os.  We have used fadvise(2) to reduce file page cache accesses to
> > only once.  For mapped file pages, in fact the disk is accessed only once
> > because in one query the same data is accessed twice.  Thus, one query
> > causes 2 disk I/Os now.  The size of read/write is much larger than
> > mmap/munmap.  So, as you see, if we can keep the mmap/munmap file in memory
> > as much as possible, we will gain better performance.
> 
> You access the same unmapped cache twice, i.e. repeated reads or
> writes against the same file offset?

No.  We access the same mapped file twice.

> 
> How do you use fadvise?

We access the header and content of the file respectively using read/write.
The header and content are sequential.  So we use fadvise(2) with the
FADV_WILLNEED flag to do a readahead.

> > In addition, another factor also has some impact on this application:
> > inactive_file_is_low_global() differs between 2.6.18 and the
> > upstream kernel.  IMHO, it causes mapped file pages in the active list
> > to be moved into the inactive list frequently.
> >
> > Currently, we add a parameter in inactive_file_is_low_global() to adjust
> > this ratio.  Meanwhile we activate every mapped file page for the first
> > time.  Then the performance gets better, but it still doesn't reach the
> > performance of 2.6.18.
> 
> 2.6.18 didn't have the active list protection at all and always
> forcibly deactivated pages during reclaim.  Have you tried fully
> reverting to this by making inactive_file_is_low_global() return true
> unconditionally?

No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
it didn't always forcibly deactivate pages either.  I remember that in
the 2.6.18 kernel we calculate 'mapped_ratio' in shrink_active_list(), and
then we get 'swap_tendency' according to 'mapped_ratio', 'distress', and
'sc->swappiness'.  If 'swap_tendency' is not greater than 100, it
doesn't reclaim mapped file pages.  By this equation, if the sum of the
anonymous pages and mapped file pages is not greater than 50% of
total pages, we don't deactivate these pages.  Am I missing something?

> Could it be that by accessing your "used-once" unmapped cache twice in
> short succession, you accidentally activate it all?

It could not happen.  Certainly it is possible to access a file twice at
the same offset in the production system.  That is the reason why we use
buffered IO rather than direct IO.  But in the testing system we do not
access the same file twice at the same offset.

> Thereby having ONLY mapped file pages on the inactive list, adding to
> the pressure on them?
> 
> And, by having the wrong pages on the active list, actually benefit
> from the active list not being protected from inactive list cycle
> speed and instead pushed out quickly again?

Sorry, you mean that the 2.6.18 kernel benefits from the wrong pages
being on the active list, don't you?

Regards,
Zheng

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-05-21 11:07             ` Zheng Liu
@ 2012-06-23 11:04               ` Johannes Weiner
  -1 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-06-23 11:04 UTC (permalink / raw)
  To: Zheng Liu
  Cc: Michal Hocko, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Minchan Kim, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
> [snip]
> > > > Is it because the read()/write() IO is high throughput and pushes
> > > > pages through the LRU lists faster than the mmap pages are referenced?
> > > 
> > > Yes, in this application, one query needs to access a mapped file page
> > > twice and the file page cache twice.  Namely, one query needs to do 4 disk
> > > I/Os.  We have used fadvise(2) to reduce file page cache accesses to
> > > only once.  For mapped file pages, in fact the disk is accessed only once
> > > because in one query the same data is accessed twice.  Thus, one query
> > > causes 2 disk I/Os now.  The size of read/write is much larger than
> > > mmap/munmap.  So, as you see, if we can keep the mmap/munmap file in memory
> > > as much as possible, we will gain better performance.
> > 
> > You access the same unmapped cache twice, i.e. repeated reads or
> > writes against the same file offset?
> 
> No.  We access the same mapped file twice.
> 
> > 
> > How do you use fadvise?
> 
> We access the header and content of the file respectively using read/write.
> The header and content are sequential.  So we use fadvise(2) with the
> FADV_WILLNEED flag to do a readahead.
> 
> > > In addition, another factor also has some impact on this application:
> > > inactive_file_is_low_global() differs between 2.6.18 and the
> > > upstream kernel.  IMHO, it causes mapped file pages in the active list
> > > to be moved into the inactive list frequently.
> > >
> > > Currently, we add a parameter in inactive_file_is_low_global() to adjust
> > > this ratio.  Meanwhile we activate every mapped file page for the first
> > > time.  Then the performance gets better, but it still doesn't reach the
> > > performance of 2.6.18.
> > 
> > 2.6.18 didn't have the active list protection at all and always
> > forcibly deactivated pages during reclaim.  Have you tried fully
> > reverting to this by making inactive_file_is_low_global() return true
> > unconditionally?
> 
> No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
> it didn't always forcibly deactivate pages either.  I remember that in
> the 2.6.18 kernel we calculate 'mapped_ratio' in shrink_active_list(), and
> then we get 'swap_tendency' according to 'mapped_ratio', 'distress', and
> 'sc->swappiness'.  If 'swap_tendency' is not greater than 100, it
> doesn't reclaim mapped file pages.  By this equation, if the sum of the
> anonymous pages and mapped file pages is not greater than 50% of
> total pages, we don't deactivate these pages.  Am I missing something?

I think we need to go back to protecting mapped pages based on how
much of reclaimable memory they make up, one way or another.

Minchan suggested recently to have a separate LRU list for easily
reclaimable pages.  If we balance the lists according to relative
size, we have pressure on mapped pages dictated by availability of
clean cache that is easier to reclaim.

Rik, Minchan, what do you think?

> > Could it be that by accessing your "used-once" unmapped cache twice in
> > short succession, you accidentally activate it all?
> 
> It could not happen.  Certainly it is possible to access a file twice at
> the same offset in the production system.  That is the reason why we use
> buffered IO rather than direct IO.  But in the testing system we do not
> access the same file twice at the same offset.
> 
> > Thereby having ONLY mapped file pages on the inactive list, adding to
> > the pressure on them?
> > 
> > And, by having the wrong pages on the active list, actually benefit
> > from the active list not being protected from inactive list cycle
> > speed and instead pushed out quickly again?
> 
> Sorry, you mean that the 2.6.18 kernel benefits from the wrong pages
> being on the active list, don't you?

I meant that at least 2.6.18 wouldn't be so eager to protect active
pages, which a workload with many "false" active pages would benefit
from.  But it's a moot point, as it's not what happens in your case.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-23 11:04               ` Johannes Weiner
@ 2012-06-23 15:22                 ` Rik van Riel
  -1 siblings, 0 replies; 40+ messages in thread
From: Rik van Riel @ 2012-06-23 15:22 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Zheng Liu, Michal Hocko, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, Minchan Kim, KAMEZAWA Hiroyuki

On 06/23/2012 07:04 AM, Johannes Weiner wrote:

> I think we need to go back to protecting mapped pages based on how
> much of reclaimable memory they make up, one way or another.
>
> Minchan suggested recently to have a separate LRU list for easily
> reclaimable pages.  If we balance the lists according to relative
> size, we have pressure on mapped pages dictated by availability of
> clean cache that is easier to reclaim.
>
> Rik, Minchan, what do you think?

I believe the main reclaim balancing decisions should probably be
"cgroup vs cgroup", "anon vs file" and "active vs inactive".

However, there is some room for sorting within the inactive
file list, to ensure we actually have a bunch of easily
reclaimable pages available at all times.

This would not work on the anon side, because there are
workloads that fill up 95% of memory with working set,
but on the file side it could actually solve some real
issues.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-23 11:04               ` Johannes Weiner
@ 2012-06-24 23:53                 ` Minchan Kim
  -1 siblings, 0 replies; 40+ messages in thread
From: Minchan Kim @ 2012-06-24 23:53 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Zheng Liu, Michal Hocko, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

Hi Hannes,

On 06/23/2012 08:04 PM, Johannes Weiner wrote:

> On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
>> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
>> [snip]
>>>>> Is it because the read()/write() IO is high throughput and pushes
>>>>> pages through the LRU lists faster than the mmap pages are referenced?
>>>>
>>>> Yes, in this application, one query needs to access a mapped file page
>>>> twice and the file page cache twice.  Namely, one query needs to do 4 disk
>>>> I/Os.  We have used fadvise(2) to reduce file page cache accesses to
>>>> only once.  For mapped file pages, in fact the disk is accessed only once
>>>> because in one query the same data is accessed twice.  Thus, one query
>>>> causes 2 disk I/Os now.  The size of read/write is much larger than
>>>> mmap/munmap.  So, as you see, if we can keep the mmap/munmap file in memory
>>>> as much as possible, we will gain better performance.
>>>
>>> You access the same unmapped cache twice, i.e. repeated reads or
>>> writes against the same file offset?
>>
>> No.  We access the same mapped file twice.
>>
>>>
>>> How do you use fadvise?
>>
>> We access the header and content of the file respectively using read/write.
>> The header and content are sequential.  So we use fadvise(2) with the
>> FADV_WILLNEED flag to do a readahead.
>>
>>>> In addition, another factor also has some impact on this application:
>>>> inactive_file_is_low_global() differs between 2.6.18 and the
>>>> upstream kernel.  IMHO, it causes mapped file pages in the active list
>>>> to be moved into the inactive list frequently.
>>>>
>>>> Currently, we add a parameter in inactive_file_is_low_global() to adjust
>>>> this ratio.  Meanwhile we activate every mapped file page for the first
>>>> time.  Then the performance gets better, but it still doesn't reach the
>>>> performance of 2.6.18.
>>>
>>> 2.6.18 didn't have the active list protection at all and always
>>> forcibly deactivated pages during reclaim.  Have you tried fully
>>> reverting to this by making inactive_file_is_low_global() return true
>>> unconditionally?
>>
>> No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
>> it didn't always forcibly deactivate the pages either.  I remember that in
>> the 2.6.18 kernel we calculated 'mapped_ratio' in shrink_active_list() and
>> then derived 'swap_tendency' from 'mapped_ratio', 'distress', and
>> 'sc->swappiness'.  If 'swap_tendency' was not greater than 100, it
>> didn't reclaim mapped file pages.  By this equation, if the sum of the
>> anonymous pages and mapped file pages was not greater than 50% of
>> total pages, we didn't deactivate these pages.  Am I missing something?
> 
> I think we need to go back to protecting mapped pages based on how
> much of reclaimable memory they make up, one way or another.


I partly agree with it, from the point of view of the regression.
But I would like to understand the rationale for why we should handle mmapped pages specially.
In the case of code pages (VM_EXEC), we already handle them specially, and
I understand why we did: at least, my opinion was that our LRU algorithm doesn't fully
consider _frequency_ while it handles _recency_ well, and I thought code pages would be
accessed with a high frequency compared to other pages.
But in the case of mapped data pages, why should we handle them specially?
I guess mapped data pages have a higher chance of being accessed than unmapped pages, because
an unmapped page doesn't have any owner (it is just a cache for reducing I/O) while a mapped
page has an owner above it.

Doesn't that make sense?

If we don't have any rationale, I would rather add a new explicit API (e.g. madvise(WORKING_SET))
than depend on the VM's internal implementation.

> 
> Minchan suggested recently to have a separate LRU list for easily
> reclaimable pages.  If we balance the lists according to relative
> size, we have pressure on mapped pages dictated by availability of
> clean cache that is easier to reclaim.

> 

> Rik, Minchan, what do you think?


Yes, with an easily-reclaimable LRU list, we could do it. :)

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-24 23:53                 ` Minchan Kim
@ 2012-06-25  8:08                   ` Johannes Weiner
  0 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-06-25  8:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Zheng Liu, Michal Hocko, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, Jun 25, 2012 at 08:53:11AM +0900, Minchan Kim wrote:
> Hi Hannes,
> 
> On 06/23/2012 08:04 PM, Johannes Weiner wrote:
> 
> > On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
> >> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
> >> [snip]
> >>>>> Is it because the read()/write() IO is high throughput and pushes
> >>>>> pages through the LRU lists faster than the mmap pages are referenced?
> >>>>
> >>>> Yes, in this application, one query needs to access a mapped file page
> >>>> twice and the file page cache twice; that is, one query needs to do 4 disk
> >>>> I/Os.  We have used fadvise(2) to reduce the file page cache accesses to
> >>>> only one.  For the mapped file pages, in fact only one disk I/O is incurred,
> >>>> because within one query the same data is accessed twice.  Thus, one query
> >>>> causes 2 disk I/Os now.  The read/write data is quite a bit larger than the
> >>>> mmap/munmap data.  So, as you see, if we can keep the mmapped file in memory
> >>>> as much as possible, we will gain better performance.
> >>>
> >>> You access the same unmapped cache twice, i.e. repeated reads or
> >>> writes against the same file offset?
> >>
> >> No.  We access the same mapped file twice.
> >>
> >>>
> >>> How do you use fadvise?
> >>
> >> We access the header and the content of the file separately using read/write.
> >> The header and content are laid out sequentially, so we use fadvise(2) with
> >> the FADV_WILLNEED flag to do a readahead.
> >>
> >>>> In addition, another factor also has some impacts for this application.
> >>>> In inactive_file_is_low_global(), it is different between 2.6.18 and
> >>>> upstream kernel.  IMHO, it causes that mapped file pages in active list
> >>>> are moved into inactive list frequently.
> >>>>
> >>>> Currently, we add a parameter in inactive_file_is_low_global() to adjust
> >>>> this ratio.  Meanwhile we activate every mapped file pages for the first
> >>>> time.  Then the performance gets better, but it still doesn't reach the
> >>>> performance of 2.6.18.
> >>>
> >>> 2.6.18 didn't have the active list protection at all and always
> >>> forcibly deactivated pages during reclaim.  Have you tried fully
> >>> reverting to this by making inactive_file_is_low_global() return true
> >>> unconditionally?
> >>
> >> No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
> >> it didn't always forcibly deactivate the pages either.  I remember that in
> >> the 2.6.18 kernel we calculated 'mapped_ratio' in shrink_active_list() and
> >> then derived 'swap_tendency' from 'mapped_ratio', 'distress', and
> >> 'sc->swappiness'.  If 'swap_tendency' was not greater than 100, it
> >> didn't reclaim mapped file pages.  By this equation, if the sum of the
> >> anonymous pages and mapped file pages was not greater than 50% of
> >> total pages, we didn't deactivate these pages.  Am I missing something?
> > 
> > I think we need to go back to protecting mapped pages based on how
> > much of reclaimable memory they make up, one way or another.
> 
> 
> I partly agree with it, from the point of view of the regression.
> But I would like to understand the rationale for why we should handle mmapped pages specially.
> In the case of code pages (VM_EXEC), we already handle them specially, and
> I understand why we did: at least, my opinion was that our LRU algorithm doesn't fully
> consider _frequency_ while it handles _recency_ well, and I thought code pages would be
> accessed with a high frequency compared to other pages.
> But in the case of mapped data pages, why should we handle them specially?
> I guess mapped data pages have a higher chance of being accessed than unmapped pages, because
> an unmapped page doesn't have any owner (it is just a cache for reducing I/O) while a mapped
> page has an owner above it.
> 
> Doesn't that make sense?

I agree that the reason behind protecting VM_EXEC pages was that our
frequency information for mapped pages is at LRU cycle granularity.

But I don't see why you think this problem wouldn't apply to all
mapped pages in general.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-25  8:08                   ` Johannes Weiner
@ 2012-06-25  8:25                     ` Minchan Kim
  0 siblings, 0 replies; 40+ messages in thread
From: Minchan Kim @ 2012-06-25  8:25 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Zheng Liu, Michal Hocko, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On 06/25/2012 05:08 PM, Johannes Weiner wrote:

> On Mon, Jun 25, 2012 at 08:53:11AM +0900, Minchan Kim wrote:
>> Hi Hannes,
>>
>> On 06/23/2012 08:04 PM, Johannes Weiner wrote:
>>
>>> On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
>>>> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
>>>> [snip]
>>>>>>> Is it because the read()/write() IO is high throughput and pushes
>>>>>>> pages through the LRU lists faster than the mmap pages are referenced?
>>>>>>
>>>>>> Yes, in this application, one query needs to access a mapped file page
>>>>>> twice and the file page cache twice; that is, one query needs to do 4 disk
>>>>>> I/Os.  We have used fadvise(2) to reduce the file page cache accesses to
>>>>>> only one.  For the mapped file pages, in fact only one disk I/O is incurred,
>>>>>> because within one query the same data is accessed twice.  Thus, one query
>>>>>> causes 2 disk I/Os now.  The read/write data is quite a bit larger than the
>>>>>> mmap/munmap data.  So, as you see, if we can keep the mmapped file in memory
>>>>>> as much as possible, we will gain better performance.
>>>>>
>>>>> You access the same unmapped cache twice, i.e. repeated reads or
>>>>> writes against the same file offset?
>>>>
>>>> No.  We access the same mapped file twice.
>>>>
>>>>>
>>>>> How do you use fadvise?
>>>>
>>>> We access the header and the content of the file separately using read/write.
>>>> The header and content are laid out sequentially, so we use fadvise(2) with
>>>> the FADV_WILLNEED flag to do a readahead.
>>>>
>>>>>> In addition, another factor also has some impacts for this application.
>>>>>> In inactive_file_is_low_global(), it is different between 2.6.18 and
>>>>>> upstream kernel.  IMHO, it causes that mapped file pages in active list
>>>>>> are moved into inactive list frequently.
>>>>>>
>>>>>> Currently, we add a parameter in inactive_file_is_low_global() to adjust
>>>>>> this ratio.  Meanwhile we activate every mapped file pages for the first
>>>>>> time.  Then the performance gets better, but it still doesn't reach the
>>>>>> performance of 2.6.18.
>>>>>
>>>>> 2.6.18 didn't have the active list protection at all and always
>>>>> forcibly deactivated pages during reclaim.  Have you tried fully
>>>>> reverting to this by making inactive_file_is_low_global() return true
>>>>> unconditionally?
>>>>
>>>> No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
>>>> it didn't always forcibly deactivate the pages either.  I remember that in
>>>> the 2.6.18 kernel we calculated 'mapped_ratio' in shrink_active_list() and
>>>> then derived 'swap_tendency' from 'mapped_ratio', 'distress', and
>>>> 'sc->swappiness'.  If 'swap_tendency' was not greater than 100, it
>>>> didn't reclaim mapped file pages.  By this equation, if the sum of the
>>>> anonymous pages and mapped file pages was not greater than 50% of
>>>> total pages, we didn't deactivate these pages.  Am I missing something?
>>>
>>> I think we need to go back to protecting mapped pages based on how
>>> much of reclaimable memory they make up, one way or another.
>>
>>
>> I partly agree with it, from the point of view of the regression.
>> But I would like to understand the rationale for why we should handle mmapped pages specially.
>> In the case of code pages (VM_EXEC), we already handle them specially, and
>> I understand why we did: at least, my opinion was that our LRU algorithm doesn't fully
>> consider _frequency_ while it handles _recency_ well, and I thought code pages would be
>> accessed with a high frequency compared to other pages.
>> But in the case of mapped data pages, why should we handle them specially?
>> I guess mapped data pages have a higher chance of being accessed than unmapped pages, because
>> an unmapped page doesn't have any owner (it is just a cache for reducing I/O) while a mapped
>> page has an owner above it.
>>
>> Doesn't that make sense?
> 
> I agree that the reason behind protecting VM_EXEC pages was that our
> frequency information for mapped pages is at LRU cycle granularity.
> 
> But I don't see why you think this problem wouldn't apply to all
> mapped pages in general.


A code page is very likely to be shared by other processes, so I think it is more special
than a normal mmapped page, and I would like to give code pages a bigger bonus than normal
mmapped pages. So, if we can, I would like to end up with the following.

Reclaim preference:
unmapped page >> mapped page > VM_EXEC mapped page

But at least we can't solve Zheng's regression with the current VM_EXEC protection logic,
because it seems he already used the VM_EXEC trick :(
I hope an easily-reclaimable LRU list can solve it.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-25  8:25                     ` Minchan Kim
@ 2012-06-26 13:51                       ` Johannes Weiner
  0 siblings, 0 replies; 40+ messages in thread
From: Johannes Weiner @ 2012-06-26 13:51 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Zheng Liu, Michal Hocko, linux-mm, linux-kernel, Andrew Morton,
	Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel

On Mon, Jun 25, 2012 at 05:25:56PM +0900, Minchan Kim wrote:
> On 06/25/2012 05:08 PM, Johannes Weiner wrote:
> 
> > On Mon, Jun 25, 2012 at 08:53:11AM +0900, Minchan Kim wrote:
> >> Hi Hannes,
> >>
> >> On 06/23/2012 08:04 PM, Johannes Weiner wrote:
> >>
> >>> On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
> >>>> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
> >>>> [snip]
> >>>>>>> Is it because the read()/write() IO is high throughput and pushes
> >>>>>>> pages through the LRU lists faster than the mmap pages are referenced?
> >>>>>>
> >>>>>> Yes, in this application, one query needs to access a mapped file page
> >>>>>> twice and the file page cache twice; that is, one query needs to do 4 disk
> >>>>>> I/Os.  We have used fadvise(2) to reduce the file page cache accesses to
> >>>>>> only one.  For the mapped file pages, in fact only one disk I/O is incurred,
> >>>>>> because within one query the same data is accessed twice.  Thus, one query
> >>>>>> causes 2 disk I/Os now.  The read/write data is quite a bit larger than the
> >>>>>> mmap/munmap data.  So, as you see, if we can keep the mmapped file in memory
> >>>>>> as much as possible, we will gain better performance.
> >>>>>
> >>>>> You access the same unmapped cache twice, i.e. repeated reads or
> >>>>> writes against the same file offset?
> >>>>
> >>>> No.  We access the same mapped file twice.
> >>>>
> >>>>>
> >>>>> How do you use fadvise?
> >>>>
> >>>> We access the header and the content of the file separately using read/write.
> >>>> The header and content are laid out sequentially, so we use fadvise(2) with
> >>>> the FADV_WILLNEED flag to do a readahead.
> >>>>
> >>>>>> In addition, another factor also has some impacts for this application.
> >>>>>> In inactive_file_is_low_global(), it is different between 2.6.18 and
> >>>>>> upstream kernel.  IMHO, it causes that mapped file pages in active list
> >>>>>> are moved into inactive list frequently.
> >>>>>>
> >>>>>> Currently, we add a parameter in inactive_file_is_low_global() to adjust
> >>>>>> this ratio.  Meanwhile we activate every mapped file pages for the first
> >>>>>> time.  Then the performance gets better, but it still doesn't reach the
> >>>>>> performance of 2.6.18.
> >>>>>
> >>>>> 2.6.18 didn't have the active list protection at all and always
> >>>>> forcibly deactivated pages during reclaim.  Have you tried fully
> >>>>> reverting to this by making inactive_file_is_low_global() return true
> >>>>> unconditionally?
> >>>>
> >>>> No, I haven't tried it.  AFAIK, 2.6.18 didn't protect the active list, but
> >>>> it didn't always forcibly deactivate the pages either.  I remember that in
> >>>> the 2.6.18 kernel we calculated 'mapped_ratio' in shrink_active_list() and
> >>>> then derived 'swap_tendency' from 'mapped_ratio', 'distress', and
> >>>> 'sc->swappiness'.  If 'swap_tendency' was not greater than 100, it
> >>>> didn't reclaim mapped file pages.  By this equation, if the sum of the
> >>>> anonymous pages and mapped file pages was not greater than 50% of
> >>>> total pages, we didn't deactivate these pages.  Am I missing something?
> >>>
> >>> I think we need to go back to protecting mapped pages based on how
> >>> much of reclaimable memory they make up, one way or another.
> >>
> >>
> >> I partly agree with it, from the point of view of the regression.
> >> But I would like to understand the rationale for why we should handle mmapped pages specially.
> >> In the case of code pages (VM_EXEC), we already handle them specially, and
> >> I understand why we did: at least, my opinion was that our LRU algorithm doesn't fully
> >> consider _frequency_ while it handles _recency_ well, and I thought code pages would be
> >> accessed with a high frequency compared to other pages.
> >> But in the case of mapped data pages, why should we handle them specially?
> >> I guess mapped data pages have a higher chance of being accessed than unmapped pages, because
> >> an unmapped page doesn't have any owner (it is just a cache for reducing I/O) while a mapped
> >> page has an owner above it.
> >> 
> >> Doesn't that make sense?
> > 
> > I agree that the reason behind protecting VM_EXEC pages was that our
> > frequency information for mapped pages is at LRU cycle granularity.
> > 
> > But I don't see why you think this problem wouldn't apply to all
> > mapped pages in general.
> 
> 
> A code page is very likely to be shared by other processes, so I think it is more special
> than a normal mmapped page, and I would like to give code pages a bigger bonus than normal mmapped pages.

I think the problem really is that, in the presence of fast used-once
streams of unmapped cache, multiple read()s in short succession activate
a page, moving it indefinitely out of reach of any reclaim pressure, while
a mapped page only gets a second chance on the fast-moving inactive list.

This is only okay if there are a lot of mapped pages, which there were
in the loads improved by the used-once detection, but it creates
problems for loads with small amounts of mapped pages that are
accessed in bursts.

And this applies to all mapped file pages, not just VM_EXEC ones.

> So, if we can, I would like to end up with the following.
> 
> Reclaim preference:
> unmapped page >> mapped page > VM_EXEC mapped page

Not necessarily.  I would like to protect mapped pages if there are
very few of them, because we cannot tell how costly their reclaim
will end up being, while we DO know that reclaiming them won't free up
much memory.  Only as they make up an increasing share of memory does
this class of pages become a more attractive reclaim candidate, and
while we still can't tell for sure the cost of reclaiming them, the
cost of NOT reclaiming them (burned cpu time, allocation stalls)
increases.

So I think I disagree.  We should only compensate for the fact that we
have less usage information on mapped pages, not treat unmapped cache
like a third-class citizen by default.
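The idea that mapped pages become fair reclaim candidates only as their share of memory grows had a concrete form in the 2.6.18-era heuristic quoted earlier in this thread (swap_tendency from mapped_ratio, distress, and swappiness). A rough sketch of that shape, not actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Roughly the old 2.6.18 shrink_active_list() heuristic quoted in
 * this thread: mapped pages are only reclaimed once the pressure
 * term crosses 100, and the pressure grows with the share of memory
 * that is mapped.  Simplified for illustration.
 */
static bool reclaim_mapped(unsigned long mapped, unsigned long reclaimable,
			   int distress, int swappiness)
{
	/* percentage of reclaimable memory that is mapped */
	int mapped_ratio = (int)(mapped * 100 / reclaimable);
	/* swap_tendency = mapped_ratio/2 + distress + swappiness */
	int swap_tendency = mapped_ratio / 2 + distress + swappiness;

	return swap_tendency > 100;
}
```

With default swappiness 60 and no distress, a workload that is 10% mapped leaves mapped pages untouched, while one that is 90% mapped makes them reclaim candidates.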

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm: consider all swapped back pages in used-once logic
  2012-06-26 13:51                       ` Johannes Weiner
@ 2012-06-26 23:47                         ` Minchan Kim
  -1 siblings, 0 replies; 40+ messages in thread
From: Minchan Kim @ 2012-06-26 23:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

On 06/26/2012 10:51 PM, Johannes Weiner wrote:

> On Mon, Jun 25, 2012 at 05:25:56PM +0900, Minchan Kim wrote:
>> On 06/25/2012 05:08 PM, Johannes Weiner wrote:
>>
>>> On Mon, Jun 25, 2012 at 08:53:11AM +0900, Minchan Kim wrote:
>>>> Hi Hannes,
>>>>
>>>> On 06/23/2012 08:04 PM, Johannes Weiner wrote:
>>>>
>>>>> On Mon, May 21, 2012 at 07:07:00PM +0800, Zheng Liu wrote:
>>>>>> On Mon, May 21, 2012 at 11:37:05AM +0200, Johannes Weiner wrote:
>>>>>> [snip]
>>>>>>>>> Is it because the read()/write() IO is high throughput and pushes
>>>>>>>>> pages through the LRU lists faster than the mmap pages are referenced?
>>>>>>>>
>>>>>>>> Yes, in this application, one query needs to access mapped file page
>>>>>>>> twice and file page cache twice.  Namely, one query needs to do 4 disk
>>>>>>>> I/Os.  We have used fadvise(2) to reduce file page cache accessing to
>>>>>>>> only once.  For mapped file pages, in fact they are accessed only once
>>>>>>>> because in one query the same data is accessed twice.  Thus, one query
>>>>>>>> causes 2 disk I/Os now.  The size of read/write is quite larger than
>>>>>>>> mmap/munmap.  So, as you see, if we can keep mmap/munmap file in memory
>>>>>>>> as much as possible, we will gain the better performance.
>>>>>>>
>>>>>>> You access the same unmapped cache twice, i.e. repeated reads or
>>>>>>> writes against the same file offset?
>>>>>>
>>>>>> No.  We access the same mapped file twice.
>>>>>>
>>>>>>>
>>>>>>> How do you use fadvise?
>>>>>>
>>>>>> We access the header and content of the file respectively using read/write.
>>>>>> The header and content are sequential.  So we use fadvise(2) with the
>>>>>> FADV_WILLNEED flag to do a readahead.
>>>>>>
>>>>>>>> In addition, another factor also has some impacts for this application.
>>>>>>>> In inactive_file_is_low_global(), it is different between 2.6.18 and
>>>>>>>> upstream kernel.  IMHO, it causes that mapped file pages in active list
>>>>>>>> are moved into inactive list frequently.
>>>>>>>>
>>>>>>>> Currently, we add a parameter in inactive_file_is_low_global() to adjust
>>>>>>>> this ratio.  Meanwhile we activate every mapped file page the first
>>>>>>>> time.  Then the performance gets better, but it still doesn't reach the
>>>>>>>> performance of 2.6.18.
>>>>>>>
>>>>>>> 2.6.18 didn't have the active list protection at all and always
>>>>>>> forcibly deactivated pages during reclaim.  Have you tried fully
>>>>>>> reverting to this by making inactive_file_is_low_global() return true
>>>>>>> unconditionally?
>>>>>>
>>>>>> No, I don't try it.  AFAIK, 2.6.18 didn't protect the active list.  But
>>>>>> it doesn't always forcibly deactivate the pages.  I remember that in
>>>>>> 2.6.18 kernel we calculate 'mapped_ratio' in shrink_active_list(), and
>>>>>> then we get 'swap_tendency' according to 'mapped_ratio', 'distress', and
>>>>>> 'sc->swappiness'.  If 'swap_tendency' is not greater than 100, it
>>>>>> doesn't reclaim mapped file pages.  By this equation, if the sum of the
>>>>>> anonymous pages and mapped file pages is not greater than the 50% of
>>>>>> total pages, we don't deactivate these pages.  Am I missing something?
>>>>>
>>>>> I think we need to go back to protecting mapped pages based on how
>>>>> much of reclaimable memory they make up, one way or another.
>>>>
>>>>
>>>> I partly agree with it from the regression point of view.
>>>> But I would like to understand the rationale for handling mapped pages specially.
>>>> In the case of code pages (VM_EXEC), we already handle them specially, and
>>>> I understand why. At least, my opinion was that our LRU algorithm doesn't consider
>>>> _frequency_ fully, while it handles _recency_ well. I thought code pages would see a
>>>> higher frequency of access than other pages.
>>>> But in the case of mapped data pages, why should we handle them specially?
>>>> I guess a mapped data page has a higher chance of being accessed than an unmapped page,
>>>> because an unmapped page doesn't have any owner (it just caches data to reduce I/O)
>>>> while a mapped page has an owner above.
>>>>
>>>> Doesn't that make sense?
>>>
>>> I agree that the reason behind protecting VM_EXEC pages was that our
>>> frequency information for mapped pages is at LRU cycle granularity.
>>>
>>> But I don't see why you think this problem wouldn't apply to all
>>> mapped pages in general.
>>
>>
>> A code page is very likely to be shared by other processes, so I think it is more special
>> than a normal mmapped page. So I would like to give code pages a bigger bonus than normal mmapped pages.
> 
> I think the problem really is that, in the presence of fast used-once
> streams of unmapped cache, multiple read()s in short succession
> activate the page away indefinitely from any reclaim pressure, while a
> mapped page will get a second chance on the high paced inactive list.
> 
> This is only okay if there are a lot of mapped pages, which there were
> in the loads improved by the used-once detection, but it creates
> problems for loads with small amounts of mapped pages that are
> accessed in bursts.
> 
> And this applies to all mapped file pages, not just VM_EXEC ones.


Agreed. Couldn't we enhance this with an Ereclaimable LRU list?
I mean, we could put unmapped file pages on the Ereclaimable LRU list from the beginning,
so that multiple touches in short succession would promote a page from Ereclaimable to
inactive instead of active. And by balancing the file inactive list against the
Ereclaimable list, we can maintain the size of the Ereclaimable LRU to prevent early
eviction of frequently hit page cache pages (i.e., prevent a regression for
read(2)-intensive workloads).

The problem is how to balance between them.
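The promotion path proposed here could look something like the following toy model (the "Ereclaimable" list is only a proposal in this thread, not an existing kernel list; the names are made up for illustration):

```c
#include <assert.h>

/*
 * Toy model of the proposed Ereclaimable LRU: an unmapped file page
 * starts on the easily-reclaimable list, and each reference promotes
 * it one step.  A short burst of read()s therefore lifts a page only
 * to the inactive list rather than straight to the active list.
 */
enum lru_list { LRU_ERECLAIMABLE, LRU_INACTIVE, LRU_ACTIVE };

static enum lru_list on_reference(enum lru_list lru)
{
	switch (lru) {
	case LRU_ERECLAIMABLE:
		return LRU_INACTIVE;   /* first touch: one step up */
	case LRU_INACTIVE:
	default:
		return LRU_ACTIVE;     /* repeated touches: activate */
	}
}
```

The open question stated above remains: what mechanism shrinks the inactive and Ereclaimable lists against each other so neither starves.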

>> So I would like to make following as if we can.
>>
>> Reclaim preference :
>> unmapped page >> mapped page > VM_EXEC mapped page
> 
> Not necessarily.  I would like to protect mapped pages if there are
> very few of them, because we can not tell how costly their reclaim
> will end up being while we DO know that reclaiming them won't free up

And they are likely to turn out to be the real working set, like code pages, if there
are very few of them. But I'm not sure we should protect them. If we protect them, we
may OOM-kill some process earlier than now, but the user would not see added latency.
If we don't protect them, we can save a process from the OOM killer, but the user might
see big latency in interactive processes. It's kind of a policy problem.

> much memory.  Only as they make up an increasing share of memory, this
> class of pages becomes a more attractive reclaim candidate, and while


-ENOPARSE. They make up an increasing share of memory, but why does that make them a more attractive reclaim candidate?

> we still can't tell for sure the cost of reclaiming them, the cost of
> NOT reclaiming them (burned cpu time, allocation stalls) increases.
> 
> So I think I disagree.  We should only compensate for the fact that we
> have less usage information on mapped pages, not treat unmapped cache
> like a third class citizen per default.


Agreed. It's not good to reclaim unmapped page cache first without any compensating rule.
We need balancing logic for moving mapped pages to the Ereclaimable LRU list by unmapping
them (try_to_unmap). That could cause more minor faults than now for mapped page cache
workloads, but I think we could solve that by creating a new knob, "unmapped_ratio", like
"inactive_ratio".
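A minimal sketch of what such a check could look like, assuming a hypothetical "unmapped_ratio" knob by analogy with inactive_ratio (both the knob and its semantics here are assumptions of this proposal, not existing kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical balancing check: start unmapping mapped file pages
 * (pushing them toward the easily-reclaimable list) only once the
 * unmapped cache has shrunk below its target share, i.e. fewer than
 * one unmapped page per unmapped_ratio mapped pages.
 */
static bool should_unmap_file_pages(unsigned long unmapped,
				    unsigned long mapped,
				    unsigned long unmapped_ratio)
{
	return unmapped * unmapped_ratio < mapped;
}
```

With a ratio of 3, for example, 10 unmapped pages against 100 mapped ones would trigger unmapping, while 50 against 100 would not.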

Please add more comments if there is more we should consider for the Ereclaimable LRU list.
Thanks for the good comments, Hannes.

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2012-06-26 23:47 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-17  9:13 [PATCH] mm: consider all swapped back pages in used-once logic Michal Hocko
2012-05-17  9:13 ` Michal Hocko
2012-05-17  9:24 ` Andrew Morton
2012-05-17  9:24   ` Andrew Morton
2012-05-17 12:10   ` Michal Hocko
2012-05-17 12:10     ` Michal Hocko
2012-05-17 20:23     ` Andrew Morton
2012-05-17 20:23       ` Andrew Morton
2012-05-18  6:50       ` Michal Hocko
2012-05-18  6:50         ` Michal Hocko
2012-05-17 13:14 ` Rik van Riel
2012-05-17 13:14   ` Rik van Riel
2012-05-17 19:54 ` Johannes Weiner
2012-05-17 19:54   ` Johannes Weiner
2012-05-21  2:51   ` Zheng Liu
2012-05-21  2:51     ` Zheng Liu
2012-05-21  7:36     ` Johannes Weiner
2012-05-21  7:36       ` Johannes Weiner
2012-05-21  8:59       ` Zheng Liu
2012-05-21  8:59         ` Zheng Liu
2012-05-21  9:37         ` Johannes Weiner
2012-05-21  9:37           ` Johannes Weiner
2012-05-21 11:07           ` Zheng Liu
2012-05-21 11:07             ` Zheng Liu
2012-06-23 11:04             ` Johannes Weiner
2012-06-23 11:04               ` Johannes Weiner
2012-06-23 15:22               ` Rik van Riel
2012-06-23 15:22                 ` Rik van Riel
2012-06-24 23:53               ` Minchan Kim
2012-06-24 23:53                 ` Minchan Kim
2012-06-25  8:08                 ` Johannes Weiner
2012-06-25  8:08                   ` Johannes Weiner
2012-06-25  8:25                   ` Minchan Kim
2012-06-25  8:25                     ` Minchan Kim
2012-06-26 13:51                     ` Johannes Weiner
2012-06-26 13:51                       ` Johannes Weiner
2012-06-26 23:47                       ` Minchan Kim
2012-06-26 23:47                         ` Minchan Kim
2012-05-18  0:40 ` Minchan Kim
2012-05-18  0:40   ` Minchan Kim
