linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] swsusp: shrink file cache first
@ 2009-02-06  3:11 Johannes Weiner
  2009-02-06  3:11 ` [PATCH 1/3] swsusp: clean up shrink_all_zones() Johannes Weiner
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06  3:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

[ grumble.  always check software for new behaviour after upgrade.
  sorry for the mess up :/ ]

Hello!

here are three patches that adjust the memory shrinking code used for
suspend-to-disk.

The first two patches are cleanups only and can probably go in
regardless of the third one.

The third patch changes the shrink_all_memory() logic to drop the file
cache first before touching any mapped files and only then goes for
anon pages.

The reason is that everything not shrunk before suspension has to go
into the image and will be 'prefaulted' before the processes can
resume and the system is usable again, so the image should be small
and contain only pages that are likely to be used again right after
resume.  This in turn means that the inactive file cache is the best
place to start decimating used memory.

Also, right now, subsequent faults of contiguously mapped files are
likely to perform better than swapin (see
http://kernelnewbies.org/KernelProjects/SwapoutClustering), so not
only is file cache preferred over other pages, but file pages are
preferred over anon pages in general.

Testing up to this point shows that the patch does what is intended,
shrinking file cache in favor of anon pages.  But whether the idea is
correct to begin with is a bit hard to quantify and I am still working
on it, so RFC only.

	Hannes


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/3] swsusp: clean up shrink_all_zones()
  2009-02-06  3:11 [PATCH 0/3] swsusp: shrink file cache first Johannes Weiner
@ 2009-02-06  3:11 ` Johannes Weiner
  2009-02-06  3:20   ` KOSAKI Motohiro
  2009-02-06  3:11 ` [PATCH 2/3] swsusp: dont fiddle with swappiness Johannes Weiner
  2009-02-06  3:11 ` [PATCH 3/3][RFC] swsusp: shrink file cache first Johannes Weiner
  2 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06  3:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

[-- Attachment #1: swsusp-clean-up-shrink_all_zones.patch --]
[-- Type: text/plain, Size: 1688 bytes --]

Move local variables to innermost possible scopes and use local
variables to cache calculations/reads done more than once.

No change in functionality (intended).

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2057,31 +2057,31 @@ static unsigned long shrink_all_zones(un
 				      int pass, struct scan_control *sc)
 {
 	struct zone *zone;
-	unsigned long nr_to_scan, ret = 0;
-	enum lru_list l;
+	unsigned long ret = 0;
 
 	for_each_zone(zone) {
+		enum lru_list l;
 
 		if (!populated_zone(zone))
 			continue;
-
 		if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
 			continue;
 
 		for_each_evictable_lru(l) {
+			enum zone_stat_item ls = NR_LRU_BASE + l;
+			unsigned long lru_pages = zone_page_state(zone, ls);
+
 			/* For pass = 0, we don't shrink the active list */
-			if (pass == 0 &&
-				(l == LRU_ACTIVE || l == LRU_ACTIVE_FILE))
+			if (pass == 0 && (l == LRU_ACTIVE_ANON ||
+						l == LRU_ACTIVE_FILE))
 				continue;
 
-			zone->lru[l].nr_scan +=
-				(zone_page_state(zone, NR_LRU_BASE + l)
-								>> prio) + 1;
+			zone->lru[l].nr_scan += (lru_pages >> prio) + 1;
 			if (zone->lru[l].nr_scan >= nr_pages || pass > 3) {
+				unsigned long nr_to_scan;
+
 				zone->lru[l].nr_scan = 0;
-				nr_to_scan = min(nr_pages,
-					zone_page_state(zone,
-							NR_LRU_BASE + l));
+				nr_to_scan = min(nr_pages, lru_pages);
 				ret += shrink_list(l, nr_to_scan, zone,
 								sc, prio);
 				if (ret >= nr_pages)
@@ -2089,7 +2089,6 @@ static unsigned long shrink_all_zones(un
 			}
 		}
 	}
-
 	return ret;
 }
 




* [PATCH 2/3] swsusp: dont fiddle with swappiness
  2009-02-06  3:11 [PATCH 0/3] swsusp: shrink file cache first Johannes Weiner
  2009-02-06  3:11 ` [PATCH 1/3] swsusp: clean up shrink_all_zones() Johannes Weiner
@ 2009-02-06  3:11 ` Johannes Weiner
  2009-02-06  3:21   ` KOSAKI Motohiro
  2009-02-06  3:11 ` [PATCH 3/3][RFC] swsusp: shrink file cache first Johannes Weiner
  2 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06  3:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

[-- Attachment #1: swsusp-dont-fiddle-with-swappiness.patch --]
[-- Type: text/plain, Size: 794 bytes --]

sc.swappiness is not used in the swsusp memory shrinking path, do not
set it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2111,7 +2111,6 @@ unsigned long shrink_all_memory(unsigned
 		.may_swap = 0,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
-		.swappiness = vm_swappiness,
 		.isolate_pages = isolate_pages_global,
 	};
 
@@ -2145,10 +2144,8 @@ unsigned long shrink_all_memory(unsigned
 		int prio;
 
 		/* Force reclaiming mapped pages in the passes #3 and #4 */
-		if (pass > 2) {
+		if (pass > 2)
 			sc.may_swap = 1;
-			sc.swappiness = 100;
-		}
 
 		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
 			unsigned long nr_to_scan = nr_pages - ret;




* [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  3:11 [PATCH 0/3] swsusp: shrink file cache first Johannes Weiner
  2009-02-06  3:11 ` [PATCH 1/3] swsusp: clean up shrink_all_zones() Johannes Weiner
  2009-02-06  3:11 ` [PATCH 2/3] swsusp: dont fiddle with swappiness Johannes Weiner
@ 2009-02-06  3:11 ` Johannes Weiner
  2009-02-06  3:39   ` KOSAKI Motohiro
  2009-02-06  8:03   ` MinChan Kim
  2 siblings, 2 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06  3:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

[-- Attachment #1: swsusp-shrink-file-cache-first.patch --]
[-- Type: text/plain, Size: 3293 bytes --]

File cache pages are saved to disk either through normal writeback by
reclaim or by including them in the suspend image written to a
swapfile.

Writing them either way should take the same amount of time, but doing
normal writeback and unmap changes the fault behaviour on resume from
prefault to on-demand paging, smoothing out resume and giving
previously cached pages the chance to stay out of memory completely if
they are not used anymore.

Another reason for preferring file page eviction is that the locality
principle is visible in fault patterns and swap might perform really
badly with subsequent faulting of contiguously mapped pages.

Since anon and file pages now live on different lists, selectively
scanning one type only is straightforward.

This patch also removes the scanning of anon pages while swapping is
not allowed, which does not make much sense.

The five memory shrinking passes now look like this:

Pass 0:  shrink inactive file cache
This has the best chances of not being used any time soon again after
resume, so trade inactive file cache against space for anon pages.

Pass 1:  shrink file cache
Essentially the same as before but replenish the inactive file list
with borderline active pages.

Pass 2:  shrink all file pages and inactive anon
Reclaim mapped file pages and go for aged anon pages as well.

Pass 3:  shrink all file pages and anon
Same as before but also shrink the active anon list to have enough
anon pages for actual reclaim should we need pass 4.

Pass 4:  repeat pass 3

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |   34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2069,13 +2069,23 @@ static unsigned long shrink_all_zones(un
 
 		for_each_evictable_lru(l) {
 			enum zone_stat_item ls = NR_LRU_BASE + l;
-			unsigned long lru_pages = zone_page_state(zone, ls);
+			unsigned long lru_pages;
 
-			/* For pass = 0, we don't shrink the active list */
-			if (pass == 0 && (l == LRU_ACTIVE_ANON ||
-						l == LRU_ACTIVE_FILE))
-				continue;
+			switch (pass) {
+			case 0:
+				if (l == LRU_ACTIVE_FILE)
+					continue;
+			case 1:
+				if (l == LRU_INACTIVE_ANON)
+					continue;
+			case 2:
+				if (l == LRU_ACTIVE_ANON)
+					continue;
+			default:
+				break;
+			}
 
+			lru_pages = zone_page_state(zone, ls);
 			zone->lru[l].nr_scan += (lru_pages >> prio) + 1;
 			if (zone->lru[l].nr_scan >= nr_pages || pass > 3) {
 				unsigned long nr_to_scan;
@@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
 
 	/*
 	 * We try to shrink LRUs in 5 passes:
-	 * 0 = Reclaim from inactive_list only
-	 * 1 = Reclaim from active list but don't reclaim mapped
-	 * 2 = 2nd pass of type 1
-	 * 3 = Reclaim mapped (normal reclaim)
-	 * 4 = 2nd pass of type 3
+	 * 0 = Reclaim unmapped inactive file pages
+	 * 1 = Reclaim unmapped file pages
+	 * 2 = Reclaim file and inactive anon pages
+	 * 3 = Reclaim file and anon pages
+	 * 4 = Second pass 3
 	 */
 	for (pass = 0; pass < 5; pass++) {
 		int prio;
 
-		/* Force reclaiming mapped pages in the passes #3 and #4 */
-		if (pass > 2)
+		/* Reclaim mapped pages in higher passes */
+		if (pass > 1)
 			sc.may_swap = 1;
 
 		for (prio = DEF_PRIORITY; prio >= 0; prio--) {




* Re: [PATCH 1/3] swsusp: clean up shrink_all_zones()
  2009-02-06  3:11 ` [PATCH 1/3] swsusp: clean up shrink_all_zones() Johannes Weiner
@ 2009-02-06  3:20   ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-06  3:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kosaki.motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

> Move local variables to innermost possible scopes and use local
> variables to cache calculations/reads done more than once.
> 
> No change in functionality (intended).
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c |   23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)

ok. good cleanup.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>






* Re: [PATCH 2/3] swsusp: dont fiddle with swappiness
  2009-02-06  3:11 ` [PATCH 2/3] swsusp: dont fiddle with swappiness Johannes Weiner
@ 2009-02-06  3:21   ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-06  3:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kosaki.motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

> sc.swappiness is not used in the swsusp memory shrinking path, do not
> set it.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c |    5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)

I agree.  Currently shrink_all_memory() uses shrink_list() directly.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  3:11 ` [PATCH 3/3][RFC] swsusp: shrink file cache first Johannes Weiner
@ 2009-02-06  3:39   ` KOSAKI Motohiro
  2009-02-06  4:49     ` Johannes Weiner
  2009-02-06  8:03   ` MinChan Kim
  1 sibling, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-06  3:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kosaki.motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

Hi

I have some comment.

> File cache pages are saved to disk either through normal writeback by
> reclaim or by including them in the suspend image written to a
> swapfile.
> 
> Writing them either way should take the same amount of time but doing
> normal writeback and unmap changes the fault behaviour on resume from
> prefault to on-demand paging, smoothing out resume and giving
> previously cached pages the chance to stay out of memory completely if
> they are not used anymore.
> 
> Another reason for preferring file page eviction is that the locality
> principle is visible in fault patterns and swap might perform really
> badly with subsequent faulting of contiguously mapped pages.
> 
> Since anon and file pages now live on different lists, selectively
> scanning one type only is straightforward.

I don't understand your point.
Which do you want to improve: suspend performance or resume performance?

If we are thinking about suspend performance, we should consider that the
swap device and the file-backed device may be different block devices.
Interleaving file-backed page-out with swap-out can improve total
write-out performance.

If we are thinking about resume performance, we should consider how the
on-disk contiguity of swap matches the contiguity of the process's
virtual address space, because that reduces unnecessary seeking.
But your patch doesn't address this.

Could you explain the benefit of this patch?
Also, I think you should measure the performance results.


<snip>


> @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
>  
>  	/*
>  	 * We try to shrink LRUs in 5 passes:
> -	 * 0 = Reclaim from inactive_list only
> -	 * 1 = Reclaim from active list but don't reclaim mapped
> -	 * 2 = 2nd pass of type 1
> -	 * 3 = Reclaim mapped (normal reclaim)
> -	 * 4 = 2nd pass of type 3
> +	 * 0 = Reclaim unmapped inactive file pages
> +	 * 1 = Reclaim unmapped file pages

I think your patch reclaims mapped file pages in passes 0 and 1 too.


> +	 * 2 = Reclaim file and inactive anon pages
> +	 * 3 = Reclaim file and anon pages
> +	 * 4 = Second pass 3
>  	 */
>  	for (pass = 0; pass < 5; pass++) {
>  		int prio;
>  
> -		/* Force reclaiming mapped pages in the passes #3 and #4 */
> -		if (pass > 2)
> +		/* Reclaim mapped pages in higher passes */
> +		if (pass > 1)
>  			sc.may_swap = 1;

Why is this line needed?
If you reclaim only the file-backed LRUs, may_swap isn't effective.
So can't we just remove this line and always set may_swap = 1?




* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  3:39   ` KOSAKI Motohiro
@ 2009-02-06  4:49     ` Johannes Weiner
  2009-02-06  5:59       ` KOSAKI Motohiro
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
  0 siblings, 2 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06  4:49 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

On Fri, Feb 06, 2009 at 12:39:22PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> I have some comment.
> 
> > File cache pages are saved to disk either through normal writeback by
> > reclaim or by including them in the suspend image written to a
> > swapfile.
> > 
> > Writing them either way should take the same amount of time but doing
> > normal writeback and unmap changes the fault behaviour on resume from
> > prefault to on-demand paging, smoothing out resume and giving
> > previously cached pages the chance to stay out of memory completely if
> > they are not used anymore.
> > 
> > Another reason for preferring file page eviction is that the locality
> > principle is visible in fault patterns and swap might perform really
> > badly with subsequent faulting of contiguously mapped pages.
> > 
> > Since anon and file pages now live on different lists, selectively
> > scanning one type only is straightforward.
> 
> I don't understand your point.
> Which do you want to improve: suspend performance or resume performance?

If there is lots of clean file cache, memory shrinking time on suspend
is fast.  In a test here, a shrink that swaps a lot had 60 MB/s
throughput while a shrink on memory crowded with file cache had a
throughput of 500 MB/s.

If we have to shrink n pages, aggressively shrinking the
cheap-to-evict ones is a speed improvement already.

The patch is also an improvement in suspend time because it doesn't
scan the anon list while it's not allowed to swap.  And what should it
do with anon pages if not swap them out? :)

But that is more a nice side effect.

> If we are thinking about suspend performance, we should consider that the
> swap device and the file-backed device may be different block devices.
> Interleaving file-backed page-out with swap-out can improve total write-out performance.

Hm, good point.  We could probably improve that but I don't think it's
too pressing because at least on my test boxen, actual shrinking time
is really short compared to the total of suspending to disk.

> If we are thinking about resume performance, we should consider how the
> on-disk contiguity of swap matches the contiguity of the process's
> virtual address space, because that reduces unnecessary seeking.
> But your patch doesn't address this.
> 
> Could you explain the benefit of this patch?

The patch tries to shrink those pages first that are most unlikely to
be needed again after resume.  It assumes that active anon pages are
immediately needed after resume while inactive file pages are not.  So
it defers shrinking anon pages until after the file cache.

But I just noticed that the old behaviour defers it as well, because
even if it does scan anon pages from the beginning, it allows writing
only starting from pass 3.

I couldn't quite understand what you wrote about on-disk
contiguousness, but that claim still stands: faulting in contiguous
pages from swap can be much slower than faulting file pages.  And my
patch prefers mapped file pages over anon pages.  This is probably
where I have seen the improvements after resume in my tests.

So assuming that we can not save the whole working set, it's better to
preserve as much as possible of those pages that are the most
expensive ones to refault.

> Also, I think you should measure the performance results.

Yes, I'm still thinking about ideas how to quantify it properly.  I
have not yet found a reliable way to check for whether the working set
is intact besides seeing whether the resumed applications are
responsive right away or if they first have to swap in their pages
again.

> <snip>
> 
> 
> > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
> >  
> >  	/*
> >  	 * We try to shrink LRUs in 5 passes:
> > -	 * 0 = Reclaim from inactive_list only
> > -	 * 1 = Reclaim from active list but don't reclaim mapped
> > -	 * 2 = 2nd pass of type 1
> > -	 * 3 = Reclaim mapped (normal reclaim)
> > -	 * 4 = 2nd pass of type 3
> > +	 * 0 = Reclaim unmapped inactive file pages
> > +	 * 1 = Reclaim unmapped file pages
> 
> I think your patch reclaims mapped file pages in passes 0 and 1 too.

Doesn't the following check in shrink_page_list prevent this:

                if (!sc->may_swap && page_mapped(page))
                        goto keep_locked;

?

> > +	 * 2 = Reclaim file and inactive anon pages
> > +	 * 3 = Reclaim file and anon pages
> > +	 * 4 = Second pass 3
> >  	 */
> >  	for (pass = 0; pass < 5; pass++) {
> >  		int prio;
> >  
> > -		/* Force reclaiming mapped pages in the passes #3 and #4 */
> > -		if (pass > 2)
> > +		/* Reclaim mapped pages in higher passes */
> > +		if (pass > 1)
> >  			sc.may_swap = 1;
> 
> Why is this line needed?
> If you reclaim only the file-backed LRUs, may_swap isn't effective.
> So can't we just remove this line and always set may_swap = 1?

Same as the above, I think mapped pages are not touched when may_swap
is 0 due to the check quoted above.  Please correct me if I'm wrong.

	Hannes


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  4:49     ` Johannes Weiner
@ 2009-02-06  5:59       ` KOSAKI Motohiro
  2009-02-06 12:24         ` Johannes Weiner
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
  1 sibling, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-06  5:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kosaki.motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

Hi

> > If we are thinking about suspend performance, we should consider that the
> > swap device and the file-backed device may be different block devices.
> > Interleaving file-backed page-out with swap-out can improve total write-out performance.
> 
> Hm, good point.  We could probably improve that but I don't think it's
> too pressing because at least on my test boxen, actual shrinking time
> is really short compared to the total of suspending to disk.

OK.
The only remaining problem is posting the measurement results :)


> > If we are thinking about resume performance, we should consider how the
> > on-disk contiguity of swap matches the contiguity of the process's
> > virtual address space, because that reduces unnecessary seeking.
> > But your patch doesn't address this.
> > 
> > Could you explain the benefit of this patch?
> 
> The patch tries to shrink those pages first that are most unlikely to
> be needed again after resume.  It assumes that active anon pages are
> immediately needed after resume while inactive file pages are not.  So
> it defers shrinking anon pages after file cache.

Hmm, I'm confused.
I agree that active anon is more important than inactive file.
But I don't understand why the scanning order at suspend changes what happens at resume.


> But I just noticed that the old behaviour defers it as well, because
> even if it does scan anon pages from the beginning, it allows writing
> only starting from pass 3.

Ah, I see.
It's obviously wrong.

> I couldn't quite understand what you wrote about on-disk
> contiguousness, but that claim still stands: faulting in contiguous
> pages from swap can be much slower than faulting file pages.  And my
> patch prefers mapped file pages over anon pages.  This is probably
> where I have seen the improvements after resume in my tests.

Sorry, I don't understand yet.
Why does "preferring mapped file pages over anon pages" make a large improvement?


> So assuming that we can not save the whole working set, it's better to
> preserve as much as possible of those pages that are the most
> expensive ones to refault.
>
> > Also, I think you should measure the performance results.
> 
> Yes, I'm still thinking about ideas how to quantify it properly.  I
> have not yet found a reliable way to check for whether the working set
> is intact besides seeing whether the resumed applications are
> responsive right away or if they first have to swap in their pages
> again.

Thanks.
I'm looking forward to this :)



> > > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
> > >  
> > >  	/*
> > >  	 * We try to shrink LRUs in 5 passes:
> > > -	 * 0 = Reclaim from inactive_list only
> > > -	 * 1 = Reclaim from active list but don't reclaim mapped
> > > -	 * 2 = 2nd pass of type 1
> > > -	 * 3 = Reclaim mapped (normal reclaim)
> > > -	 * 4 = 2nd pass of type 3
> > > +	 * 0 = Reclaim unmapped inactive file pages
> > > +	 * 1 = Reclaim unmapped file pages
> > 
> > I think your patch reclaims mapped file pages in passes 0 and 1 too.
> 
> Doesn't the following check in shrink_page_list prevent this:
> 
>                 if (!sc->may_swap && page_mapped(page))
>                         goto keep_locked;
> 
> ?

Grr, you are right.
I agree, currently may_swap doesn't really control whether we swap out or not.
So I think we should give it a correct name ;)





* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  3:11 ` [PATCH 3/3][RFC] swsusp: shrink file cache first Johannes Weiner
  2009-02-06  3:39   ` KOSAKI Motohiro
@ 2009-02-06  8:03   ` MinChan Kim
  2009-02-06 10:06     ` MinChan Kim
  1 sibling, 1 reply; 43+ messages in thread
From: MinChan Kim @ 2009-02-06  8:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

Hi, Johannes.
I have some questions.
Just out of curiosity. :)

On Fri, Feb 06, 2009 at 04:11:28AM +0100, Johannes Weiner wrote:
> File cache pages are saved to disk either through normal writeback by
> reclaim or by including them in the suspend image written to a
> swapfile.
> 
> Writing them either way should take the same amount of time but doing
> normal writeback and unmap changes the fault behaviour on resume from
> prefault to on-demand paging, smoothing

What do you mean by "unmap"?
Why do normal writeback and unmap change the fault behaviour on resume?

> previously cached pages the chance to stay out of memory completely if
> they are not used anymore.
> 
> Another reason for preferring file page eviction is that the locality
> principle is visible in fault patterns and swap might perform really
> badly with subsequent faulting of contiguously mapped pages.

Why do you think that swap might perform badly with subsequent faulting
of contiguously mapped pages?
Do you mean a normal file system is faster than swap due to readahead and
smart block allocation?


-- 
Kind Regards
MinChan Kim



* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  8:03   ` MinChan Kim
@ 2009-02-06 10:06     ` MinChan Kim
  2009-02-06 11:50       ` Johannes Weiner
  0 siblings, 1 reply; 43+ messages in thread
From: MinChan Kim @ 2009-02-06 10:06 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

On Fri, Feb 6, 2009 at 5:03 PM, MinChan Kim <minchan.kim@gmail.com> wrote:
> Hi, Johannes.
> I have some questions.
> Just out of curiosity. :)
>
> On Fri, Feb 06, 2009 at 04:11:28AM +0100, Johannes Weiner wrote:
>> File cache pages are saved to disk either through normal writeback by
>> reclaim or by including them in the suspend image written to a
>> swapfile.
>>
>> Writing them either way should take the same amount of time but doing
>> normal writeback and unmap changes the fault behaviour on resume from
>> prefault to on-demand paging, smoothing out resume and giving
>
> What do you mean by "unmap"?
> Why do normal writeback and unmap change the fault behaviour on resume?

Please ignore my poor first question. :(
I agree with your opinion.

>> previously cached pages the chance to stay out of memory completely if
>> they are not used anymore.
>>
>> Another reason for preferring file page eviction is that the locality
>> principle is visible in fault patterns and swap might perform really
>> badly with subsequent faulting of contiguously mapped pages.
>
> Why do you think that swap might perform badly with subsequent faulting
> of contiguously mapped pages?
> Do you mean a normal file system is faster than swap due to readahead and
> smart block allocation?

But I still can't understand this issue.
What does "page eviction" mean?  Is it reclaim or swap-out?

>
> --
> Kinds Regards
> MinChan Kim
>
>



-- 
Kind regards,
MinChan Kim


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 10:06     ` MinChan Kim
@ 2009-02-06 11:50       ` Johannes Weiner
  0 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06 11:50 UTC (permalink / raw)
  To: MinChan Kim
  Cc: Andrew Morton, Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

On Fri, Feb 06, 2009 at 07:06:34PM +0900, MinChan Kim wrote:
> On Fri, Feb 6, 2009 at 5:03 PM, MinChan Kim <minchan.kim@gmail.com> wrote:
> >> Another reason for preferring file page eviction is that the locality
> >> principle is visible in fault patterns and swap might perform really
> >> badly with subsequent faulting of contiguously mapped pages.
> >
> > Why do you think that swap might perform badly with subsequent faulting
> > of contiguously mapped pages?
> > Do you mean a normal file system is faster than swap due to readahead and
> > smart block allocation?
> 
> But I still can't understand this issue.
> What does "page eviction" mean?  Is it reclaim or swap-out?

Reclaim evicts pages from memory by swap out (and writeback).

In the suspend case, "reclaim" is perhaps not 100% correct.  We are
not directly interested in the amount of free pages as you are with
reclaim, but interested in the amount of pages in use as those are the
pages we have to write to disk.  So "shrinking" is the better term.

But yes, I mean what you said:

	Do you mean a normal file system is faster than swap due to
	readahead and smart block allocation?

Yes.

	Hannes


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  5:59       ` KOSAKI Motohiro
@ 2009-02-06 12:24         ` Johannes Weiner
  2009-02-06 13:35           ` MinChan Kim
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06 12:24 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

On Fri, Feb 06, 2009 at 02:59:35PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > > If we are thinking about suspend performance, we should consider that the
> > > swap device and the file-backed device may be different block devices.
> > > Interleaving file-backed page-out with swap-out can improve total write-out performance.
> > 
> > Hm, good point.  We could probably improve that but I don't think it's
> > too pressing because at least on my test boxen, actual shrinking time
> > is really short compared to the total of suspending to disk.
> 
> OK.
> The only remaining problem is posting the measurement results :)
> 
> 
> > > If we are thinking about resume performance, we should consider how the
> > > on-disk contiguity of swap matches the contiguity of the process's
> > > virtual address space, because that reduces unnecessary seeking.
> > > But your patch doesn't address this.
> > > 
> > > Could you explain the benefit of this patch?
> > 
> > The patch tries to shrink those pages first that are most unlikely to
> > be needed again after resume.  It assumes that active anon pages are
> > immediately needed after resume while inactive file pages are not.  So
> > it defers shrinking anon pages after file cache.
> 
> Hmm, I'm confused.
> I agree that active anon is more important than inactive file.
> But I don't understand why the scanning order at suspend changes what happens at resume.

This is the problem: on suspend, we can only save about 50% of memory
through the suspend image because of the snapshotting.  So we have to
shrink memory before suspend.  Since you probably always have more RAM
used than 50%, you always have to shrink.  And the image is always the
same size.

After restoring the image, resuming processes want to continue their
work immediately and the user wants to use the applications again as
soon as possible.

Everything that is saved in the suspend image is restored and back in
memory when the processes resume their work.

Everything that is NOT saved in the suspend image is still on swap or
not yet in the page cache when the processes resume their work.

So if we shrink the memory in the wrong order, after restoring the
image we have page cache in memory that is not needed and those anon
pages that are needed are swapped out.

And the goal is that after restoring the image we have as much of the
working set back in memory and those pages in swap and on disk-only
that are unlikely to be used immediately by the resumed processes, so
they can continue their work without much disk io.

> > But I just noticed that the old behaviour defers it as well, because
> > even if it does scan anon pages from the beginning, it allows writing
> > only starting from pass 3.
> 
> Ah, I see.
> It's obviously wrong.
> 
> > I couldn't quite understand what you wrote about on-disk
> > contiguousness, but that claim still stands: faulting in contiguous
> > pages from swap can be much slower than faulting file pages.  And my
> > patch prefers mapped file pages over anon pages.  This is probably
> > where I have seen the improvements after resume in my tests.
> 
> sorry, I don't understand yet.
> Why does "prefers mapped file pages over anon pages" make a large improvement?

Because contiguously mapped file pages are faster to read in than a
group of anon pages.  Or at least that is my claim.

And if we have to evict some of the working set just because the
working set is bigger than 50% of memory, then it's better to evict
those pages that are cheaper to refault.

Does that make sense?

> > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > have not yet found a reliable way to check for whether the working set
> > is intact besides seeing whether the resumed applications are
> > responsive right away or if they first have to swap in their pages
> > again.
> 
> thanks.
> I'm looking for this :)

Thanks to YOU, also for reviewing!

> > > > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
> > > >  
> > > >  	/*
> > > >  	 * We try to shrink LRUs in 5 passes:
> > > > -	 * 0 = Reclaim from inactive_list only
> > > > -	 * 1 = Reclaim from active list but don't reclaim mapped
> > > > -	 * 2 = 2nd pass of type 1
> > > > -	 * 3 = Reclaim mapped (normal reclaim)
> > > > -	 * 4 = 2nd pass of type 3
> > > > +	 * 0 = Reclaim unmapped inactive file pages
> > > > +	 * 1 = Reclaim unmapped file pages
> > > 
> > > I think your patch reclaim mapped file at priority 0 and 1 too.
> > 
> > Doesn't the following check in shrink_page_list prevent this:
> > 
> >                 if (!sc->may_swap && page_mapped(page))
> >                         goto keep_locked;
> > 
> > ?
> 
> Grr, you are right.
> I agree, currently may_swap doesn't control swap out or not.
> so I think we should change it to a correct name ;)

Agreed.  What do you think about the following patch?

---
Subject: vmscan: rename may_swap scan control knob

may_swap applies not only to anon pages but to mapped file pages as
well.  Rename it to may_unmap which is the actual meaning. 

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9a27c44..2523600 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -60,8 +60,8 @@ struct scan_control {
 
 	int may_writepage;
 
-	/* Can pages be swapped as part of reclaim? */
-	int may_swap;
+	/* Reclaim mapped pages */
+	int may_unmap;
 
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
@@ -606,7 +606,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (unlikely(!page_evictable(page, NULL)))
 			goto cull_mlocked;
 
-		if (!sc->may_swap && page_mapped(page))
+		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
@@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.gfp_mask = gfp_mask,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
@@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 {
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
@@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	struct zonelist *zonelist;
 
 	if (noswap)
-		sc.may_swap = 0;
+		sc.may_unmap = 0;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
@@ -2109,7 +2109,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
-		.may_swap = 0,
+		.may_unmap = 0,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
 		.swappiness = vm_swappiness,
@@ -2147,7 +2147,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 
 		/* Force reclaiming mapped pages in the passes #3 and #4 */
 		if (pass > 2) {
-			sc.may_swap = 1;
+			sc.may_unmap = 1;
 			sc.swappiness = 100;
 		}
 
@@ -2292,7 +2292,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
-		.may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 12:24         ` Johannes Weiner
@ 2009-02-06 13:35           ` MinChan Kim
  2009-02-06 17:15             ` MinChan Kim
  0 siblings, 1 reply; 43+ messages in thread
From: MinChan Kim @ 2009-02-06 13:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

Thanks for the kind explanation and good discussion, Hannes and Kosaki-san.
As always, I learn a lot from such good discussions. :)

On Fri, Feb 6, 2009 at 9:24 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Feb 06, 2009 at 02:59:35PM +0900, KOSAKI Motohiro wrote:
>> Hi
>>
>> > > if we think about suspend performance, we should consider that the swap device
>> > > and the file-backed device are different block devices.
>> > > the interleave of file-backed page out and swap out can improve total write out performance.
>> >
>> > Hm, good point.  We could probably improve that but I don't think it's
>> > too pressing because at least on my test boxen, actual shrinking time
>> > is really short compared to the total of suspending to disk.
>>
>> ok.
>> the only remaining problem is posting the measurement results :)
>>
>>
>> > > if we think about resume performance, we should consider how the on-disk
>> > > contiguity of swap matches the process's virtual address contiguity.
>> > > it can reduce unnecessary seeks.
>> > > but your patch doesn't do this.
>> > >
>> > > Could you explain this patch benefit?
>> >
>> > The patch tries to shrink those pages first that are most unlikely to
>> > be needed again after resume.  It assumes that active anon pages are
>> > immediately needed after resume while inactive file pages are not.  So
>> > it defers shrinking anon pages after file cache.
>>
>> hmm, I'm confused.
>> I agree active anon is more important than inactive file.
>> but I don't understand why scanning order at suspend change resume order.
>
> This is the problem: on suspend, we can only save about 50% of memory
> through the suspend image because of the snapshotting.  So we have to
> shrink memory before suspend.  Since you probably always have more RAM
> used than 50%, you always have to shrink.  And the image is always the
> same size.
>
> After restoring the image, resuming processes want to continue their
> work immediately and the user wants to use the applications again as
> soon as possible.
>
> Everything that is saved in the suspend image is restored and back in
> memory when the processes resume their work.
>
> Everything that is NOT saved in the suspend image is still on swap or
> not yet in the page cache when the processes resume their work.
>
> So if we shrink the memory in the wrong order, after restoring the
> image we have page cache in memory that is not needed and those anon
> pages that are needed are swapped out.

It makes sense.

> And the goal is that after restoring the image we have as much of the
> working set back in memory and those pages in swap and on disk-only
> that are unlikely to be used immediately by the resumed processes, so
> they can continue their work without much disk io.
>

Your intention is good to me.

>> > But I just noticed that the old behaviour defers it as well, because
>> > even if it does scan anon pages from the beginning, it allows writing
>> > only starting from pass 3.
>>
>> Ah, I see.
>> it's obviously wrong.
>>
>> > I couldn't quite understand what you wrote about on-disk
>> > contiguousness, but that claim still stands: faulting in contiguous
>> > pages from swap can be much slower than faulting file pages.  And my
>> > patch prefers mapped file pages over anon pages.  This is probably
>> > where I have seen the improvements after resume in my tests.
>>
>> sorry, I don't understand yet.
>> Why does "prefers mapped file pages over anon pages" make a large improvement?
>
> Because contiguously mapped file pages are faster to read in than a
> group of anon pages.  Or at least that is my claim.

It makes sense if the typical resume process generates faults with a
locality pattern, so you should prove this.

>
> And if we have to evict some of the working set just because the
> working set is bigger than 50% of memory, then it's better to evict
> those pages that are cheaper to refault.
>
> Does that make sense?

Indeed!

>> > Yes, I'm still thinking about ideas how to quantify it properly.  I
>> > have not yet found a reliable way to check for whether the working set
>> > is intact besides seeing whether the resumed applications are
>> > responsive right away or if they first have to swap in their pages
>> > again.
>>
>> thanks.
>> I'm looking for this :)
>
> Thanks to YOU, also for reviewing!
>
>> > > > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
>> > > >
>> > > >         /*
>> > > >          * We try to shrink LRUs in 5 passes:
>> > > > -        * 0 = Reclaim from inactive_list only
>> > > > -        * 1 = Reclaim from active list but don't reclaim mapped
>> > > > -        * 2 = 2nd pass of type 1
>> > > > -        * 3 = Reclaim mapped (normal reclaim)
>> > > > -        * 4 = 2nd pass of type 3
>> > > > +        * 0 = Reclaim unmapped inactive file pages
>> > > > +        * 1 = Reclaim unmapped file pages
>> > >
>> > > I think your patch reclaim mapped file at priority 0 and 1 too.
>> >
>> > Doesn't the following check in shrink_page_list prevent this:
>> >
>> >                 if (!sc->may_swap && page_mapped(page))
>> >                         goto keep_locked;
>> >
>> > ?
>>
>> Grr, you are right.
>> I agree, currently may_swap doesn't control swap out or not.
>> so I think we should change it to a correct name ;)
>
> Agreed.  What do you think about the following patch?

As for me, I can't agree with you.
There are two kinds of file-mapped pages:

1. file-mapped and dirty pages
2. file-mapped and non-dirty pages

Neither kind is swapped.
A file-mapped dirty page is synced with its original file;
a file-mapped non-dirty page is simply discarded from the viewpoint of reclaim.

So, may_swap is only related to anon pages.
Thus, I think may_swap is reasonable.
How about you?

>
> ---
> Subject: vmscan: rename may_swap scan control knob
>
> may_swap applies not only to anon pages but to mapped file pages as
> well.  Rename it to may_unmap which is the actual meaning.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9a27c44..2523600 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -60,8 +60,8 @@ struct scan_control {
>
>        int may_writepage;
>
> -       /* Can pages be swapped as part of reclaim? */
> -       int may_swap;
> +       /* Reclaim mapped pages */
> +       int may_unmap;
>
>        /* This context's SWAP_CLUSTER_MAX. If freeing memory for
>         * suspend, we effectively ignore SWAP_CLUSTER_MAX.
> @@ -606,7 +606,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>                if (unlikely(!page_evictable(page, NULL)))
>                        goto cull_mlocked;
>
> -               if (!sc->may_swap && page_mapped(page))
> +               if (!sc->may_unmap && page_mapped(page))
>                        goto keep_locked;
>
>                /* Double the slab pressure for mapped and swapcache pages */
> @@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>                .gfp_mask = gfp_mask,
>                .may_writepage = !laptop_mode,
>                .swap_cluster_max = SWAP_CLUSTER_MAX,
> -               .may_swap = 1,
> +               .may_unmap = 1,
>                .swappiness = vm_swappiness,
>                .order = order,
>                .mem_cgroup = NULL,
> @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  {
>        struct scan_control sc = {
>                .may_writepage = !laptop_mode,
> -               .may_swap = 1,
> +               .may_unmap = 1,
>                .swap_cluster_max = SWAP_CLUSTER_MAX,
>                .swappiness = swappiness,
>                .order = 0,
> @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>        struct zonelist *zonelist;
>
>        if (noswap)
> -               sc.may_swap = 0;
> +               sc.may_unmap = 0;
>
>        sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>                        (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> @@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
>        struct reclaim_state *reclaim_state = current->reclaim_state;
>        struct scan_control sc = {
>                .gfp_mask = GFP_KERNEL,
> -               .may_swap = 1,
> +               .may_unmap = 1,
>                .swap_cluster_max = SWAP_CLUSTER_MAX,
>                .swappiness = vm_swappiness,
>                .order = order,
> @@ -2109,7 +2109,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>        struct reclaim_state reclaim_state;
>        struct scan_control sc = {
>                .gfp_mask = GFP_KERNEL,
> -               .may_swap = 0,
> +               .may_unmap = 0,
>                .swap_cluster_max = nr_pages,
>                .may_writepage = 1,
>                .swappiness = vm_swappiness,
> @@ -2147,7 +2147,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
>                /* Force reclaiming mapped pages in the passes #3 and #4 */
>                if (pass > 2) {
> -                       sc.may_swap = 1;
> +                       sc.may_unmap = 1;
>                        sc.swappiness = 100;
>                }
>
> @@ -2292,7 +2292,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>        int priority;
>        struct scan_control sc = {
>                .may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
> -               .may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
> +               .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
>                .swap_cluster_max = max_t(unsigned long, nr_pages,
>                                        SWAP_CLUSTER_MAX),
>                .gfp_mask = gfp_mask,
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>



-- 
Kind regards,
MinChan Kim

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 13:35           ` MinChan Kim
@ 2009-02-06 17:15             ` MinChan Kim
  2009-02-06 23:37               ` Johannes Weiner
  2009-02-09 19:43               ` [patch] vmscan: rename sc.may_swap to may_unmap Johannes Weiner
  0 siblings, 2 replies; 43+ messages in thread
From: MinChan Kim @ 2009-02-06 17:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

>>> Grr, you are right.
>>> I agree, currently may_swap doesn't control swap out or not.
>>> so I think we should change it to a correct name ;)
>>
>> Agreed.  What do you think about the following patch?
>
> As for me, I can't agree with you.
> There are two kinds of file-mapped pages:
>
> 1. file-mapped and dirty pages
> 2. file-mapped and non-dirty pages
>
> Neither kind is swapped.
> A file-mapped dirty page is synced with its original file;
> a file-mapped non-dirty page is simply discarded from the viewpoint of reclaim.
>
> So, may_swap is only related to anon pages.
> Thus, I think may_swap is reasonable.
> How about you?

Sorry, I misunderstood your point.
Would it be better to leave more detail in the git log?

'may_swap' applies not only to anon pages but to mapped file pages as
well.  The term 'may_swap' sometimes means 'swap' and sometimes means
'sync|discard'.  For anon pages, 'may_swap' determines whether pages
are swapped out or not, but for mapped file pages it determines
whether pages are synced or discarded.  So 'may_swap' is rather
awkward.  Rename it to 'may_unmap', which is the actual meaning.

If you find a wrong word or sentence, please fix it. :)

>
>>
>> ---
>> Subject: vmscan: rename may_swap scan control knob
>>
>> may_swap applies not only to anon pages but to mapped file pages as
>> well.  Rename it to may_unmap which is the actual meaning.
>>
>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 9a27c44..2523600 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -60,8 +60,8 @@ struct scan_control {
>>
>>        int may_writepage;
>>
>> -       /* Can pages be swapped as part of reclaim? */
>> -       int may_swap;
>> +       /* Reclaim mapped pages */
>> +       int may_unmap;
>>
>>        /* This context's SWAP_CLUSTER_MAX. If freeing memory for
>>         * suspend, we effectively ignore SWAP_CLUSTER_MAX.
>> @@ -606,7 +606,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>                if (unlikely(!page_evictable(page, NULL)))
>>                        goto cull_mlocked;
>>
>> -               if (!sc->may_swap && page_mapped(page))
>> +               if (!sc->may_unmap && page_mapped(page))
>>                        goto keep_locked;
>>
>>                /* Double the slab pressure for mapped and swapcache pages */
>> @@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>                .gfp_mask = gfp_mask,
>>                .may_writepage = !laptop_mode,
>>                .swap_cluster_max = SWAP_CLUSTER_MAX,
>> -               .may_swap = 1,
>> +               .may_unmap = 1,
>>                .swappiness = vm_swappiness,
>>                .order = order,
>>                .mem_cgroup = NULL,
>> @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>>  {
>>        struct scan_control sc = {
>>                .may_writepage = !laptop_mode,
>> -               .may_swap = 1,
>> +               .may_unmap = 1,
>>                .swap_cluster_max = SWAP_CLUSTER_MAX,
>>                .swappiness = swappiness,
>>                .order = 0,
>> @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>>        struct zonelist *zonelist;
>>
>>        if (noswap)
>> -               sc.may_swap = 0;
>> +               sc.may_unmap = 0;
>>
>>        sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>>                        (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>> @@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
>>        struct reclaim_state *reclaim_state = current->reclaim_state;
>>        struct scan_control sc = {
>>                .gfp_mask = GFP_KERNEL,
>> -               .may_swap = 1,
>> +               .may_unmap = 1,
>>                .swap_cluster_max = SWAP_CLUSTER_MAX,
>>                .swappiness = vm_swappiness,
>>                .order = order,
>> @@ -2109,7 +2109,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>>        struct reclaim_state reclaim_state;
>>        struct scan_control sc = {
>>                .gfp_mask = GFP_KERNEL,
>> -               .may_swap = 0,
>> +               .may_unmap = 0,
>>                .swap_cluster_max = nr_pages,
>>                .may_writepage = 1,
>>                .swappiness = vm_swappiness,
>> @@ -2147,7 +2147,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>>
>>                /* Force reclaiming mapped pages in the passes #3 and #4 */
>>                if (pass > 2) {
>> -                       sc.may_swap = 1;
>> +                       sc.may_unmap = 1;
>>                        sc.swappiness = 100;
>>                }
>>
>> @@ -2292,7 +2292,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>>        int priority;
>>        struct scan_control sc = {
>>                .may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
>> -               .may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
>> +               .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
>>                .swap_cluster_max = max_t(unsigned long, nr_pages,
>>                                        SWAP_CLUSTER_MAX),
>>                .gfp_mask = gfp_mask,
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>
>
>
> --
> Kind regards,
> MinChan Kim
>



-- 
Kind regards,
MinChan Kim

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06  4:49     ` Johannes Weiner
  2009-02-06  5:59       ` KOSAKI Motohiro
@ 2009-02-06 21:00       ` Andrew Morton
  2009-02-06 23:27         ` Johannes Weiner
                           ` (3 more replies)
  1 sibling, 4 replies; 43+ messages in thread
From: Andrew Morton @ 2009-02-06 21:00 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: kosaki.motohiro, rjw, riel, linux-kernel, linux-mm

On Fri, 6 Feb 2009 05:49:07 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> > and, I think you should measure the performance result.
> 
> Yes, I'm still thinking about ideas how to quantify it properly.  I
> have not yet found a reliable way to check for whether the working set
> is intact besides seeing whether the resumed applications are
> responsive right away or if they first have to swap in their pages
> again.

Describing your subjective non-quantitative impressions would be better
than nothing...

The patch bugs me.

The whole darn point behind the whole darn page reclaim is "reclaim the
pages which we aren't likely to need soon".  There's nothing special
about the swsusp code at all!  We want it to do exactly what page
reclaim normally does, only faster.

So why do we need to write special hand-rolled code to implement
something which we've already spent ten years writing?

hm?  And if this approach leads to less-than-optimum performance after
resume then the fault lies with core page reclaim - it reclaimed the
wrong pages!

That actually was my thinking when I first worked on
shrink_all_memory() and it did turn out to be surprisingly hard to
simply reuse the existing reclaim code for this application.  Things
kept on going wrong.  IIRC this was because we were freeing pages as we
were reclaiming, so the page reclaim logic kept on seeing all these
free pages and kept on wanting to bale out.

Now, the simple and obvious fix to this is not to free the pages - just
keep on allocating pages and storing them locally until we have
"enough" memory.  Then when we're all done, dump them all straight onto
the freelists.

But for some reason which I do not recall, we couldn't do that.

It would be good to revisit all this.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
@ 2009-02-06 23:27         ` Johannes Weiner
  2009-02-07 17:23           ` Rafael J. Wysocki
  2009-02-07  4:41         ` Nigel Cunningham
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06 23:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: kosaki.motohiro, rjw, riel, linux-kernel, linux-mm

On Fri, Feb 06, 2009 at 01:00:09PM -0800, Andrew Morton wrote:
> On Fri, 6 Feb 2009 05:49:07 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > > and, I think you should measure the performance result.
> > 
> > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > have not yet found a reliable way to check for whether the working set
> > is intact besides seeing whether the resumed applications are
> > responsive right away or if they first have to swap in their pages
> > again.
> 
> Describing your subjective non-quantitative impressions would be better
> than nothing...

Okay.

> The patch bugs me.

Please ignore it, it is broken as is.  My verbal cortex got obviously
disconnected from my code cortex when writing the changelog...  And I
will reconsider the actual change bits, I still think that we
shouldn't scan anon page lists while may_swap is zero.

> The whole darn point behind the whole darn page reclaim is "reclaim the
> pages which we aren't likely to need soon".  There's nothing special
> about the swsusp code at all!  We want it to do exactly what page
> reclaim normally does, only faster.
> 
> So why do we need to write special hand-rolled code to implement
> something which we've already spent ten years writing?
> 
> hm?  And if this approach leads to less-than-optimum performance after
> resume then the fault lies with core page reclaim - it reclaimed the
> wrong pages!
> 
> That actually was my thinking when I first worked on
> shrink_all_memory() and it did turn out to be surprisingly hard to
> simply reuse the existing reclaim code for this application.  Things
> kept on going wrong.  IIRC this was because we were freeing pages as we
> were reclaiming, so the page reclaim logic kept on seeing all these
> free pages and kept on wanting to bale out.
> 
> Now, the simple and obvious fix to this is not to free the pages - just
> keep on allocating pages and storing them locally until we have
> "enough" memory.  Then when we're all done, dump them all straight onto
> the freelists.
> 
> But for some reason which I do not recall, we couldn't do that.
> 
> It would be good to revisit all this.

Thanks for the comments, I will see what I can come up with.

	Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 17:15             ` MinChan Kim
@ 2009-02-06 23:37               ` Johannes Weiner
  2009-02-09 19:43               ` [patch] vmscan: rename sc.may_swap to may_unmap Johannes Weiner
  1 sibling, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-06 23:37 UTC (permalink / raw)
  To: MinChan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

On Sat, Feb 07, 2009 at 02:15:21AM +0900, MinChan Kim wrote:
> >>> Grr, you are right.
> >>> I agree, currently may_swap doesn't control swap out or not.
> >>> so I think we should change it to a correct name ;)
> >>
> >> Agreed.  What do you think about the following patch?
> >
> > As for me, I can't agree with you.
> > There are two kinds of file-mapped pages:
> >
> > 1. file-mapped and dirty pages
> > 2. file-mapped and non-dirty pages
> >
> > Neither kind is swapped.
> > A file-mapped dirty page is synced with its original file;
> > a file-mapped non-dirty page is simply discarded from the viewpoint of reclaim.
> >
> > So, may_swap is only related to anon pages.
> > Thus, I think may_swap is reasonable.
> > How about you?
> 
> Sorry, I misunderstood your point.
> Would it be better to leave more detail in the git log?
> 
> 'may_swap' applies not only to anon pages but to mapped file pages as
> well.  The term 'may_swap' sometimes means 'swap' and sometimes means
> 'sync|discard'.  For anon pages, 'may_swap' determines whether pages
> are swapped out or not, but for mapped file pages it determines
> whether pages are synced or discarded.  So 'may_swap' is rather
> awkward.  Rename it to 'may_unmap', which is the actual meaning.
> 
> If you find a wrong word or sentence, please fix it. :)

Cool, thanks.  I will resend an updated version soon with your
changelog text.  And on top of the two fixlets of this series which
Andrew already picked up.

	Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
  2009-02-06 23:27         ` Johannes Weiner
@ 2009-02-07  4:41         ` Nigel Cunningham
  2009-02-07 16:51         ` KOSAKI Motohiro
  2009-02-27 13:27         ` Pavel Machek
  3 siblings, 0 replies; 43+ messages in thread
From: Nigel Cunningham @ 2009-02-07  4:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, kosaki.motohiro, rjw, riel, linux-kernel, linux-mm

Hi.

On Fri, 2009-02-06 at 13:00 -0800, Andrew Morton wrote:
> On Fri, 6 Feb 2009 05:49:07 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > > and, I think you should measure the performance result.
> > 
> > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > have not yet found a reliable way to check for whether the working set
> > is intact besides seeing whether the resumed applications are
> > responsive right away or if they first have to swap in their pages
> > again.
> 
> Describing your subjective non-quantitative impressions would be better
> than nothing...
> 
> The patch bugs me.
> 
> The whole darn point behind the whole darn page reclaim is "reclaim the
> pages which we aren't likely to need soon".  There's nothing special
> about the swsusp code at all!  We want it to do exactly what page
> reclaim normally does, only faster.
> 
> So why do we need to write special hand-rolled code to implement
> something which we've already spent ten years writing?
> 
> hm?  And if this approach leads to less-than-optimum performance after
> resume then the fault lies with core page reclaim - it reclaimed the
> wrong pages!
> 
> That actually was my thinking when I first worked on
> shrink_all_memory() and it did turn out to be surprisingly hard to
> simply reuse the existing reclaim code for this application.  Things
> kept on going wrong.  IIRC this was because we were freeing pages as we
> were reclaiming, so the page reclaim logic kept on seeing all these
> free pages and kept on wanting to bale out.
> 
> Now, the simple and obvious fix to this is not to free the pages - just
> keep on allocating pages and storing them locally until we have
> "enough" memory.  Then when we're all done, dump them all straight onto
> the freelists.
> 
> But for some reason which I do not recall, we couldn't do that.
> 
> It would be good to revisit all this.

It would be. It would also be good if 'we' could make it actually free
the number of pages it's asked for. At the moment, you generally
get something like 90% of what you ask for, which makes carefully
calculating how much you need to ask for a bit pointless :\

Nigel


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
  2009-02-06 23:27         ` Johannes Weiner
  2009-02-07  4:41         ` Nigel Cunningham
@ 2009-02-07 16:51         ` KOSAKI Motohiro
  2009-02-07 21:20           ` Nigel Cunningham
  2009-02-27 13:27         ` Pavel Machek
  3 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-07 16:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, rjw, riel, linux-kernel, linux-mm

2009/2/7 Andrew Morton <akpm@linux-foundation.org>:
> On Fri, 6 Feb 2009 05:49:07 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
>> > and, I think you should mesure performence result.
>>
>> Yes, I'm still thinking about ideas how to quantify it properly.  I
>> have not yet found a reliable way to check for whether the working set
>> is intact besides seeing whether the resumed applications are
>> responsive right away or if they first have to swap in their pages
>> again.
>
> Describing your subjective non-quantitative impressions would be better
> than nothing...
>
> The patch bugs me.
>
> The whole darn point behind the whole darn page reclaim is "reclaim the
> pages which we aren't likely to need soon".  There's nothing special
> about the swsusp code at all!  We want it to do exactly what page
> reclaim normally does, only faster.
>
> So why do we need to write special hand-rolled code to implement
> something which we've already spent ten years writing?
>
> hm?  And if this approach leads to less-than-optimum performance after
> resume then the fault lies with core page reclaim - it reclaimed the
> wrong pages!
>
> That actually was my thinking when I first worked on
> shrink_all_memory() and it did turn out to be surprisingly hard to
> simply reuse the existing reclaim code for this application.  Things
> kept on going wrong.  IIRC this was because we were freeing pages as we
> were reclaiming, so the page reclaim logic kept on seeing all these
> free pages and kept on wanting to bale out.
>
> Now, the simple and obvious fix to this is not to free the pages - just
> keep on allocating pages and storing them locally until we have
> "enough" memory.  Then when we're all done, dump them all straight onto
> to the freelists.
>
> But for some reason which I do not recall, we couldn't do that.

The current strategy was introduced by commit
d6277db4ab271862ed599da08d78961c70f00002, quoted here:

    Author: Rafael J. Wysocki <rjw@sisk.pl>
    Date:   Fri Jun 23 02:03:18 2006 -0700
    Subject: [PATCH] swsusp: rework memory shrinker

    Rework the swsusp's memory shrinker in the following way:

    - Simplify balance_pgdat() by removing all of the swsusp-related code
      from it.

    - Make shrink_all_memory() use shrink_slab() and a new function
      shrink_all_zones() which calls shrink_active_list() and
      shrink_inactive_list() directly for each zone in a way that's optimized
      for suspend.

    In shrink_all_memory() we try to free exactly as many pages as the caller
    asks for, preferably in one shot, starting from easier targets.  If slab
    caches are huge, they are most likely to have enough pages to reclaim.
    The inactive lists are next (the zones with more inactive pages go first)
    etc.

    Each time shrink_all_memory() attempts to shrink the active and inactive
    lists for each zone in 5 passes.  In the first pass, only the inactive
    lists are taken into consideration.  In the next two passes the active
    lists are also shrunk, but mapped pages are not reclaimed.  In the last
    two passes the active and inactive lists are shrunk and mapped pages are
    reclaimed as well.  The aim of this is to alter the reclaim logic to choose
    the best pages to keep on resume and improve the responsiveness of the

The related discussion mail is here:

akpm wrote:
--------------
 And what was the observed effect of all this?

Rafael wrote:
--------------
Measurable effects:
1) It tends to free only as much memory as required, eg. if the image_size
is set to 450 MB, the actual image sizes are almost always well above
400 MB and they tended to be below that number without the patch
(~5-10% of a difference, but still :-)).
2) If image_size = 0, it frees everything that can be freed without any
workarounds (we had to add the additional loop checking for
ret >= nr_pages with the additional blk_congestion_wait() to the
"original" shrinker to achieve this).



My conclusion is that nobody said "we can't"; the commit was made for
performance improvement purposes.


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 23:27         ` Johannes Weiner
@ 2009-02-07 17:23           ` Rafael J. Wysocki
  2009-02-08 20:56             ` Johannes Weiner
  0 siblings, 1 reply; 43+ messages in thread
From: Rafael J. Wysocki @ 2009-02-07 17:23 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, kosaki.motohiro, riel, linux-kernel, linux-mm

On Saturday 07 February 2009, Johannes Weiner wrote:
> On Fri, Feb 06, 2009 at 01:00:09PM -0800, Andrew Morton wrote:
> > On Fri, 6 Feb 2009 05:49:07 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > > > and, I think you should mesure performence result.
> > > 
> > > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > > have not yet found a reliable way to check for whether the working set
> > > is intact besides seeing whether the resumed applications are
> > > responsive right away or if they first have to swap in their pages
> > > again.
> > 
> > Describing your subjective non-quantitative impressions would be better
> > than nothing...
> 
> Okay.
> 
> > The patch bugs me.
> 
> Please ignore it, it is broken as is.  My verbal cortex got obviously
> disconnected from my code cortex when writing the changelog...

If I understood this correctly, patch 3/3 is to be disregarded.

> And I will reconsider the actual change bits, I still think that we
> shouldn't scan anon page lists while may_swap is zero.

Hm, can you please remind me what may_swap == 0 actually means?

Rafael


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-07 16:51         ` KOSAKI Motohiro
@ 2009-02-07 21:20           ` Nigel Cunningham
  0 siblings, 0 replies; 43+ messages in thread
From: Nigel Cunningham @ 2009-02-07 21:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Johannes Weiner, rjw, riel, linux-kernel, linux-mm

Hi.

On Sun, 2009-02-08 at 01:51 +0900, KOSAKI Motohiro wrote:
> akpm wrote:
> --------------
>  And what was the observed effect of all this?
> 
> Rafael wrote:
> --------------
> Measurable effects:
> 1) It tends to free only as much memory as required, eg. if the image_size
> is set to 450 MB, the actual image sizes are almost always well above
> 400 MB and they tended to be below that number without the patch
> (~5-10% of a difference, but still :-)).

Do you always get at least the number of pages you ask for, if that's
possible?

Regards,

Nigel



* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-07 17:23           ` Rafael J. Wysocki
@ 2009-02-08 20:56             ` Johannes Weiner
  0 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-08 20:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, kosaki.motohiro, riel, linux-kernel, linux-mm

On Sat, Feb 07, 2009 at 06:23:53PM +0100, Rafael J. Wysocki wrote:
> On Saturday 07 February 2009, Johannes Weiner wrote:
> > On Fri, Feb 06, 2009 at 01:00:09PM -0800, Andrew Morton wrote:
> > > On Fri, 6 Feb 2009 05:49:07 +0100
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > 
> > > > > and, I think you should mesure performence result.
> > > > 
> > > > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > > > have not yet found a reliable way to check for whether the working set
> > > > is intact besides seeing whether the resumed applications are
> > > > responsive right away or if they first have to swap in their pages
> > > > again.
> > > 
> > > Describing your subjective non-quantitative impressions would be better
> > > than nothing...
> > 
> > Okay.
> > 
> > > The patch bugs me.
> > 
> > Please ignore it, it is broken as is.  My verbal cortex got obviously
> > disconnected from my code cortex when writing the changelog...
> 
> If I understood this correctly, patch 3/3 is to be disregarded.
> 
> > And I will reconsider the actual change bits, I still think that we
> > shouldn't scan anon page lists while may_swap is zero.
> 
> Hm, can you please remind me what may_swap == 0 actually means?

It means that no mapped pages are reclaimed.  This includes mapped file
pages, but more importantly in this case, anon pages.  See this check in
shrink_page_list():

                if (!sc->may_swap && page_mapped(page))
                        goto keep_locked;

So scanning anon lists without allowing to unmap doesn't free memory.

	Hannes


* [patch] vmscan: rename sc.may_swap to may_unmap
  2009-02-06 17:15             ` MinChan Kim
  2009-02-06 23:37               ` Johannes Weiner
@ 2009-02-09 19:43               ` Johannes Weiner
  2009-02-09 23:02                 ` MinChan Kim
  2009-03-27  6:19                 ` [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap) Daisuke Nishimura
  1 sibling, 2 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-02-09 19:43 UTC (permalink / raw)
  To: MinChan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

sc.may_swap does not only influence reclaiming of anon pages but pages
mapped into pagetables in general, which also includes mapped file
pages.

From shrink_page_list():

		if (!sc->may_swap && page_mapped(page))
			goto keep_locked;

For anon pages, this makes sense as they are always mapped and
reclaiming them always requires swapping.

But mapped file pages are skipped here as well and it has nothing to
do with swapping.

The real effect of the knob is whether mapped pages are unmapped and
reclaimed or not.  Rename it to `may_unmap' to have its name match its
actual meaning more precisely.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

On Sat, Feb 07, 2009 at 02:15:21AM +0900, MinChan Kim wrote:
> Sorry for misunderstanding your point.
> It would be better to keep more detail in the git log?
> 
> 'may_swap' applies not only to anon pages but to mapped file pages as
> well. The term 'may_swap' is sometimes used for 'swap', sometimes for
> 'sync|discard'.
> In the case of anon pages, 'may_swap' determines whether pages are swapped out,
> but in the case of mapped file pages, it determines whether pages are
> synced or discarded, so 'may_swap' is rather awkward. Rename it to
> 'may_unmap', which is the actual meaning.
> 
> If you find a wrong word or sentence, please fix it. :)

Is the above description okay for you?

Andrew, this is on top of the two earlier clean ups in vmscan.c:
+ swsusp-clean-up-shrink_all_zones.patch
+ swsusp-dont-fiddle-with-swappiness.patch

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -60,8 +60,8 @@ struct scan_control {
 
 	int may_writepage;
 
-	/* Can pages be swapped as part of reclaim? */
-	int may_swap;
+	/* Can mapped pages be reclaimed? */
+	int may_unmap;
 
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
@@ -606,7 +606,7 @@ static unsigned long shrink_page_list(st
 		if (unlikely(!page_evictable(page, NULL)))
 			goto cull_mlocked;
 
-		if (!sc->may_swap && page_mapped(page))
+		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
@@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct z
 		.gfp_mask = gfp_mask,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
@@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
 {
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
@@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
 	struct zonelist *zonelist;
 
 	if (noswap)
-		sc.may_swap = 0;
+		sc.may_unmap = 0;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_da
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
-		.may_swap = 1,
+		.may_unmap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
@@ -2108,7 +2108,7 @@ unsigned long shrink_all_memory(unsigned
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
-		.may_swap = 0,
+		.may_unmap = 0,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
 		.isolate_pages = isolate_pages_global,
@@ -2145,7 +2145,7 @@ unsigned long shrink_all_memory(unsigned
 
 		/* Force reclaiming mapped pages in the passes #3 and #4 */
 		if (pass > 2)
-			sc.may_swap = 1;
+			sc.may_unmap = 1;
 
 		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
 			unsigned long nr_to_scan = nr_pages - ret;
@@ -2288,7 +2288,7 @@ static int __zone_reclaim(struct zone *z
 	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
-		.may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,


* Re: [patch] vmscan: rename sc.may_swap to may_unmap
  2009-02-09 19:43               ` [patch] vmscan: rename sc.may_swap to may_unmap Johannes Weiner
@ 2009-02-09 23:02                 ` MinChan Kim
  2009-02-10 10:00                   ` KOSAKI Motohiro
  2009-03-27  6:19                 ` [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap) Daisuke Nishimura
  1 sibling, 1 reply; 43+ messages in thread
From: MinChan Kim @ 2009-02-09 23:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	linux-kernel, linux-mm

On Tue, Feb 10, 2009 at 4:43 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> sc.may_swap does not only influence reclaiming of anon pages but pages
> mapped into pagetables in general, which also includes mapped file
> pages.
>
> From shrink_page_list():
>
>                if (!sc->may_swap && page_mapped(page))
>                        goto keep_locked;
>
> For anon pages, this makes sense as they are always mapped and
> reclaiming them always requires swapping.
>
> But mapped file pages are skipped here as well and it has nothing to
> do with swapping.
>
> The real effect of the knob is whether mapped pages are unmapped and
> reclaimed or not.  Rename it to `may_unmap' to have its name match its
> actual meaning more precisely.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c |   20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> On Sat, Feb 07, 2009 at 02:15:21AM +0900, MinChan Kim wrote:
>> Sorry for misunderstanding your point.
>> It would be better to keep more detail in the git log?
>>
>> 'may_swap' applies not only to anon pages but to mapped file pages as
>> well. The term 'may_swap' is sometimes used for 'swap', sometimes for
>> 'sync|discard'.
>> In the case of anon pages, 'may_swap' determines whether pages are swapped out,
>> but in the case of mapped file pages, it determines whether pages are
>> synced or discarded, so 'may_swap' is rather awkward. Rename it to
>> 'may_unmap', which is the actual meaning.
>>
>> If you find a wrong word or sentence, please fix it. :)
>
> Is the above description okay for you?
>

It looks good to me. :)

Reviewed-by: MinChan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
MinChan Kim


* Re: [patch] vmscan: rename sc.may_swap to may_unmap
  2009-02-09 23:02                 ` MinChan Kim
@ 2009-02-10 10:00                   ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-02-10 10:00 UTC (permalink / raw)
  To: MinChan Kim
  Cc: kosaki.motohiro, Johannes Weiner, Andrew Morton,
	Rafael J. Wysocki, Rik van Riel, linux-kernel, linux-mm

> On Tue, Feb 10, 2009 at 4:43 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > sc.may_swap does not only influence reclaiming of anon pages but pages
> > mapped into pagetables in general, which also includes mapped file
> > pages.
> >
> > From shrink_page_list():
> >
> >                if (!sc->may_swap && page_mapped(page))
> >                        goto keep_locked;
> >
> > For anon pages, this makes sense as they are always mapped and
> > reclaiming them always requires swapping.
> >
> > But mapped file pages are skipped here as well and it has nothing to
> > do with swapping.
> >
> > The real effect of the knob is whether mapped pages are unmapped and
> > reclaimed or not.  Rename it to `may_unmap' to have its name match its
> > actual meaning more precisely.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/vmscan.c |   20 ++++++++++----------
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> It looks good to me. :)
> 
> Reviewed-by: MinChan Kim <minchan.kim@gmail.com>

me too :)

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>





* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
                           ` (2 preceding siblings ...)
  2009-02-07 16:51         ` KOSAKI Motohiro
@ 2009-02-27 13:27         ` Pavel Machek
  2009-03-01 10:37           ` KOSAKI Motohiro
  3 siblings, 1 reply; 43+ messages in thread
From: Pavel Machek @ 2009-02-27 13:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, kosaki.motohiro, rjw, riel, linux-kernel, linux-mm

On Fri 2009-02-06 13:00:09, Andrew Morton wrote:
> On Fri, 6 Feb 2009 05:49:07 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > > and, I think you should mesure performence result.
> > 
> > Yes, I'm still thinking about ideas how to quantify it properly.  I
> > have not yet found a reliable way to check for whether the working set
> > is intact besides seeing whether the resumed applications are
> > responsive right away or if they first have to swap in their pages
> > again.
> 
> Describing your subjective non-quantitative impressions would be better
> than nothing...
> 
> The patch bugs me.
> 
> The whole darn point behind the whole darn page reclaim is "reclaim the
> pages which we aren't likely to need soon".  There's nothing special
> about the swsusp code at all!  We want it to do exactly what page
> reclaim normally does, only faster.
> 
> So why do we need to write special hand-rolled code to implement
> something which we've already spent ten years writing?
> 
> hm?  And if this approach leads to less-than-optimum performance after
> resume then the fault lies with core page reclaim - it reclaimed the
> wrong pages!
> 
> That actually was my thinking when I first worked on
> shrink_all_memory() and it did turn out to be surprisingly hard to
> simply reuse the existing reclaim code for this application.  Things
> kept on going wrong.  IIRC this was because we were freeing pages as we
> were reclaiming, so the page reclaim logic kept on seeing all these
> free pages and kept on wanting to bale out.
> 
> Now, the simple and obvious fix to this is not to free the pages - just
> keep on allocating pages and storing them locally until we have
> "enough" memory.  Then when we're all done, dump them all straight onto
> to the freelists.
> 
> But for some reason which I do not recall, we couldn't do that.

We used to do that. I remember having a loop doing get_free_page and
making a linked list of them. I believe it was considered quite a hack.

.....one reason is that we don't want to OOM-kill anything if memory is
low; we want to abort the hibernation...

Sorry for being late.....


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
  2009-02-27 13:27         ` Pavel Machek
@ 2009-03-01 10:37           ` KOSAKI Motohiro
  0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-03-01 10:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kosaki.motohiro, Andrew Morton, Johannes Weiner, rjw, riel,
	linux-kernel, linux-mm

> > hm?  And if this approach leads to less-than-optimum performance after
> > resume then the fault lies with core page reclaim - it reclaimed the
> > wrong pages!
> > 
> > That actually was my thinking when I first worked on
> > shrink_all_memory() and it did turn out to be surprisingly hard to
> > simply reuse the existing reclaim code for this application.  Things
> > kept on going wrong.  IIRC this was because we were freeing pages as we
> > were reclaiming, so the page reclaim logic kept on seeing all these
> > free pages and kept on wanting to bale out.
> > 
> > Now, the simple and obvious fix to this is not to free the pages - just
> > keep on allocating pages and storing them locally until we have
> > "enough" memory.  Then when we're all done, dump them all straight onto
> > to the freelists.
> > 
> > But for some reason which I do not recall, we couldn't do that.
> 
> We used to do that. I remember having a loop doing get_free_page and
> making a linked list of them. I believe it was considered quite a hack.
> 
> .....one reason is that we don't want to OOM-kill anything if memory is
> low; we want to abort the hibernation...
> 
> Sorry for being late.....

Not at all.
Your information is really helpful.

Maybe we can make this simplification without risking an OOM kill...





* [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap)
  2009-02-09 19:43               ` [patch] vmscan: rename sc.may_swap to may_unmap Johannes Weiner
  2009-02-09 23:02                 ` MinChan Kim
@ 2009-03-27  6:19                 ` Daisuke Nishimura
  2009-03-27  6:30                   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 43+ messages in thread
From: Daisuke Nishimura @ 2009-03-27  6:19 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Johannes Weiner, MinChan Kim, KOSAKI Motohiro, Andrew Morton,
	Rafael J. Wysocki, Rik van Riel, KAMEZAWA Hiroyuki, Balbir Singh,
	Daisuke Nishimura

Added
 Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
 Cc: Balbir Singh <balbir@in.ibm.com>

I'm sorry for replying to a very old mail.

> @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
>  {
>  	struct scan_control sc = {
>  		.may_writepage = !laptop_mode,
> -		.may_swap = 1,
> +		.may_unmap = 1,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
>  		.swappiness = swappiness,
>  		.order = 0,
> @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
>  	struct zonelist *zonelist;
>  
>  	if (noswap)
> -		sc.may_swap = 0;
> +		sc.may_unmap = 0;
>  
>  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
IIUC, memcg had used may_swap as a flag for "do we need to use swap?", as the name indicates.

This is because, when mem+swap hits the limit, trying to swap out pages is
meaningless as it doesn't change the mem+swap usage.

What do you think of this patch?
===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

vmscan-rename-scmay_swap-to-may_unmap.patch removed the may_swap flag,
but memcg had used it as a flag for "do we need to use swap?", as the
name indicates.

And in the current implementation, memcg cannot reclaim mapped file caches
when mem+swap hits the limit.

Re-introduce the may_swap flag and handle it in shrink_page_list().

This patch doesn't influence any scan_control users other than memcg.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/vmscan.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c815653..86118d9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -64,6 +64,9 @@ struct scan_control {
 	/* Can mapped pages be reclaimed? */
 	int may_unmap;
 
+	/* Can pages be swapped as part of reclaim? */
+	int may_swap;
+
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
 	 * In this context, it doesn't matter that we scan the
@@ -616,6 +619,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
+		if (!sc->may_swap && PageSwapBacked(page)
+			/* SwapCache uses 'swap' already */
+			&& !PageSwapCache(page))
+			goto keep_locked;
+
 		/* Double the slab pressure for mapped and swapcache pages */
 		if (page_mapped(page) || PageSwapCache(page))
 			sc->nr_scanned++;
@@ -1696,6 +1704,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
@@ -1715,6 +1724,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
@@ -1724,7 +1734,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	struct zonelist *zonelist;
 
 	if (noswap)
-		sc.may_unmap = 0;
+		sc.may_swap = 0;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -1764,6 +1774,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
@@ -2110,6 +2121,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 0,
+		.may_swap = 1,
 		.may_writepage = 1,
 		.isolate_pages = isolate_pages_global,
 	};
@@ -2292,6 +2304,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+		.may_swap = 1,
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,


* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap)
  2009-03-27  6:19                 ` [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap) Daisuke Nishimura
@ 2009-03-27  6:30                   ` KAMEZAWA Hiroyuki
  2009-03-29 23:45                     ` KOSAKI Motohiro
  0 siblings, 1 reply; 43+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-27  6:30 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-kernel, linux-mm, Johannes Weiner, MinChan Kim,
	KOSAKI Motohiro, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Fri, 27 Mar 2009 15:19:26 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Added
>  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>  Cc: Balbir Singh <balbir@in.ibm.com>
> 
> I'm sorry for replying to a very old mail.
> 
> > @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
> >  {
> >  	struct scan_control sc = {
> >  		.may_writepage = !laptop_mode,
> > -		.may_swap = 1,
> > +		.may_unmap = 1,
> >  		.swap_cluster_max = SWAP_CLUSTER_MAX,
> >  		.swappiness = swappiness,
> >  		.order = 0,
> > @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
> >  	struct zonelist *zonelist;
> >  
> >  	if (noswap)
> > -		sc.may_swap = 0;
> > +		sc.may_unmap = 0;
> >  
> >  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> >  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> IIUC, memcg had used may_swap as a flag for "do we need to use swap?", as the name indicates.
> 
> This is because, when mem+swap hits the limit, trying to swap out pages is
> meaningless as it doesn't change the mem+swap usage.
> 
Good catch...sigh, I missed this discussion.



> What do you think of this patch?
> ===
> From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> vmscan-rename-scmay_swap-to-may_unmap.patch removed the may_swap flag,
> but memcg had used it as a flag for "do we need to use swap?", as the
> name indicates.
> 
> And in the current implementation, memcg cannot reclaim mapped file caches
> when mem+swap hits the limit.
> 
When mem+swap hits the limit, swapping out an anonymous page doesn't reduce
the mem+swap usage, so swap-out should be avoided.

> Re-introduce the may_swap flag and handle it in shrink_page_list().
> 
> This patch doesn't influence any scan_control users other than memcg.
> 


> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

Seems good,
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

But hmm....maybe this LRU scan should work the same way as the
!total_swap_pages case (meaning: don't scan the anon LRU).
I'll revisit this later.

-Kame

> ---
>  mm/vmscan.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c815653..86118d9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -64,6 +64,9 @@ struct scan_control {
>  	/* Can mapped pages be reclaimed? */
>  	int may_unmap;
>  
> +	/* Can pages be swapped as part of reclaim? */
> +	int may_swap;
> +
>  	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
>  	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
>  	 * In this context, it doesn't matter that we scan the
> @@ -616,6 +619,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		if (!sc->may_unmap && page_mapped(page))
>  			goto keep_locked;
>  
> +		if (!sc->may_swap && PageSwapBacked(page)
> +			/* SwapCache uses 'swap' already */
> +			&& !PageSwapCache(page))
> +			goto keep_locked;
> +
>  		/* Double the slab pressure for mapped and swapcache pages */
>  		if (page_mapped(page) || PageSwapCache(page))
>  			sc->nr_scanned++;
> @@ -1696,6 +1704,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  		.may_writepage = !laptop_mode,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
>  		.may_unmap = 1,
> +		.may_swap = 1,
>  		.swappiness = vm_swappiness,
>  		.order = order,
>  		.mem_cgroup = NULL,
> @@ -1715,6 +1724,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  	struct scan_control sc = {
>  		.may_writepage = !laptop_mode,
>  		.may_unmap = 1,
> +		.may_swap = 1,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
>  		.swappiness = swappiness,
>  		.order = 0,
> @@ -1724,7 +1734,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  	struct zonelist *zonelist;
>  
>  	if (noswap)
> -		sc.may_unmap = 0;
> +		sc.may_swap = 0;
>  
>  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> @@ -1764,6 +1774,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
>  	struct scan_control sc = {
>  		.gfp_mask = GFP_KERNEL,
>  		.may_unmap = 1,
> +		.may_swap = 1,
>  		.swap_cluster_max = SWAP_CLUSTER_MAX,
>  		.swappiness = vm_swappiness,
>  		.order = order,
> @@ -2110,6 +2121,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  	struct scan_control sc = {
>  		.gfp_mask = GFP_KERNEL,
>  		.may_unmap = 0,
> +		.may_swap = 1,
>  		.may_writepage = 1,
>  		.isolate_pages = isolate_pages_global,
>  	};
> @@ -2292,6 +2304,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	struct scan_control sc = {
>  		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
>  		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
> +		.may_swap = 1,
>  		.swap_cluster_max = max_t(unsigned long, nr_pages,
>  					SWAP_CLUSTER_MAX),
>  		.gfp_mask = gfp_mask,
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap)
  2009-03-27  6:30                   ` KAMEZAWA Hiroyuki
@ 2009-03-29 23:45                     ` KOSAKI Motohiro
  2009-03-31  0:18                       ` Daisuke Nishimura
  2009-03-31  1:26                       ` Minchan Kim
  0 siblings, 2 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-03-29 23:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Daisuke Nishimura, linux-kernel, linux-mm,
	Johannes Weiner, MinChan Kim, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

> On Fri, 27 Mar 2009 15:19:26 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > Added
> >  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >  Cc: Balbir Singh <balbir@in.ibm.com>
> > 
> > I'm sorry for replying to a very old mail.
> > 
> > > @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
> > >  {
> > >  	struct scan_control sc = {
> > >  		.may_writepage = !laptop_mode,
> > > -		.may_swap = 1,
> > > +		.may_unmap = 1,
> > >  		.swap_cluster_max = SWAP_CLUSTER_MAX,
> > >  		.swappiness = swappiness,
> > >  		.order = 0,
> > > @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
> > >  	struct zonelist *zonelist;
> > >  
> > >  	if (noswap)
> > > -		sc.may_swap = 0;
> > > +		sc.may_unmap = 0;
> > >  
> > >  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> > >  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> > IIUC, memcg had used may_swap as a flag for "we need to use swap?" as the name indicate.
> > 
> > Because, when mem+swap hits the limit, trying to swapout pages is meaningless
> > as it doesn't change mem+swap usage.
> > 
> Good catch...sigh, I missed this discussion.
> 
> 
> 
> > What do you think of this patch?
> > ===
> > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > 
> > vmscan-rename-scmay_swap-to-may_unmap.patch removed may_swap flag,
> > but memcg had used it as a flag for "we need to use swap?", as the
> > name indicate.
> > 
> > And in current implementation, memcg cannot reclaim mapped file caches
> > when mem+swap hits the limit.
> > 
> When mem+swap hits the limit, swap-out anonymous page doesn't reduce the
> amount of usage of mem+swap, so, swap-out should be avoided.
> 
> > re-introduce may_swap flag and handle it at shrink_page_list.
> > 
> > This patch doesn't influence any scan_control users other than memcg.
> > 
> 
> 
> > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> 
> Seems good,
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> But hum....Maybe this lru scan work in the same way as the case
> of !total_swap_pages. (means don't scan anon LRU.)
> revisit this later.

Well, how about the following patch?

So, I have to agree my judgement on may_unmap was wrong.
You explained that memcg can use may_swap instead of may_unmap, and I
think the other may_unmap users (zone_reclaim and shrink_all_list) can
convert their may_unmap code to may_swap as well.

IOW, Nishimura-san, you showed that we can remove the may_unmap branch
from shrink_page_list().
It's really good work, thanks!


========
Subject: vmscan: reintroduce sc->may_swap

vmscan-rename-scmay_swap-to-may_unmap.patch removed the may_swap flag,
but memcg had used it as a flag for "do we need to use swap?", as the
name indicates.

And in the current implementation, memcg cannot reclaim mapped file
caches when mem+swap hits the limit.

Re-introduce the may_swap flag and handle it in get_scan_ratio().
This patch doesn't influence any scan_control users other than memcg.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 mm/vmscan.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3be6157..00ea4a1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,6 +63,9 @@ struct scan_control {
 	/* Can mapped pages be reclaimed? */
 	int may_unmap;
 
+	/* Can pages be swapped as part of reclaim? */
+	int may_swap;
+
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
 	 * In this context, it doesn't matter that we scan the
@@ -1379,7 +1382,7 @@ static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (nr_swap_pages <= 0) {
+	if (!sc->may_swap || (nr_swap_pages <= 0)) {
 		percent[0] = 0;
 		percent[1] = 100;
 		return;
@@ -1695,6 +1698,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 		.may_writepage = !laptop_mode,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swappiness = vm_swappiness,
 		.order = order,
 		.mem_cgroup = NULL,
@@ -1714,6 +1718,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	struct scan_control sc = {
 		.may_writepage = !laptop_mode,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = swappiness,
 		.order = 0,
@@ -1723,7 +1728,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	struct zonelist *zonelist;
 
 	if (noswap)
-		sc.may_unmap = 0;
+		sc.may_swap = 0;
 
 	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -1763,6 +1768,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
+		.may_swap = 1,
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.swappiness = vm_swappiness,
 		.order = order,
@@ -2109,6 +2115,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 0,
+		.may_swap = 1,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
 		.isolate_pages = isolate_pages_global,
@@ -2289,6 +2296,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+		.may_swap = 1,
 		.swap_cluster_max = max_t(unsigned long, nr_pages,
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,





^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap)
  2009-03-29 23:45                     ` KOSAKI Motohiro
@ 2009-03-31  0:18                       ` Daisuke Nishimura
  2009-03-31  1:26                       ` Minchan Kim
  1 sibling, 0 replies; 43+ messages in thread
From: Daisuke Nishimura @ 2009-03-31  0:18 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: nishimura, KAMEZAWA Hiroyuki, linux-kernel, linux-mm,
	Johannes Weiner, MinChan Kim, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

On Mon, 30 Mar 2009 08:45:28 +0900 (JST), KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > On Fri, 27 Mar 2009 15:19:26 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > Added
> > >  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > >  Cc: Balbir Singh <balbir@in.ibm.com>
> > > 
> > > I'm sorry for replying to a very old mail.
> > > 
> > > > @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
> > > >  {
> > > >  	struct scan_control sc = {
> > > >  		.may_writepage = !laptop_mode,
> > > > -		.may_swap = 1,
> > > > +		.may_unmap = 1,
> > > >  		.swap_cluster_max = SWAP_CLUSTER_MAX,
> > > >  		.swappiness = swappiness,
> > > >  		.order = 0,
> > > > @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
> > > >  	struct zonelist *zonelist;
> > > >  
> > > >  	if (noswap)
> > > > -		sc.may_swap = 0;
> > > > +		sc.may_unmap = 0;
> > > >  
> > > >  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> > > >  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> > > IIUC, memcg had used may_swap as a flag for "we need to use swap?" as the name indicate.
> > > 
> > > Because, when mem+swap hits the limit, trying to swapout pages is meaningless
> > > as it doesn't change mem+swap usage.
> > > 
> > Good catch...sigh, I missed this discussion.
> > 
> > 
> > 
> > > What do you think of this patch?
> > > ===
> > > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > > 
> > > vmscan-rename-scmay_swap-to-may_unmap.patch removed may_swap flag,
> > > but memcg had used it as a flag for "we need to use swap?", as the
> > > name indicate.
> > > 
> > > And in current implementation, memcg cannot reclaim mapped file caches
> > > when mem+swap hits the limit.
> > > 
> > When mem+swap hits the limit, swap-out anonymous page doesn't reduce the
> > amount of usage of mem+swap, so, swap-out should be avoided.
> > 
> > > re-introduce may_swap flag and handle it at shrink_page_list.
> > > 
> > > This patch doesn't influence any scan_control users other than memcg.
> > > 
> > 
> > 
> > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > 
> > Seems good,
> > Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > But hum....Maybe this lru scan work in the same way as the case
> > of !total_swap_pages. (means don't scan anon LRU.)
> > revisit this later.
> 
> Well, How about following patch?
> 
I think your patch looks better because it avoids pointless scanning
of the anon list.

Thanks,
Daisuke Nishimura.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-03-29 23:45                     ` KOSAKI Motohiro
  2009-03-31  0:18                       ` Daisuke Nishimura
@ 2009-03-31  1:26                       ` Minchan Kim
  2009-03-31  1:42                         ` KAMEZAWA Hiroyuki
  2009-03-31  1:52                         ` Daisuke Nishimura
  1 sibling, 2 replies; 43+ messages in thread
From: Minchan Kim @ 2009-03-31  1:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-kernel, linux-mm,
	Johannes Weiner, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

Hi,




On Mon, Mar 30, 2009 at 8:45 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Fri, 27 Mar 2009 15:19:26 +0900
>> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>>
>> > Added
>> >  Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >  Cc: Balbir Singh <balbir@in.ibm.com>
>> >
>> > I'm sorry for replying to a very old mail.
>> >
>> > > @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pag
>> > >  {
>> > >   struct scan_control sc = {
>> > >           .may_writepage = !laptop_mode,
>> > > -         .may_swap = 1,
>> > > +         .may_unmap = 1,
>> > >           .swap_cluster_max = SWAP_CLUSTER_MAX,
>> > >           .swappiness = swappiness,
>> > >           .order = 0,
>> > > @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pag
>> > >   struct zonelist *zonelist;
>> > >
>> > >   if (noswap)
>> > > -         sc.may_swap = 0;
>> > > +         sc.may_unmap = 0;
>> > >
>> > >   sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>> > >                   (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
>> > IIUC, memcg had used may_swap as a flag for "we need to use swap?" as the name indicate.
>> >
>> > Because, when mem+swap hits the limit, trying to swapout pages is meaningless
>> > as it doesn't change mem+swap usage.
>> >
>> Good catch...sigh, I missed this discussion.
>>
>>
>>
>> > What do you think of this patch?
>> > ===
>> > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>> >
>> > vmscan-rename-scmay_swap-to-may_unmap.patch removed may_swap flag,
>> > but memcg had used it as a flag for "we need to use swap?", as the
>> > name indicate.
>> >
>> > And in current implementation, memcg cannot reclaim mapped file caches
>> > when mem+swap hits the limit.
>> >
>> When mem+swap hits the limit, swap-out anonymous page doesn't reduce the
>> amount of usage of mem+swap, so, swap-out should be avoided.
>>
>> > re-introduce may_swap flag and handle it at shrink_page_list.
>> >
>> > This patch doesn't influence any scan_control users other than memcg.
>> >
>>
>>
>> > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>>
>> Seems good,
>> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> But hum....Maybe this lru scan work in the same way as the case
>> of !total_swap_pages. (means don't scan anon LRU.)
>> revisit this later.
>
> Well, How about following patch?
>
> So, I have to agree my judgement of may_unmap was wrong.
> You explain memcg can use may_swap instead may_unmap. and I think
> other may_unmap user (zone_reclaim and shrink_all_list) can convert
> may_unmap code to may_swap.
>
> IOW, Nishimura-san, you explain we can remove the branch of the may_unmap
> from shrink_page_list().
> it's really good job. thanks!
>
>
> ========
> Subject: vmswan: reintroduce sc->may_swap
>
> vmscan-rename-scmay_swap-to-may_unmap.patch removed may_swap flag,
> but memcg had used it as a flag for "we need to use swap?", as the
> name indicate.
>
> And in current implementation, memcg cannot reclaim mapped file caches
> when mem+swap hits the limit.
>
> re-introduce may_swap flag and handle it at get_scan_ratio().
> This patch doesn't influence any scan_control users other than memcg.
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> --
>  mm/vmscan.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3be6157..00ea4a1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -63,6 +63,9 @@ struct scan_control {
>        /* Can mapped pages be reclaimed? */
>        int may_unmap;
>
> +       /* Can pages be swapped as part of reclaim? */
> +       int may_swap;
> +

Sorry for the late response.
I don't know memcg well.

memcg managed to share may_swap with global page reclaim until now.
I think that was because may_swap can represent both meanings.
Do we really need separate variables?

How about using a union?
---

struct scan_control {
  /* Incremented by the number of inactive pages that were scanned */
  unsigned long nr_scanned;
  ...
  union {
    int may_swap;  /* memcg: Can pages be swapped as part of reclaim? */
    int may_unmap; /* global: Can mapped pages be reclaimed? */
  };
  ...
};






-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-03-31  1:26                       ` Minchan Kim
@ 2009-03-31  1:42                         ` KAMEZAWA Hiroyuki
  2009-03-31  1:48                           ` KOSAKI Motohiro
  2009-03-31  1:52                         ` Daisuke Nishimura
  1 sibling, 1 reply; 43+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-31  1:42 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Daisuke Nishimura, linux-kernel, linux-mm,
	Johannes Weiner, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Tue, 31 Mar 2009 10:26:17 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 3be6157..00ea4a1 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -63,6 +63,9 @@ struct scan_control {
> >        /* Can mapped pages be reclaimed? */
> >        int may_unmap;
> >
> > +       /* Can pages be swapped as part of reclaim? */
> > +       int may_swap;
> > +
> 
> Sorry for too late response.
> I don't know memcg well.
> 
> The memcg managed to use may_swap well with global page reclaim until now.
> I think that was because may_swap can represent both meaning.
> Do we need each variables really ?
> 
> How about using union variable ?

Or just removing one of them?

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-03-31  1:42                         ` KAMEZAWA Hiroyuki
@ 2009-03-31  1:48                           ` KOSAKI Motohiro
  2009-04-01  4:09                             ` Johannes Weiner
  0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-03-31  1:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Minchan Kim, Daisuke Nishimura, linux-kernel,
	linux-mm, Johannes Weiner, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

> > Sorry for too late response.
> > I don't know memcg well.
> > 
> > The memcg managed to use may_swap well with global page reclaim until now.
> > I think that was because may_swap can represent both meaning.
> > Do we need each variables really ?
> > 
> > How about using union variable ?
> 
> or Just removing one of them  ?

I hope all may_unmap users convert to using may_swap.
may_swap is more efficient and has a cleaner meaning.




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-03-31  1:26                       ` Minchan Kim
  2009-03-31  1:42                         ` KAMEZAWA Hiroyuki
@ 2009-03-31  1:52                         ` Daisuke Nishimura
  1 sibling, 0 replies; 43+ messages in thread
From: Daisuke Nishimura @ 2009-03-31  1:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: nishimura, KOSAKI Motohiro, KAMEZAWA Hiroyuki, linux-kernel,
	linux-mm, Johannes Weiner, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

Hi,

> > ========
> > Subject: vmswan: reintroduce sc->may_swap
> >
> > vmscan-rename-scmay_swap-to-may_unmap.patch removed may_swap flag,
> > but memcg had used it as a flag for "we need to use swap?", as the
> > name indicate.
> >
> > And in current implementation, memcg cannot reclaim mapped file caches
> > when mem+swap hits the limit.
> >
> > re-introduce may_swap flag and handle it at get_scan_ratio().
> > This patch doesn't influence any scan_control users other than memcg.
> >
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> > --
> >  mm/vmscan.c |   12 ++++++++++--
> >  1 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 3be6157..00ea4a1 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -63,6 +63,9 @@ struct scan_control {
> >        /* Can mapped pages be reclaimed? */
> >        int may_unmap;
> >
> > +       /* Can pages be swapped as part of reclaim? */
> > +       int may_swap;
> > +
> 
> Sorry for too late response.
> I don't know memcg well.
> 
> The memcg managed to use may_swap well with global page reclaim until now.
memcg had a bug in that it could not reclaim mapped file caches when it
hit the mem+swap limit :(


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-03-31  1:48                           ` KOSAKI Motohiro
@ 2009-04-01  4:09                             ` Johannes Weiner
  2009-04-01  5:08                               ` Daisuke Nishimura
  2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-04-01  4:09 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, Daisuke Nishimura, linux-kernel,
	linux-mm, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Tue, Mar 31, 2009 at 10:48:32AM +0900, KOSAKI Motohiro wrote:
> > > Sorry for too late response.
> > > I don't know memcg well.
> > > 
> > > The memcg managed to use may_swap well with global page reclaim until now.
> > > I think that was because may_swap can represent both meaning.
> > > Do we need each variables really ?
> > > 
> > > How about using union variable ?
> > 
> > or Just removing one of them  ?
> 
> I hope all may_unmap user convert to using may_swap.
> may_swap is more efficient and cleaner meaning.

How about making may_swap mean the following:

	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
	 		 * Try to allocate it some swap space here.
	 		 */
	 		if (PageAnon(page) && !PageSwapCache(page)) {
	+			if (!sc->may_swap)
	+				goto keep_locked;
	 			if (!(sc->gfp_mask & __GFP_IO))
	 				goto keep_locked;
	 			if (!add_to_swap(page))

try_to_free_pages() always sets it.

try_to_free_mem_cgroup_pages() sets it depending on whether it really
wants swapping, and only swapping, right?  But the above would still
reclaim already-swapped anon pages, and I don't know the memory
controller.

balance_pgdat() always sets it.

__zone_reclaim() sets it depending on zone_reclaim_mode.  The
RECLAIM_SWAP bit of this field and its documentation in
Documentation/sysctl/vm.txt suggests it also really only means swap.

shrink_all_memory() would be the sole user of may_unmap because it
really wants to eat cache first.  But this could be figured out on a
different occasion.

	Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  4:09                             ` Johannes Weiner
@ 2009-04-01  5:08                               ` Daisuke Nishimura
  2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 43+ messages in thread
From: Daisuke Nishimura @ 2009-04-01  5:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: nishimura, KOSAKI Motohiro, KAMEZAWA Hiroyuki, Minchan Kim,
	linux-kernel, linux-mm, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

On Wed, 1 Apr 2009 06:09:51 +0200, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Mar 31, 2009 at 10:48:32AM +0900, KOSAKI Motohiro wrote:
> > > > Sorry for too late response.
> > > > I don't know memcg well.
> > > > 
> > > > The memcg managed to use may_swap well with global page reclaim until now.
> > > > I think that was because may_swap can represent both meaning.
> > > > Do we need each variables really ?
> > > > 
> > > > How about using union variable ?
> > > 
> > > or Just removing one of them  ?
> > 
> > I hope all may_unmap user convert to using may_swap.
> > may_swap is more efficient and cleaner meaning.
> 
> How about making may_swap mean the following:
> 
> 	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
> 	 		 * Try to allocate it some swap space here.
> 	 		 */
> 	 		if (PageAnon(page) && !PageSwapCache(page)) {
> 	+			if (!sc->may_swap)
> 	+				goto keep_locked;
> 	 			if (!(sc->gfp_mask & __GFP_IO))
> 	 				goto keep_locked;
> 	 			if (!add_to_swap(page))
> 
but it doesn't work for shmem/tmpfs, does it?
So, I did it in my first patch like this:

@@ -616,6 +619,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
+		if (!sc->may_swap && PageSwapBacked(page)
+			/* SwapCache uses 'swap' already */
+			&& !PageSwapCache(page))
+			goto keep_locked;
+
 		/* Double the slab pressure for mapped and swapcache pages */
 		if (page_mapped(page) || PageSwapCache(page))
 			sc->nr_scanned++;

> try_to_free_pages() always sets it.
> 
> try_to_free_mem_cgroup_pages() sets it depending on whether it really
> wants swapping, and only swapping, right?
right.

> But the above would still reclaim already swapped anon pages
Then it would be better to add a check in shrink_page_list() anyway.

Kosaki-san, what do you think?


Thanks,
Daisuke Nishimura.

> and I don't know the memory
> controller.
> 
> balance_pgdat() always sets it.
> 
> __zone_reclaim() sets it depending on zone_reclaim_mode.  The
> RECLAIM_SWAP bit of this field and its documentation in
> Documentation/sysctl/vm.txt suggests it also really only means swap.
> 
> shrink_all_memory() would be the sole user of may_unmap because it
> really wants to eat cache first.  But this could be figured out on a
> different occasion.
> 
> 	Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  4:09                             ` Johannes Weiner
  2009-04-01  5:08                               ` Daisuke Nishimura
@ 2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
  2009-04-01  9:11                                 ` KOSAKI Motohiro
  2009-04-01  9:49                                 ` Johannes Weiner
  1 sibling, 2 replies; 43+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-04-01  9:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, linux-kernel,
	linux-mm, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Wed, 1 Apr 2009 06:09:51 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Tue, Mar 31, 2009 at 10:48:32AM +0900, KOSAKI Motohiro wrote:
> > > > Sorry for too late response.
> > > > I don't know memcg well.
> > > > 
> > > > The memcg managed to use may_swap well with global page reclaim until now.
> > > > I think that was because may_swap can represent both meaning.
> > > > Do we need each variables really ?
> > > > 
> > > > How about using union variable ?
> > > 
> > > or Just removing one of them  ?
> > 
> > I hope all may_unmap user convert to using may_swap.
> > may_swap is more efficient and cleaner meaning.
> 
> How about making may_swap mean the following:
> 
> 	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
> 	 		 * Try to allocate it some swap space here.
> 	 		 */
> 	 		if (PageAnon(page) && !PageSwapCache(page)) {
> 	+			if (!sc->may_swap)
> 	+				goto keep_locked;
> 	 			if (!(sc->gfp_mask & __GFP_IO))
> 	 				goto keep_locked;
> 	 			if (!add_to_swap(page))
> 
> try_to_free_pages() always sets it.
> 
What is the advantage over _not_ scanning the ANON LRU at all?

> try_to_free_mem_cgroup_pages() sets it depending on whether it really
> wants swapping, and only swapping, right?  But the above would still
> reclaim already swapped anon pages and I don't know the memory
> controller.
> 
The memory cgroup has 2 call sites for shrink_zone():
 1. memory usage hits the limit.
 2. mem+swap usage hits the limit.

At "2", swap-out doesn't decrease the mem+swap usage, so we set may_swap=0.
So, we want to kick out only file caches.
But we can still reclaim file cache, and "unmap the file cache and reclaim
it!" is necessary even when may_swap=0.

Then, scanning only the FILE LRU makes sense at may_swap=0 *if* memcg is
the only user of may_swap=0.

Let's see others.

 - __zone_reclaim() sets may_unmap to 0 when it doesn't want swap-out.
   .....can be replaced with may_swap.

 - shrink_all_memory() sets may_swap to 0. Is this called by hibernation?
   If you don't want to unmap file caches during hibernation, adding may_unmap
   as a *new* parameter makes sense, I think.

The change you proposed is for dropping unused SwapCache pages, right?
But those will be dropped by kswapd if necessary.

As far as memcg is concerned, scanning the ANON LRU even when may_swap=0 is
just a waste of cpu time.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
@ 2009-04-01  9:11                                 ` KOSAKI Motohiro
  2009-04-01  9:49                                 ` Johannes Weiner
  1 sibling, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01  9:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, Johannes Weiner, Minchan Kim, Daisuke Nishimura,
	linux-kernel, linux-mm, Andrew Morton, Rafael J. Wysocki,
	Rik van Riel, Balbir Singh

> memory cgroup has 2 calls to this shrink_zone.
>  1. memory usage hits the limit.
>  2. mem+swap usage hits the limit.
> 
> At "2", swap-out doesn't decrease the usage of mem+swap, then set may_swap=0.
> So, we want to kick out only file caches.
> But, we can reclaim file cache and "unmap file cache and reclaim it!" is 
> necessary even if may_swap=0.
> 
> Then, scanning only FILE LRU makes sense at may_swap=0 *if* memcg is
> the only user of may_swap=0.
> 
> Let's see others.
> 
>  - __zone_reclaim sets may_unmap to be 0 when they don't want swap-out.
>    .....can be replaced with may_swap.
> 
>  - shrink_all_memory sets may_swap to be 0. Is this called by hibernation ?
>    If you don't want to unmap file caches while hibernation, adding may_unmap
>    as *new* parameter makes sense, I think.
> 
> The change you proposed is for dropping unused SwapCache pages. Right ?
> But this will be dropped by kswapd if necessary.
> 
> As far as memcg concerns, scanning ANON LRU even when may_swap=0 is just
> a waste of cpu time.

This sentence exactly explains my intention.

1. For memcg and zone_reclaim, scanning the ANON LRU is just a waste of cpu.
2. kswapd and normal direct reclaim can reclaim stale swapcache anyway,
   so the above trick doesn't cause any system hang-up or performance
   regression.




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
  2009-04-01  9:11                                 ` KOSAKI Motohiro
@ 2009-04-01  9:49                                 ` Johannes Weiner
  2009-04-01  9:55                                   ` KOSAKI Motohiro
  1 sibling, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2009-04-01  9:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: KOSAKI Motohiro, Minchan Kim, Daisuke Nishimura, linux-kernel,
	linux-mm, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Wed, Apr 01, 2009 at 06:04:45PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 1 Apr 2009 06:09:51 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Tue, Mar 31, 2009 at 10:48:32AM +0900, KOSAKI Motohiro wrote:
> > > > > Sorry for too late response.
> > > > > I don't know memcg well.
> > > > > 
> > > > > The memcg managed to use may_swap well with global page reclaim until now.
> > > > > I think that was because may_swap can represent both meaning.
> > > > > Do we need each variables really ?
> > > > > 
> > > > > How about using union variable ?
> > > > 
> > > > or Just removing one of them  ?
> > > 
> > > I hope all may_unmap user convert to using may_swap.
> > > may_swap is more efficient and cleaner meaning.
> > 
> > How about making may_swap mean the following:
> > 
> > 	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
> > 	 		 * Try to allocate it some swap space here.
> > 	 		 */
> > 	 		if (PageAnon(page) && !PageSwapCache(page)) {
> > 	+			if (!sc->may_swap)
> > 	+				goto keep_locked;
> > 	 			if (!(sc->gfp_mask & __GFP_IO))
> > 	 				goto keep_locked;
> > 	 			if (!add_to_swap(page))
> > 
> > try_to_free_pages() always sets it.
> > 
> What is the advantage than _not_ scanning ANON LRU at all ?

I thought we could collect anon pages that don't need swap io.

> > try_to_free_mem_cgroup_pages() sets it depending on whether it really
> > wants swapping, and only swapping, right?  But the above would still
> > reclaim already swapped anon pages and I don't know the memory
> > controller.
> > 
> memory cgroup has 2 calls to this shrink_zone.
>  1. memory usage hits the limit.
>  2. mem+swap usage hits the limit.
> 
> At "2", swap-out doesn't decrease the usage of mem+swap, then set may_swap=0.
> So, we want to kick out only file caches.
> But, we can reclaim file cache and "unmap file cache and reclaim it!" is 
> necessary even if may_swap=0.

Yes.

> Then, scanning only FILE LRU makes sense at may_swap=0 *if* memcg is
> the only user of may_swap=0.
> 
> Let's see others.
> 
>  - __zone_reclaim sets may_unmap to be 0 when they don't want swap-out.
>    .....can be replaced with may_swap.
> 
>  - shrink_all_memory sets may_swap to be 0. Is this called by hibernation ?
>    If you don't want to unmap file caches while hibernation, adding may_unmap
> >    as *new* parameter makes sense, I think.

Yep, that was my idea too.  At least for now; then reevaluate whether it
shouldn't just reclaim in LRU order without this flag...

> The change you proposed is for dropping unused SwapCache pages. Right ?
> But this will be dropped by kswapd if necessary.
> 
> As far as memcg concerns, scanning ANON LRU even when may_swap=0 is just
> a waste of cpu time.

Okay.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  9:49                                 ` Johannes Weiner
@ 2009-04-01  9:55                                   ` KOSAKI Motohiro
  2009-04-01 16:03                                     ` Johannes Weiner
  0 siblings, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01  9:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, Minchan Kim,
	Daisuke Nishimura, linux-kernel, linux-mm, Andrew Morton,
	Rafael J. Wysocki, Rik van Riel, Balbir Singh

> > > How about making may_swap mean the following:
> > > 
> > > 	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
> > > 	 		 * Try to allocate it some swap space here.
> > > 	 		 */
> > > 	 		if (PageAnon(page) && !PageSwapCache(page)) {
> > > 	+			if (!sc->may_swap)
> > > 	+				goto keep_locked;
> > > 	 			if (!(sc->gfp_mask & __GFP_IO))
> > > 	 				goto keep_locked;
> > > 	 			if (!add_to_swap(page))
> > > 
> > > try_to_free_pages() always sets it.
> > > 
> > What is the advantage than _not_ scanning ANON LRU at all ?
> 
> I thought we could collect anon pages that don't need swap io.

Yes, but is this important?
If memcg reclaim doesn't collect stale swapcache, other global reclaim can.

Am I missing any viewpoint?




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename  sc.may_swap to may_unmap)
  2009-04-01  9:55                                   ` KOSAKI Motohiro
@ 2009-04-01 16:03                                     ` Johannes Weiner
  0 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2009-04-01 16:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, Daisuke Nishimura, linux-kernel,
	linux-mm, Andrew Morton, Rafael J. Wysocki, Rik van Riel,
	Balbir Singh

On Wed, Apr 01, 2009 at 06:55:45PM +0900, KOSAKI Motohiro wrote:
> > > > How about making may_swap mean the following:
> > > > 
> > > > 	@@ -642,6 +639,8 @@ static unsigned long shrink_page_list(st
> > > > 	 		 * Try to allocate it some swap space here.
> > > > 	 		 */
> > > > 	 		if (PageAnon(page) && !PageSwapCache(page)) {
> > > > 	+			if (!sc->may_swap)
> > > > 	+				goto keep_locked;
> > > > 	 			if (!(sc->gfp_mask & __GFP_IO))
> > > > 	 				goto keep_locked;
> > > > 	 			if (!add_to_swap(page))
> > > > 
> > > > try_to_free_pages() always sets it.
> > > > 
> > > What is the advantage than _not_ scanning ANON LRU at all ?
> > 
> > I thought we could collect anon pages that don't need swap io.
> 
> Yes. but Is this important?
> if memcg reclaim don't collect stale swapcache, other global reclaim can.
> 
> Am I missing any viewpoint?

Nothing I am aware of, it should work as you suggest.  I just wasn't
sure about the memory controller.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2009-04-01 16:05 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-06  3:11 [PATCH 0/3] [PATCH 0/3] swsusp: shrink file cache first Johannes Weiner
2009-02-06  3:11 ` [PATCH 1/3] swsusp: clean up shrink_all_zones() Johannes Weiner
2009-02-06  3:20   ` KOSAKI Motohiro
2009-02-06  3:11 ` [PATCH 2/3] swsusp: dont fiddle with swappiness Johannes Weiner
2009-02-06  3:21   ` KOSAKI Motohiro
2009-02-06  3:11 ` [PATCH 3/3][RFC] swsusp: shrink file cache first Johannes Weiner
2009-02-06  3:39   ` KOSAKI Motohiro
2009-02-06  4:49     ` Johannes Weiner
2009-02-06  5:59       ` KOSAKI Motohiro
2009-02-06 12:24         ` Johannes Weiner
2009-02-06 13:35           ` MinChan Kim
2009-02-06 17:15             ` MinChan Kim
2009-02-06 23:37               ` Johannes Weiner
2009-02-09 19:43               ` [patch] vmscan: rename sc.may_swap to may_unmap Johannes Weiner
2009-02-09 23:02                 ` MinChan Kim
2009-02-10 10:00                   ` KOSAKI Motohiro
2009-03-27  6:19                 ` [PATCH] vmscan: memcg needs may_swap (Re: [patch] vmscan: rename sc.may_swap to may_unmap) Daisuke Nishimura
2009-03-27  6:30                   ` KAMEZAWA Hiroyuki
2009-03-29 23:45                     ` KOSAKI Motohiro
2009-03-31  0:18                       ` Daisuke Nishimura
2009-03-31  1:26                       ` Minchan Kim
2009-03-31  1:42                         ` KAMEZAWA Hiroyuki
2009-03-31  1:48                           ` KOSAKI Motohiro
2009-04-01  4:09                             ` Johannes Weiner
2009-04-01  5:08                               ` Daisuke Nishimura
2009-04-01  9:04                               ` KAMEZAWA Hiroyuki
2009-04-01  9:11                                 ` KOSAKI Motohiro
2009-04-01  9:49                                 ` Johannes Weiner
2009-04-01  9:55                                   ` KOSAKI Motohiro
2009-04-01 16:03                                     ` Johannes Weiner
2009-03-31  1:52                         ` Daisuke Nishimura
2009-02-06 21:00       ` [PATCH 3/3][RFC] swsusp: shrink file cache first Andrew Morton
2009-02-06 23:27         ` Johannes Weiner
2009-02-07 17:23           ` Rafael J. Wysocki
2009-02-08 20:56             ` Johannes Weiner
2009-02-07  4:41         ` Nigel Cunningham
2009-02-07 16:51         ` KOSAKI Motohiro
2009-02-07 21:20           ` Nigel Cunningham
2009-02-27 13:27         ` Pavel Machek
2009-03-01 10:37           ` KOSAKI Motohiro
2009-02-06  8:03   ` MinChan Kim
2009-02-06 10:06     ` MinChan Kim
2009-02-06 11:50       ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).