* [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
@ 2009-06-08  9:10 Wu Fengguang
  2009-06-08  9:10 ` [PATCH 1/3] vmscan: report vm_flags in page_referenced() Wu Fengguang
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-06-08  9:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Wu, Fengguang, Andi Kleen, Christoph Lameter,
	Elladan, Nick Piggin, Johannes Weiner, Peter Zijlstra,
	Rik van Riel, tytso, linux-mm, minchan.kim

Andrew,

I managed to back this patchset with two test cases :)

They demonstrated that
- X desktop responsiveness can be *doubled* under high memory/swap pressure
- major faults can be almost eliminated when the active file list is scanned
  slowly because of partially cache-hot streaming IO

The details are included in the changelog.

Thanks,
Fengguang
-- 


* [PATCH 1/3] vmscan: report vm_flags in page_referenced()
  2009-06-08  9:10 [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Wu Fengguang
@ 2009-06-08  9:10 ` Wu Fengguang
  2009-06-08  9:10 ` [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Wu Fengguang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-06-08  9:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Peter Zijlstra, Wu Fengguang, Andi Kleen,
	Christoph Lameter, Elladan, Nick Piggin, Johannes Weiner,
	Rik van Riel, tytso, linux-mm, minchan.kim

[-- Attachment #1: mm-vmscan-report-vm_flags-in-page_referenced.patch --]
[-- Type: text/plain, Size: 6565 bytes --]

Collect vma->vm_flags of the VMAs that actually referenced the page.

This prepares for more informed reclaim heuristics, e.g. to protect
executable file pages more aggressively.
For now only the VM_EXEC bit will be used by the caller.
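
For illustration, a condensed caller-side sketch of the new interface (the
keep_on_active_list() helper is hypothetical; patch 2 in this series
open-codes this logic inside shrink_active_list()):

        unsigned long vm_flags;
        int referenced;

        referenced = page_referenced(page, 0, sc->mem_cgroup, &vm_flags);
        /* vm_flags now holds the OR of vm_flags from every VMA whose
         * pte actually referenced the page */
        if (referenced && (vm_flags & VM_EXEC) && !PageAnon(page))
                keep_on_active_list(page);      /* hypothetical helper */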

Thanks to Johannes, Peter and Minchan for all the good tips.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/rmap.h |    5 +++--
 mm/rmap.c            |   37 ++++++++++++++++++++++++++-----------
 mm/vmscan.c          |    7 +++++--
 3 files changed, 34 insertions(+), 15 deletions(-)

--- linux.orig/include/linux/rmap.h
+++ linux/include/linux/rmap.h
@@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct 
 /*
  * Called from mm/vmscan.c to handle paging out
  */
-int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
+int page_referenced(struct page *, int is_locked,
+			struct mem_cgroup *cnt, unsigned long *vm_flags);
 int try_to_unmap(struct page *, int ignore_refs);
 
 /*
@@ -121,7 +122,7 @@ int page_wrprotect(struct page *page, in
 #define anon_vma_prepare(vma)	(0)
 #define anon_vma_link(vma)	do {} while (0)
 
-#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
+#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
 #define try_to_unmap(page, refs) SWAP_FAIL
 
 static inline int page_mkclean(struct page *page)
--- linux.orig/mm/rmap.c
+++ linux/mm/rmap.c
@@ -333,7 +333,9 @@ static int page_mapped_in_vma(struct pag
  * repeatedly from either page_referenced_anon or page_referenced_file.
  */
 static int page_referenced_one(struct page *page,
-	struct vm_area_struct *vma, unsigned int *mapcount)
+			       struct vm_area_struct *vma,
+			       unsigned int *mapcount,
+			       unsigned long *vm_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -381,11 +383,14 @@ out_unmap:
 	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
 out:
+	if (referenced)
+		*vm_flags |= vma->vm_flags;
 	return referenced;
 }
 
 static int page_referenced_anon(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -405,7 +410,8 @@ static int page_referenced_anon(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma, &mapcount);
+		referenced += page_referenced_one(page, vma,
+						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
 	}
@@ -418,6 +424,7 @@ static int page_referenced_anon(struct p
  * page_referenced_file - referenced check for object-based rmap
  * @page: the page we're checking references on.
  * @mem_cont: target memory controller
+ * @vm_flags: collect the vm_flags of the VMAs that actually referenced the page
  *
  * For an object-based mapped page, find all the places it is mapped and
  * check/clear the referenced flag.  This is done by following the page->mapping
@@ -427,7 +434,8 @@ static int page_referenced_anon(struct p
  * This function is only called from page_referenced for object-based pages.
  */
 static int page_referenced_file(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -467,7 +475,8 @@ static int page_referenced_file(struct p
 		 */
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
-		referenced += page_referenced_one(page, vma, &mapcount);
+		referenced += page_referenced_one(page, vma,
+						  &mapcount, vm_flags);
 		if (!mapcount)
 			break;
 	}
@@ -481,29 +490,35 @@ static int page_referenced_file(struct p
  * @page: the page to test
  * @is_locked: caller holds lock on the page
  * @mem_cont: target memory controller
+ * @vm_flags: collect the vm_flags of the VMAs that actually referenced the page
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
  */
-int page_referenced(struct page *page, int is_locked,
-			struct mem_cgroup *mem_cont)
+int page_referenced(struct page *page,
+		    int is_locked,
+		    struct mem_cgroup *mem_cont,
+		    unsigned long *vm_flags)
 {
 	int referenced = 0;
 
 	if (TestClearPageReferenced(page))
 		referenced++;
 
+	*vm_flags = 0;
 	if (page_mapped(page) && page->mapping) {
 		if (PageAnon(page))
-			referenced += page_referenced_anon(page, mem_cont);
+			referenced += page_referenced_anon(page, mem_cont,
+								vm_flags);
 		else if (is_locked)
-			referenced += page_referenced_file(page, mem_cont);
+			referenced += page_referenced_file(page, mem_cont,
+								vm_flags);
 		else if (!trylock_page(page))
 			referenced++;
 		else {
 			if (page->mapping)
-				referenced +=
-					page_referenced_file(page, mem_cont);
+				referenced += page_referenced_file(page,
+							mem_cont, vm_flags);
 			unlock_page(page);
 		}
 	}
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -584,6 +584,7 @@ static unsigned long shrink_page_list(st
 	struct pagevec freed_pvec;
 	int pgactivate = 0;
 	unsigned long nr_reclaimed = 0;
+	unsigned long vm_flags;
 
 	cond_resched();
 
@@ -634,7 +635,8 @@ static unsigned long shrink_page_list(st
 				goto keep_locked;
 		}
 
-		referenced = page_referenced(page, 1, sc->mem_cgroup);
+		referenced = page_referenced(page, 1,
+						sc->mem_cgroup, &vm_flags);
 		/* In active use or really unfreeable?  Activate it. */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
 					referenced && page_mapping_inuse(page))
@@ -1215,6 +1217,7 @@ static void shrink_active_list(unsigned 
 {
 	unsigned long pgmoved;
 	unsigned long pgscanned;
+	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_inactive);
 	struct page *page;
@@ -1255,7 +1258,7 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
 			pgmoved++;
 
 		list_add(&page->lru, &l_inactive);

-- 


* [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
  2009-06-08  9:10 [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Wu Fengguang
  2009-06-08  9:10 ` [PATCH 1/3] vmscan: report vm_flags in page_referenced() Wu Fengguang
@ 2009-06-08  9:10 ` Wu Fengguang
  2009-06-08 15:34   ` Christoph Lameter
  2009-06-08  9:10 ` [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Wu Fengguang
  2009-07-10  7:24 ` [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Nai Xia
  3 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2009-06-08  9:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Elladan, Nick Piggin, Andi Kleen,
	Christoph Lameter, Rik van Riel, Peter Zijlstra, Wu Fengguang,
	Johannes Weiner, tytso, linux-mm, minchan.kim

[-- Attachment #1: mm-vmscan-protect-exec-referenced.patch --]
[-- Type: text/plain, Size: 19158 bytes --]

Protect referenced PROT_EXEC mapped pages from being deactivated.

PROT_EXEC (or its internal representation VM_EXEC) pages normally belong to
currently running executables and their linked libraries; they should really
be cached aggressively to provide a good user experience.

Thanks to Johannes Weiner for the advice to reuse the VMA walk in
page_referenced() to get the PROT_EXEC bit.


[more details]

( The consequences of this patch will have to be discussed together with
  Rik van Riel's recent patch "vmscan: evict use-once pages first". )

( Some of the good points and insights are taken into this changelog.
  Thanks to all the involved people for the great LKML discussions. )

the problem
-----------

For a typical desktop, the most precious working set is composed of
*actively accessed*
	(1) memory mapped executables
	(2) and their anonymous pages
	(3) and other files
	(4) and the dcache/icache/.. slabs
while the least important data are
	(5) infrequently used or use-once files

For a typical desktop, one major problem is bursty, large volumes of (5)
use-once files flushing out the working set.

Inside the working set, the (4) dcache/icache slabs are already sticky
enough ;-) So we only need to care about the (2) anonymous and (1)(3) file
pages.

anonymous pages
---------------
Anonymous pages are effectively immune to the streaming IO attack, because we
now have separate file/anon LRU lists. When the use-once files crowd into the
file LRU, the list's "quality" is significantly lowered. Therefore the scan
balance policy in get_scan_ratio() will choose to scan the (low quality) file
LRU much more frequently than the anon LRU.
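
As an aside, the balancing idea can be illustrated with a small standalone
program (a sketch paraphrased from the get_scan_ratio() heuristic; the names
and the exact arithmetic are illustrative and may differ between kernel
versions):

        #include <stdio.h>

        /* Illustration only: a list whose pages are rarely found referenced
         * (rotated) when scanned receives more of the scan pressure. */
        int main(void)
        {
                /* index 0 = anon, 1 = file; a use-once flood means lots of
                 * file pages scanned but few re-referenced */
                unsigned long scanned[2] = { 1000, 100000 };
                unsigned long rotated[2] = {  900,   2000 };
                /* the kernel derives these from swappiness; equal here
                 * for simplicity */
                unsigned long anon_prio = 100, file_prio = 100;
                unsigned long ap, fp;

                ap = (anon_prio + 1) * (scanned[0] + 1) / (rotated[0] + 1);
                fp = (file_prio + 1) * (scanned[1] + 1) / (rotated[1] + 1);
                printf("scan split: anon %lu%%, file %lu%%\n",
                       100 * ap / (ap + fp + 1), 100 * fp / (ap + fp + 1));
                return 0;
        }

With these sample numbers the split comes out around anon 2% / file 97%:
the flooded file list absorbs nearly all of the scanning.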

file pages
----------
Rik proposed to *not* scan the active file LRU when the inactive list grows
larger than the active list. This guarantees that when there is use-once
streaming IO, and the working set is not too large (so that active_size <
inactive_size), the active file LRU will *not* be scanned at all, and the
not-too-large working set can be well protected.
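
Condensed, the idea behind Rik's patch looks roughly like this (a sketch
from memory; the exact code in his patch may differ):

        /* in shrink_list(): leave the active file list alone unless the
         * inactive file list has become smaller than the active one */
        if (lru == LRU_ACTIVE_FILE) {
                if (inactive_file_is_low(zone))
                        shrink_active_list(nr_to_scan, zone, sc,
                                           priority, file);
                return 0;
        }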

But there are also situations where the file working set is a bit large, so
that (active_size >= inactive_size), or the streaming IOs are not purely
use-once. In these cases the active list will be scanned slowly, and the
current shrink_active_list() policy is to deactivate active pages regardless
of their referenced bits. The deactivated pages become susceptible to the
streaming IO attack: the inactive list can be scanned quickly
(500MB / 50MB/s = 10s), so the deactivated pages may not get enough time to
be re-referenced, because a user tends to switch between windows at intervals
ranging from seconds to minutes.

This patch holds mapped executable pages in the active list as long as they
are referenced during each full scan of the active list.  Because the active
list is normally scanned much more slowly, they get a longer grace time (e.g.
100s) for further references, which better matches the pace of user
operations.

Therefore this patch greatly prolongs the in-cache time of executable code
under moderate memory pressure.

	before patch: guaranteed to be cached if reference intervals < I
	after  patch: guaranteed to be cached if reference intervals < I+A
		      (except when randomly reclaimed by the lumpy reclaim)
where
	A = time to fully scan the   active file LRU
	I = time to fully scan the inactive file LRU

Note that normally A >> I.
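
To attach concrete numbers: with the measurements from test 2.3 below,
I is about 3.5 seconds and A is about 77 seconds, so the guaranteed
in-cache window for a regularly referenced executable page grows from
roughly 3.5s to roughly 80s.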

side effects
------------

This patch is safe in general: it restores the pre-2.6.28 mmap() behavior,
but in a much smaller and well targeted scope.

One may worry that someone could abuse the PROT_EXEC heuristic.  But as
Andrew Morton stated, there are other tricks for getting that sort of boost.

Another concern is the PROT_EXEC mapped pages growing large in rare cases,
and therefore hurting reclaim efficiency. But a sane application targeted at
a large audience will never use PROT_EXEC for data mappings. If some home-made
application tries to abuse that bit, it shall be aware of the consequences.
If it is abused to the scale of 2/3 of total memory, it gains nothing but
overhead.

benchmarks
----------

1) memory tight desktop

1.1) brief summary

- clock time and major faults are reduced by 50%;
- pswpin numbers are reduced to ~1/3.

That means X desktop responsiveness is doubled under high memory/swap pressure.

1.2) test scenario

- nfsroot gnome desktop with 512M physical memory
- run some programs, and switch between the existing windows
  after starting each new program.

1.3) progress timing (seconds)

  before       after    programs
    0.02        0.02    N xeyes
    0.75        0.76    N firefox
    2.02        1.88    N nautilus
    3.36        3.17    N nautilus --browser
    5.26        4.89    N gthumb
    7.12        6.47    N gedit
    9.22        8.16    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   13.58       12.55    N xterm
   15.87       14.57    N mlterm
   18.63       17.06    N gnome-terminal
   21.16       18.90    N urxvt
   26.24       23.48    N gnome-system-monitor
   28.72       26.52    N gnome-help
   32.15       29.65    N gnome-dictionary
   39.66       36.12    N /usr/games/sol
   43.16       39.27    N /usr/games/gnometris
   48.65       42.56    N /usr/games/gnect
   53.31       47.03    N /usr/games/gtali
   58.60       52.05    N /usr/games/iagno
   65.77       55.42    N /usr/games/gnotravex
   70.76       61.47    N /usr/games/mahjongg
   76.15       67.11    N /usr/games/gnome-sudoku
   86.32       75.15    N /usr/games/glines
   92.21       79.70    N /usr/games/glchess
  103.79       88.48    N /usr/games/gnomine
  113.84       96.51    N /usr/games/gnotski
  124.40      102.19    N /usr/games/gnibbles
  137.41      114.93    N /usr/games/gnobots2
  155.53      125.02    N /usr/games/blackjack
  179.85      135.11    N /usr/games/same-gnome
  224.49      154.50    N /usr/bin/gnome-window-properties
  248.44      162.09    N /usr/bin/gnome-default-applications-properties
  282.62      173.29    N /usr/bin/gnome-at-properties
  323.72      188.21    N /usr/bin/gnome-typing-monitor
  363.99      199.93    N /usr/bin/gnome-at-visual
  394.21      206.95    N /usr/bin/gnome-sound-properties
  435.14      224.49    N /usr/bin/gnome-at-mobility
  463.05      234.11    N /usr/bin/gnome-keybinding-properties
  503.75      248.59    N /usr/bin/gnome-about-me
  554.00      276.27    N /usr/bin/gnome-display-properties
  615.48      304.39    N /usr/bin/gnome-network-preferences
  693.03      342.01    N /usr/bin/gnome-mouse-properties
  759.90      388.58    N /usr/bin/gnome-appearance-properties
  937.90      508.47    N /usr/bin/gnome-control-center
 1109.75      587.57    N /usr/bin/gnome-keyboard-properties
 1399.05      758.16    N : oocalc
 1524.64      830.03    N : oodraw
 1684.31      900.03    N : ooimpress
 1874.04      993.91    N : oomath
 2115.12     1081.89    N : ooweb
 2369.02     1161.99    N : oowriter

Note that the last ": oo*" commands are actually commented out.

1.4) vmstat numbers (some relevant ones are marked with *)

                            before    after
 nr_free_pages              1293      3898
 nr_inactive_anon           59956     53460
 nr_active_anon             26815     30026
 nr_inactive_file           2657      3218
 nr_active_file             2019      2806
 nr_unevictable             4         4
 nr_mlock                   4         4
 nr_anon_pages              26706     27859
*nr_mapped                  3542      4469
 nr_file_pages              72232     67681
 nr_dirty                   1         0
 nr_writeback               123       19
 nr_slab_reclaimable        3375      3534
 nr_slab_unreclaimable      11405     10665
 nr_page_table_pages        8106      7864
 nr_unstable                0         0
 nr_bounce                  0         0
*nr_vmscan_write            394776    230839
 nr_writeback_temp          0         0
 numa_hit                   6843353   3318676
 numa_miss                  0         0
 numa_foreign               0         0
 numa_interleave            1719      1719
 numa_local                 6843353   3318676
 numa_other                 0         0
*pgpgin                     5954683   2057175
*pgpgout                    1578276   922744
*pswpin                     1486615   512238
*pswpout                    394568    230685
 pgalloc_dma                277432    56602
 pgalloc_dma32              6769477   3310348
 pgalloc_normal             0         0
 pgalloc_movable            0         0
 pgfree                     7048396   3371118
 pgactivate                 2036343   1471492
 pgdeactivate               2189691   1612829
 pgfault                    3702176   3100702
*pgmajfault                 452116    201343
 pgrefill_dma               12185     7127
 pgrefill_dma32             334384    653703
 pgrefill_normal            0         0
 pgrefill_movable           0         0
 pgsteal_dma                74214     22179
 pgsteal_dma32              3334164   1638029
 pgsteal_normal             0         0
 pgsteal_movable            0         0
 pgscan_kswapd_dma          1081421   1216199
 pgscan_kswapd_dma32        58979118  46002810
 pgscan_kswapd_normal       0         0
 pgscan_kswapd_movable      0         0
 pgscan_direct_dma          2015438   1086109
 pgscan_direct_dma32        55787823  36101597
 pgscan_direct_normal       0         0
 pgscan_direct_movable      0         0
 pginodesteal               3461      7281
 slabs_scanned              564864    527616
 kswapd_steal               2889797   1448082
 kswapd_inodesteal          14827     14835
 pageoutrun                 43459     21562
 allocstall                 9653      4032
 pgrotated                  384216    228631

1.5) free numbers at the end of the tests

before patch:
                             total       used       free     shared    buffers     cached
                Mem:           474        467          7          0          0        236
                -/+ buffers/cache:        230        243
                Swap:         1023        418        605

after patch:
                             total       used       free     shared    buffers     cached
                Mem:           474        457         16          0          0        236
                -/+ buffers/cache:        221        253
                Swap:         1023        404        619


2) memory flushing in a file server

2.1) brief summary

The number of major faults drops from 50 to 3 during 10% cache hot reads.

That means this patch successfully stops major faults when the active file
list is scanned slowly under partially cache hot streaming IO.

2.2) test scenario

Do 100000 preads (size = 110 pages, offset = i*100 pages), where 10% of the
pages will be activated:

        for i in `seq 0 100 10000000`; do echo $i 110;  done > pattern-hot-10
        iotrace.rb --load pattern-hot-10 --play /b/sparse
	vmmon  nr_mapped nr_active_file nr_inactive_file   pgmajfault pgdeactivate pgfree

and monitor /proc/vmstat during the time. The test box has 2G memory.
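
For reference, a minimal standalone C sketch of the same access pattern
(iotrace.rb is the author's tool; the file name and loop bounds here are
taken from the commands above):

        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        /* Each pread covers 110 pages while the offset advances only 100
         * pages, so ~10% of the pages are re-read and become activated. */
        int main(void)
        {
                long pagesz = sysconf(_SC_PAGESIZE);
                size_t len = 110 * pagesz;
                char *buf = malloc(len);
                int fd = open("/b/sparse", O_RDONLY);
                long i;

                if (fd < 0 || !buf)
                        return 1;
                for (i = 0; i <= 10000000; i += 100)
                        if (pread(fd, buf, len, (off_t)i * pagesz) < 0)
                                perror("pread");
                close(fd);
                free(buf);
                return 0;
        }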

I carried out tests on a freshly booted console as well as an X desktop,
and fetched the vmstat numbers at:

(1) begin:     shortly after the big read IO starts;
(2) end:       just before the big read IO stops;
(3) restore:   the big read IO stops and the zsh working set restored
(4) restore X: after IO, switch back and forth between the urxvt and firefox
               windows to restore their working set.

2.3) console mode results

        nr_mapped   nr_active_file nr_inactive_file       pgmajfault     pgdeactivate           pgfree

2.6.29 VM_EXEC protection ON:
begin:       2481             2237             8694              630                0           574299
end:          275           231976           233914              633           776271         20933042
restore:      370           232154           234524              691           777183         20958453

2.6.29 VM_EXEC protection ON (second run):
begin:       2434             2237             8493              629                0           574195
end:          284           231970           233536              632           771918         20896129
restore:      399           232218           234789              690           774526         20957909

2.6.30-rc4-mm VM_EXEC protection OFF:
begin:       2479             2344             9659              210                0           579643
end:          284           232010           234142              260           772776         20917184
restore:      379           232159           234371              301           774888         20967849

The above console numbers show that

- The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
  I'd attribute that improvement to the mmap readahead improvements :-)

- The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
  That's a huge improvement - which means with the VM_EXEC protection logic,
  active mmap pages are pretty safe even under partially cache hot streaming IO.

- when the active:inactive file lru size ratio reaches 1:1, their scan rates
  are 1:20.8 under 10% cache hot IO (computed with the formula
  Dpgdeactivate:Dpgfree). That roughly means the active mmap pages get 20.8
  times more chances to get re-referenced and stay in memory.

- The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
  dropped pages are mostly inactive ones. The patch has almost no impact in
  this aspect, that means it won't unnecessarily increase memory pressure.
  (In contrast, your 20% mmap protection ratio will keep them all, and
  therefore eliminate the extra 41 major faults to restore working set
  of zsh etc.)

The iotrace.rb read throughput is
	151.194384MB/s 284.198252s 100001x 450560b --load pattern-hot-10 --play /b/sparse
which means the inactive list is rotated at a speed of 250MB/s,
so a full scan of it takes about 3.5 seconds, while a full scan
of the active file list takes about 77 seconds.
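
As a sanity check on those figures: at the "end" snapshots above, the
inactive file list holds roughly 234k pages, i.e. about 0.9GB, and
0.9GB / 250MB/s is about 3.7 seconds, consistent with the ~3.5s figure.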

2.4) X mode results

We can reach roughly the same conclusions for X desktop:

        nr_mapped   nr_active_file nr_inactive_file       pgmajfault     pgdeactivate           pgfree

2.6.30-rc4-mm VM_EXEC protection ON:
begin:       9740             8920            64075              561                0           678360
end:          768           218254           220029              565           798953         21057006
restore:      857           218543           220987              606           799462         21075710
restore X:   2414           218560           225344              797           799462         21080795

2.6.30-rc4-mm VM_EXEC protection OFF:
begin:       9368             5035            26389              554                0           633391
end:          770           218449           221230              661           646472         17832500
restore:     1113           218466           220978              710           649881         17905235
restore X:   2687           218650           225484              947           802700         21083584

- the absolute nr_mapped drops considerably (to 1/13 of the original size)
  during the streaming IO.
- the delta of pgmajfault is 3 vs 107 during IO, or 236 vs 393
  during the whole process.


CC: Elladan <elladan@eskimo.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   52 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1219,6 +1219,7 @@ static void shrink_active_list(unsigned 
 	unsigned long pgscanned;
 	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1258,28 +1259,42 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
 			pgmoved++;
+			/*
+			 * Identify referenced, file-backed active pages and
+			 * give them one more trip around the active list, so
+			 * that executable code gets better chances to stay in
+			 * memory under moderate memory pressure.  Anon pages
+			 * are not likely to be evicted by use-once streaming
+			 * IO, plus JVM can create lots of anon VM_EXEC pages,
+			 * so we ignore them here.
+			 */
+			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}
 
 		list_add(&page->lru, &l_inactive);
 	}
 
 	/*
-	 * Move the pages to the [file or anon] inactive list.
+	 * Move pages back to the lru list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
-	 * Count referenced pages from currently used mappings as
-	 * rotated, even though they are moved to the inactive list.
-	 * This helps balance scan pressure between file and anonymous
-	 * pages in get_scan_ratio.
+	 * Count referenced pages from currently used mappings as rotated,
+	 * even though only some of them are actually re-activated.  This
+	 * helps balance scan pressure between file and anonymous pages in
+	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
 	pgmoved = 0;  /* count pages moved to inactive list */
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1302,6 +1317,29 @@ static void shrink_active_list(unsigned 
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	pgmoved = 0;  /* count pages moved back to active list */
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);

-- 


* [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-06-08  9:10 [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Wu Fengguang
  2009-06-08  9:10 ` [PATCH 1/3] vmscan: report vm_flags in page_referenced() Wu Fengguang
  2009-06-08  9:10 ` [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Wu Fengguang
@ 2009-06-08  9:10 ` Wu Fengguang
  2009-07-10  7:24 ` [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Nai Xia
  3 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-06-08  9:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Pekka Enberg, Peter Zijlstra, Wu Fengguang,
	Andi Kleen, Christoph Lameter, Elladan, Nick Piggin,
	Johannes Weiner, Rik van Riel, tytso, linux-mm, minchan.kim

[-- Attachment #1: mm-vmscan-reduce-code.patch --]
[-- Type: text/plain, Size: 5649 bytes --]

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Note that buffer_heads_over_limit check will also be carried out
for re-activated pages, which is slightly different from pre-2.6.28
kernels. Also, Rik's "vmscan: evict use-once pages first" patch
could totally stop scans of active file list when memory pressure is low.
So the net effect could be, the number of buffer heads is now more
likely to grow large.

However that's fine according to Johannes' comments:

  I don't think that this could be harmful.  We just preserve the buffer
  mappings of what we consider the working set and with low memory
  pressure, as you say, this set is not big.

  As to stripping of reactivated pages: the only pages we re-activate
  for now are those VM_EXEC mapped ones.  Since we don't expect IO from
  or to these pages, removing the buffer mappings in case they grow too
  large should be okay, I guess.

CC: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   95 ++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 53 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1211,6 +1211,43 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct pagevec pvec;
+	struct page *page;
+
+	pagevec_init(&pvec, 1);
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (!is_active_lru(lru))
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+
+		if (!pagevec_add(&pvec, page) || list_empty(list)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (!is_active_lru(lru))
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1222,8 +1259,6 @@ static void shrink_active_list(unsigned 
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct pagevec pvec;
-	enum lru_list lru;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	lru_add_drain();
@@ -1240,6 +1275,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1282,8 +1318,6 @@ static void shrink_active_list(unsigned 
 	/*
 	 * Move pages back to the lru list.
 	 */
-	pagevec_init(&pvec, 1);
-
 	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
@@ -1293,57 +1327,12 @@ static void shrink_active_list(unsigned 
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active,
+						LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive,
+						LRU_BASE   + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
-	if (buffer_heads_over_limit)
-		pagevec_strip(&pvec);
-	pagevec_release(&pvec);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)

-- 


* Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
  2009-06-08  9:10 ` [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Wu Fengguang
@ 2009-06-08 15:34   ` Christoph Lameter
  2009-06-08 17:30     ` Nai Xia
  2009-06-09  3:28     ` Wu Fengguang
  0 siblings, 2 replies; 30+ messages in thread
From: Christoph Lameter @ 2009-06-08 15:34 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, KOSAKI Motohiro, Elladan, Nick Piggin, Andi Kleen,
	Rik van Riel, Peter Zijlstra, Johannes Weiner, tytso, linux-mm,
	minchan.kim

On Mon, 8 Jun 2009, Wu Fengguang wrote:

> 1.2) test scenario
>
> - nfsroot gnome desktop with 512M physical memory
> - run some programs, and switch between the existing windows
>   after starting each new program.

Is there a predefined sequence or does this vary between tests? Scripted?

What percentage of time is saved in the 'after' test due to the
modifications?
Around 20%?

> (1) begin:     shortly after the big read IO starts;
> (2) end:       just before the big read IO stops;
> (3) restore:   the big read IO stops and the zsh working set restored
> (4) restore X: after IO, switch back and forth between the urxvt and firefox
>                windows to restore their working set.

Any action done on the firefox sessions? Or just switch to a firefox
session that needs to redraw?

> The above console numbers show that
>
> - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
>   I'd attribute that improvement to the mmap readahead improvements :-)

So there are other effects... You are not measuring the effect of only this
patchset?

> - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
>   That's a huge improvement - which means with the VM_EXEC protection logic,
>   active mmap pages are pretty safe even under partially cache hot streaming IO.

Looks good.

> - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
>   dropped pages are mostly inactive ones. The patch has almost no impact in
>   this aspect, that means it won't unnecessarily increase memory pressure.
>   (In contrast, your 20% mmap protection ratio will keep them all, and
>   therefore eliminate the extra 41 major faults to restore working set
>   of zsh etc.)

Good.


* Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
  2009-06-08 15:34   ` Christoph Lameter
@ 2009-06-08 17:30     ` Nai Xia
  2009-06-09  3:28     ` Wu Fengguang
  1 sibling, 0 replies; 30+ messages in thread
From: Nai Xia @ 2009-06-08 17:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Wu Fengguang, Andrew Morton, KOSAKI Motohiro, Elladan,
	Nick Piggin, Andi Kleen, Rik van Riel, Peter Zijlstra,
	Johannes Weiner, tytso, linux-mm, minchan.kim

On Mon, Jun 8, 2009 at 11:34 PM, Christoph
Lameter<cl@linux-foundation.org> wrote:
> On Mon, 8 Jun 2009, Wu Fengguang wrote:
>
>> 1.2) test scenario
>>
>> - nfsroot gnome desktop with 512M physical memory
>> - run some programs, and switch between the existing windows
>>   after starting each new program.
>
> Is there a predefined sequence or does this vary between tests? Scripted?
>
> What percentage of time is saved in the test after due to the
> modifications?
> Around 20%?

I think measuring the percentage of saved time may not be a good idea.
The major underlying factor for the time of switching GUI windows may vary
from application to application, distribution to distribution and machine to
machine. It's not reproducible.
I am seeing ridiculous timings for switching from any window to a window
of slickedit, because of its damn slow redrawing method.
I bet this patch will gain at most 1% on timing for this case. :)

>
>> (1) begin:     shortly after the big read IO starts;
>> (2) end:       just before the big read IO stops;
>> (3) restore:   the big read IO stops and the zsh working set restored
>> (4) restore X: after IO, switch back and forth between the urxvt and firefox
>>                windows to restore their working set.
>
> Any action done on the firefox sessions? Or just switch to a firefox
> session that needs to redraw?
>
>> The above console numbers show that
>>
>> - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
>>   I'd attribute that improvement to the mmap readahead improvements :-)
>
> So there are other effects... You are not measuring the effect of only this
> patchset?
>
>> - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
>>   That's a huge improvement - which means with the VM_EXEC protection logic,
>>   active mmap pages are pretty safe even under partially cache hot streaming IO.
>
> Looks good.
>
>> - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
>>   dropped pages are mostly inactive ones. The patch has almost no impact in
>>   this aspect, that means it won't unnecessarily increase memory pressure.
>>   (In contrast, your 20% mmap protection ratio will keep them all, and
>>   therefore eliminate the extra 41 major faults to restore working set
>>   of zsh etc.)
>
> Good.
>

* Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
  2009-06-08 15:34   ` Christoph Lameter
  2009-06-08 17:30     ` Nai Xia
@ 2009-06-09  3:28     ` Wu Fengguang
  1 sibling, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-06-09  3:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, KOSAKI Motohiro, Elladan, Nick Piggin, Andi Kleen,
	Rik van Riel, Peter Zijlstra, Johannes Weiner, tytso, linux-mm,
	minchan.kim

[-- Attachment #1: Type: text/plain, Size: 4800 bytes --]

On Mon, Jun 08, 2009 at 11:34:06PM +0800, Christoph Lameter wrote:
> On Mon, 8 Jun 2009, Wu Fengguang wrote:
>
> > 1.2) test scenario
> >
> > - nfsroot gnome desktop with 512M physical memory
> > - run some programs, and switch between the existing windows
> >   after starting each new program.
>
> Is there a predefined sequence or does this vary between tests? Scripted?

Yes, it's scripted testing with a predefined sequence.
The scripts are attached for your reference.

> What percentage of time is saved in the 'after' test due to the
> modifications?
> Around 20%?

It's 50%, hehe. I've posted the startup times for each program:

  before       after    programs
    0.02        0.02    N xeyes
    0.75        0.76    N firefox
    2.02        1.88    N nautilus
    3.36        3.17    N nautilus --browser
    5.26        4.89    N gthumb
    7.12        6.47    N gedit
    9.22        8.16    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   13.58       12.55    N xterm
   15.87       14.57    N mlterm
   18.63       17.06    N gnome-terminal
   21.16       18.90    N urxvt
   26.24       23.48    N gnome-system-monitor
   28.72       26.52    N gnome-help
   32.15       29.65    N gnome-dictionary
   39.66       36.12    N /usr/games/sol
   43.16       39.27    N /usr/games/gnometris
   48.65       42.56    N /usr/games/gnect
   53.31       47.03    N /usr/games/gtali
   58.60       52.05    N /usr/games/iagno
   65.77       55.42    N /usr/games/gnotravex
   70.76       61.47    N /usr/games/mahjongg
   76.15       67.11    N /usr/games/gnome-sudoku
   86.32       75.15    N /usr/games/glines
   92.21       79.70    N /usr/games/glchess
  103.79       88.48    N /usr/games/gnomine
  113.84       96.51    N /usr/games/gnotski
  124.40      102.19    N /usr/games/gnibbles
  137.41      114.93    N /usr/games/gnobots2
  155.53      125.02    N /usr/games/blackjack
  179.85      135.11    N /usr/games/same-gnome
  224.49      154.50    N /usr/bin/gnome-window-properties
  248.44      162.09    N /usr/bin/gnome-default-applications-properties
  282.62      173.29    N /usr/bin/gnome-at-properties
  323.72      188.21    N /usr/bin/gnome-typing-monitor
  363.99      199.93    N /usr/bin/gnome-at-visual
  394.21      206.95    N /usr/bin/gnome-sound-properties
  435.14      224.49    N /usr/bin/gnome-at-mobility
  463.05      234.11    N /usr/bin/gnome-keybinding-properties
  503.75      248.59    N /usr/bin/gnome-about-me
  554.00      276.27    N /usr/bin/gnome-display-properties
  615.48      304.39    N /usr/bin/gnome-network-preferences
  693.03      342.01    N /usr/bin/gnome-mouse-properties
  759.90      388.58    N /usr/bin/gnome-appearance-properties
  937.90      508.47    N /usr/bin/gnome-control-center
 1109.75      587.57    N /usr/bin/gnome-keyboard-properties
 1399.05      758.16    N : oocalc
 1524.64      830.03    N : oodraw
 1684.31      900.03    N : ooimpress
 1874.04      993.91    N : oomath
 2115.12     1081.89    N : ooweb
 2369.02     1161.99    N : oowriter


> > (1) begin:     shortly after the big read IO starts;
> > (2) end:       just before the big read IO stops;
> > (3) restore:   the big read IO stops and the zsh working set restored
> > (4) restore X: after IO, switch back and forth between the urxvt and firefox
> >                windows to restore their working set.
>
> Any action done on the firefox sessions? Or just switch to a firefox
> session that needs to redraw?

After starting each new program, a new tab is opened in firefox to render a
simple web page. It's the same web page, so firefox may actually cache it.

> > The above console numbers show that
> >
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> >   I'd attribute that improvement to the mmap readahead improvements :-)
>
> So there are other effects... You are not measuring the effect of only this
> patchset?

Yes, there are additional effects in the .29 vs .30 comparisons.
But the following .30 vs .30 comparisons in X lead to the same conclusions,
without that additional effect.

> > - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
> >   That's a huge improvement - which means with the VM_EXEC protection logic,
> >   active mmap pages are pretty safe even under partially cache hot streaming IO.
>
> Looks good.
>
> > - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
> >   dropped pages are mostly inactive ones. The patch has almost no impact in
> >   this aspect, that means it won't unnecessarily increase memory pressure.
> >   (In contrast, your 20% mmap protection ratio will keep them all, and
> >   therefore eliminate the extra 41 major faults to restore working set
> >   of zsh etc.)
>
> Good.

Thanks,
Fengguang

[-- Attachment #2: run-many-x-apps.sh --]
[-- Type: application/x-sh, Size: 1735 bytes --]

[-- Attachment #3: test-mmap-exec-prot.sh --]
[-- Type: application/x-sh, Size: 203 bytes --]


* Re: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
  2009-06-08  9:10 [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Wu Fengguang
                   ` (2 preceding siblings ...)
  2009-06-08  9:10 ` [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Wu Fengguang
@ 2009-07-10  7:24 ` Nai Xia
  2009-07-10  8:34   ` Wu Fengguang
  3 siblings, 1 reply; 30+ messages in thread
From: Nai Xia @ 2009-07-10  7:24 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter,
	Elladan, Nick Piggin, Johannes Weiner, Peter Zijlstra,
	Rik van Riel, tytso, linux-mm, minchan.kim

Hi,

I was able to run some tests with SPEC CPU2006.
The benchmark was based on mmotm
commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
4G RAM. The kernel cmdline mem=500M was used to see how well exec-prot does
under memory stress.

Following are the results:

                                  Estimated
                Base     Base       Base
Benchmarks      Ref.   Run Time     Ratio

mmotm with 500M
400.perlbench    9770        671      14.6  *
401.bzip2        9650       1011       9.55 *
403.gcc          8050        774      10.4  *
462.libquantum  20720       1213      17.1  *


mmotm-prot with 500M
400.perlbench    9770        658      14.8  *
401.bzip2        9650       1007       9.58 *
403.gcc          8050        749      10.8  *
462.libquantum  20720       1116      18.6  *

mmotm with 4G ( allowing the full working sets)
400.perlbench    9770        594      16.5  *
401.bzip2        9650        828      11.7  *
403.gcc          8050        523      15.4  *
462.libquantum  20720       1121      18.5  *


It's worth noting that SPEC documented "The CPU2006 benchmarks
(code + workload) have been designed to fit within about 1GB of
physical memory",
and the exec vm sizes of these programs are as below:
perlbench  956KB
bzip2         56KB
gcc          3008KB
libquantum  36KB


Are we expecting to see better results for CPU-bound programs (e.g.
scientific ones) with a large number of exec pages?


Best Regards,

Nai Xia

On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> Andrew,
>
> I managed to back this patchset with two test cases :)
>
> They demonstrated that
> - X desktop responsiveness can be *doubled* under high memory/swap pressure
> - major faults can be almost eliminated when the active file list is scanned
>   slowly because of partially cache-hot streaming IO
>
> The details are included in the changelog.
>
> Thanks,
> Fengguang
> --
>

* Re: [PATCH 0/3] make mapped executable pages the first class citizen  (with test cases)
  2009-07-10  7:24 ` [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Nai Xia
@ 2009-07-10  8:34   ` Wu Fengguang
  2009-07-10 16:50     ` Nai Xia
  0 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2009-07-10  8:34 UTC (permalink / raw)
  To: Nai Xia
  Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter,
	Elladan, Nick Piggin, Johannes Weiner, Peter Zijlstra,
	Rik van Riel, tytso, linux-mm, minchan.kim

On Fri, Jul 10, 2009 at 03:24:29PM +0800, Nai Xia wrote:
> Hi,
> 
> I was able to run some tests with SPEC CPU2006.
> The benchmark was based on mmotm
> commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
> 4G RAM. The kernel cmdline mem=500M was used to see how well exec-prot does
> under memory stress.

Thank you for the testing, Nai!

> Following are the results:
> 
>                                   Estimated
>                 Base     Base       Base
> Benchmarks      Ref.   Run Time     Ratio
> 
> mmotm with 500M
> 400.perlbench    9770        671      14.6  *
> 401.bzip2        9650       1011       9.55 *
> 403.gcc          8050        774      10.4  *
> 462.libquantum  20720       1213      17.1  *
> 
> 
> mmotm-prot with 500M
> 400.perlbench    9770        658      14.8  *
> 401.bzip2        9650       1007       9.58 *
> 403.gcc          8050        749      10.8  *
> 462.libquantum  20720       1116      18.6  *
> 
> mmotm with 4G ( allowing the full working sets)
> 400.perlbench    9770        594      16.5  *
> 401.bzip2        9650        828      11.7  *
> 403.gcc          8050        523      15.4  *
> 462.libquantum  20720       1121      18.5  *

                mmotm   mmotm-prot  mmotm-4G   prot vs mmotm  4G vs mmotm
400.perlbench   14.6    14.8        16.5       +1.4%          +13.0%
401.bzip2        9.55    9.58       11.7       +0.3%          +22.5%
403.gcc         10.4    10.8        15.4       +3.8%          +48.1%
462.libquantum  17.1    18.6        18.5       +8.8%           +8.2%

(e.g. +1.4% = (14.8 - 14.6) / 14.6; a higher ratio is better)

So it's mostly small improvements.

> It's worth noting that SPEC documented "The CPU2006 benchmarks
> (code + workload) have been designed to fit within about 1GB of
> physical memory",
> and the exec vm sizes of these programs are as below:
> perlbench  956KB
> bzip2         56KB
> gcc          3008KB
> libquantum  36KB
> 
> 
> Are we expecting to see better results for CPU-bound programs (e.g.
> scientific ones) with a large number of exec pages?

Not likely. Scientific computing is typically equipped with lots of
memory and the footprint of the program itself is relatively small.

The exec-mmap protection mainly helps when some exec pages/programs
have been inactive for some minutes and then go active. That's the
typical desktop use pattern.

Thanks,
Fengguang

> On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> > Andrew,
> >
> > I managed to back this patchset with two test cases :)
> >
> > They demonstrated that
> > - X desktop responsiveness can be *doubled* under high memory/swap pressure
> > - major faults can be almost eliminated when the active file list is scanned
> >   slowly because of partially cache-hot streaming IO
> >
> > The details are included in the changelog.
> >
> > Thanks,
> > Fengguang
> > --
> >

* Re: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
  2009-07-10  8:34   ` Wu Fengguang
@ 2009-07-10 16:50     ` Nai Xia
  0 siblings, 0 replies; 30+ messages in thread
From: Nai Xia @ 2009-07-10 16:50 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter,
	Elladan, Nick Piggin, Johannes Weiner, Peter Zijlstra,
	Rik van Riel, tytso, linux-mm, minchan.kim

On Fri, Jul 10, 2009 at 4:34 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
> On Fri, Jul 10, 2009 at 03:24:29PM +0800, Nai Xia wrote:
>> Hi,
>>
>> I was able to run some tests with SPEC CPU2006.
>> The benchmark was based on mmotm
>> commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
>> 4G RAM. The kernel cmdline mem=500M was used to see how well exec-prot does
>> under memory stress.
>
> Thank you for the testing, Nai!

You are welcome :)

>
>> Following are the results:
>>
>>                                   Estimated
>>                 Base     Base       Base
>> Benchmarks      Ref.   Run Time     Ratio
>>
>> mmotm with 500M
>> 400.perlbench    9770        671      14.6  *
>> 401.bzip2        9650       1011       9.55 *
>> 403.gcc          8050        774      10.4  *
>> 462.libquantum  20720       1213      17.1  *
>>
>>
>> mmotm-prot with 500M
>> 400.perlbench    9770        658      14.8  *
>> 401.bzip2        9650       1007       9.58 *
>> 403.gcc          8050        749      10.8  *
>> 462.libquantum  20720       1116      18.6  *
>>
>> mmotm with 4G ( allowing the full working sets)
>> 400.perlbench    9770        594      16.5  *
>> 401.bzip2        9650        828      11.7  *
>> 403.gcc          8050        523      15.4  *
>> 462.libquantum  20720       1121      18.5  *
>
>                 mmotm   mmotm-prot  mmotm-4G   prot vs mmotm  4G vs mmotm
> 400.perlbench   14.6    14.8        16.5       +1.4%          +13.0%
> 401.bzip2        9.55    9.58       11.7       +0.3%          +22.5%
> 403.gcc         10.4    10.8        15.4       +3.8%          +48.1%
> 462.libquantum  17.1    18.6        18.5       +8.8%           +8.2%
>
> (e.g. +1.4% = (14.8 - 14.6) / 14.6; a higher ratio is better)
>
> So it's mostly small improvements.
>
>> It's worth noting that SPEC documented "The CPU2006 benchmarks
>> (code + workload) have been designed to fit within about 1GB of
>> physical memory",
>> and the exec vm sizes of these programs are as below:
>> perlbench  956KB
>> bzip2         56KB
>> gcc          3008KB
>> libquantum  36KB
>>
>>
>> Are we expecting to see better results for CPU-bound programs (e.g.
>> scientific ones) with a large number of exec pages?
>
> Not likely. Scientific computing is typically equipped with lots of
> memory and the footprint of the program itself is relatively small.

OK, well, maybe as long as there is still swapping, improvement is
possible. Actually, in the above cases like bzip2, its exec footprint
is already quite small compared to the percentage of the improvement.
Let me see if I am lucky enough to have someone majoring in computational
chemistry at our university give it a benchmark. :) You know, they have
relatively small machines doing small personal computing jobs, and
sometimes swapping still matters.

>
> The exec-mmap protection mainly helps when some exec pages/programs
> have been inactive for some minutes and then go active. That's the
> typical desktop use pattern.

OK.  Still, it's good to see that this patch can improve things by more
than 20% on average in non-typical cases, hehe.

Regards,
Nai

>
> Thanks,
> Fengguang
>
>> On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang<fengguang.wu@intel.com> wrote:
>> > Andrew,
>> >
>> > I managed to back this patchset with two test cases :)
>> >
>> > They demonstrated that
>> > - X desktop responsiveness can be *doubled* under high memory/swap pressure
>> > - it can almost stop major faults when the active file list is slowly scanned
>> >  because of undergoing partially cache hot streaming IO
>> >
>> > The details are included in the changelog.
>> >
>> > Thanks,
>> > Fengguang
>> > --
>> >
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-19 10:18         ` Johannes Weiner
@ 2009-05-19 10:32           ` Wu Fengguang
  -1 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-19 10:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, LKML, Peter Zijlstra, Christoph Lameter,
	KOSAKI Motohiro, riel, tytso, linux-mm, elladan, npiggin,
	minchan.kim

On Tue, May 19, 2009 at 06:18:33PM +0800, Johannes Weiner wrote:
> On Tue, May 19, 2009 at 10:43:16AM +0800, Wu Fengguang wrote:
> > @@ -1283,6 +1319,7 @@ static void shrink_active_list(unsigned 
> >  			 * are ignored, since JVM can create lots of anon
> >  			 * VM_EXEC pages.
> >  			 */
> > +			if (page_cluster)
> >  			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> >  				list_add(&page->lru, &l_active);
> >  				continue;
> 
> Huh, what's with that hunk?

Ah, sorry, that's a handy debugging knob ;)
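
For context, page_cluster is the integer behind /proc/sys/vm/page-cluster,
so guarding the VM_EXEC test with it makes the whole protection switchable
at runtime. Below is a minimal userspace sketch of flipping that knob,
assuming the debug hunk above is applied; keep in mind that page_cluster's
normal job is sizing swap readahead, so writing 0 disables that as well:

#include <stdio.h>

/* Write 0 to /proc/sys/vm/page-cluster so that the "if (page_cluster)"
 * guard above evaluates false and the VM_EXEC re-activation is skipped.
 * Restore the default (usually 3) to turn the protection back on. */
int main(void)
{
	FILE *f = fopen("/proc/sys/vm/page-cluster", "w");

	if (!f) {
		perror("/proc/sys/vm/page-cluster");
		return 1;
	}
	fputs("0\n", f);
	fclose(f);
	return 0;
}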

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-19  2:43       ` Wu Fengguang
@ 2009-05-19 10:18         ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2009-05-19 10:18 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, LKML, Peter Zijlstra, Christoph Lameter,
	KOSAKI Motohiro, riel, tytso, linux-mm, elladan, npiggin,
	minchan.kim

On Tue, May 19, 2009 at 10:43:16AM +0800, Wu Fengguang wrote:
> @@ -1283,6 +1319,7 @@ static void shrink_active_list(unsigned 
>  			 * are ignored, since JVM can create lots of anon
>  			 * VM_EXEC pages.
>  			 */
> +			if (page_cluster)
>  			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
>  				list_add(&page->lru, &l_active);
>  				continue;

Huh, what's with that hunk?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-18  9:16     ` Wu Fengguang
@ 2009-05-19  2:43       ` Wu Fengguang
  -1 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-19  2:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Peter Zijlstra, Christoph Lameter, KOSAKI Motohiro, hannes,
	riel, tytso, linux-mm, elladan, npiggin, minchan.kim

[update2: use !is_active_lru()]
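
For reference, a sketch of the helper as defined in include/linux/mmzone.h
around this time (reconstructed from memory, so treat it as an
approximation); the active lists are the two *_ACTIVE entries, which is
why the deactivation paths below test !is_active_lru(lru):

static inline int is_active_lru(enum lru_list l)
{
	/* LRU_ACTIVE_ANON and LRU_ACTIVE_FILE are the two active lists */
	return (l == LRU_ACTIVE_ANON || l == LRU_ACTIVE_FILE);
}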
 
---
vmscan: merge duplicate code in shrink_active_list()

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Note that buffer_heads_over_limit check will also be carried out
for re-activated pages, which is slightly different from pre-2.6.28
kernels. Also, Rik's "vmscan: evict use-once pages first" patch
could totally stop scans of active file list when memory pressure is low.
So the net effect could be, the number of buffer heads is now more
likely to grow large.

However that's fine according to Johannes's comments:

  I don't think that this could be harmful.  We just preserve the buffer
  mappings of what we consider the working set and with low memory
  pressure, as you say, this set is not big.

  As to stripping of reactivated pages: the only pages we re-activate
  for now are those VM_EXEC mapped ones.  Since we don't expect IO from
  or to these pages, removing the buffer mappings in case they grow too
  large should be okay, I guess.
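
As background for the buffer-head discussion: pagevec_strip() is what
actually drops the buffer mappings. A sketch of it as found in mm/swap.c
of this era (from memory, not copied verbatim); it only touches pages
that carry buffers and whose lock can be taken without blocking:

void pagevec_strip(struct pagevec *pvec)
{
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		/* Only pages with ->private buffer heads are candidates,
		 * and we never block on the page lock here. */
		if (PagePrivate(page) && trylock_page(page)) {
			if (PagePrivate(page))
				try_to_release_page(page, 0);
			unlock_page(page);
		}
	}
}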

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   96 ++++++++++++++++++++++----------------------------
 1 file changed, 43 insertions(+), 53 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,43 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct pagevec pvec;
+	struct page *page;
+
+	pagevec_init(&pvec, 1);
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (!is_active_lru(lru))
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+
+		if (!pagevec_add(&pvec, page) || list_empty(list)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (!is_active_lru(lru))
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1236,8 +1273,6 @@ static void shrink_active_list(unsigned 
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct pagevec pvec;
-	enum lru_list lru;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	lru_add_drain();
@@ -1254,6 +1289,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1283,6 +1319,7 @@ static void shrink_active_list(unsigned 
 			 * are ignored, since JVM can create lots of anon
 			 * VM_EXEC pages.
 			 */
+			if (page_cluster)
 			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
 				list_add(&page->lru, &l_active);
 				continue;
@@ -1295,8 +1332,6 @@ static void shrink_active_list(unsigned 
 	/*
 	 * Move pages back to the lru list.
 	 */
-	pagevec_init(&pvec, 1);
-
 	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
@@ -1306,57 +1341,12 @@ static void shrink_active_list(unsigned 
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active,
+						LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive,
+						LRU_BASE   + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
-	if (buffer_heads_over_limit)
-		pagevec_strip(&pvec);
-	pagevec_release(&pvec);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-17  2:23   ` Wu Fengguang
@ 2009-05-18  9:16     ` Wu Fengguang
  -1 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-18  9:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Peter Zijlstra, Christoph Lameter, KOSAKI Motohiro, hannes,
	riel, tytso, linux-mm, elladan, npiggin, minchan.kim

[update: expand changelog and use is_active_lru()]

---
vmscan: merge duplicate code in shrink_active_list()

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Note that buffer_heads_over_limit check will also be carried out
for re-activated pages, which is slightly different from pre-2.6.28
kernels. Also, Rik's "vmscan: evict use-once pages first" patch
could totally stop scans of active file list when memory pressure is low.
So the net effect could be, the number of buffer heads is now more
likely to grow large.

However that's fine according to Johannes's comments:

  I don't think that this could be harmful.  We just preserve the buffer
  mappings of what we consider the working set and with low memory
  pressure, as you say, this set is not big.

  As to stripping of reactivated pages: the only pages we re-activate
  for now are those VM_EXEC mapped ones.  Since we don't expect IO from
  or to these pages, removing the buffer mappings in case they grow too
  large should be okay, I guess.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   95 ++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 53 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,43 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct pagevec pvec;
+	struct page *page;
+
+	pagevec_init(&pvec, 1);
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (is_active_lru(lru))
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+
+		if (!pagevec_add(&pvec, page) || list_empty(list)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (is_active_lru(lru))
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1236,8 +1273,6 @@ static void shrink_active_list(unsigned 
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct pagevec pvec;
-	enum lru_list lru;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	lru_add_drain();
@@ -1254,6 +1289,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1295,8 +1331,6 @@ static void shrink_active_list(unsigned 
 	/*
 	 * Move pages back to the lru list.
 	 */
-	pagevec_init(&pvec, 1);
-
 	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
@@ -1306,57 +1340,12 @@ static void shrink_active_list(unsigned 
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active,
+						LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive,
+						LRU_BASE   + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
-	if (buffer_heads_over_limit)
-		pagevec_strip(&pvec);
-	pagevec_release(&pvec);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-17  2:23 [PATCH 0/3] make mapped executable pages the first class citizen Wu Fengguang
@ 2009-05-17  2:23   ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-17  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Peter Zijlstra, Wu Fengguang, Christoph Lameter,
	KOSAKI Motohiro, hannes, riel, tytso, linux-mm, elladan, npiggin,
	minchan.kim

[-- Attachment #1: mm-vmscan-reduce-code.patch --]
[-- Type: text/plain, Size: 4955 bytes --]

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Note that buffer_heads_over_limit check will also be carried out
for re-activated pages, which is slightly different from pre-2.6.28
kernels. Also, Rik's "vmscan: evict use-once pages first" patch
could totally stop scans of active list when memory pressure is low.
So the net effect could be, the number of buffer heads is now more
likely to grow large.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   95 ++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 53 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,43 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct pagevec pvec;
+	struct page *page;
+
+	pagevec_init(&pvec, 1);
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (lru == LRU_INACTIVE_ANON || lru == LRU_INACTIVE_FILE)
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+
+		if (!pagevec_add(&pvec, page) || list_empty(list)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (lru == LRU_INACTIVE_ANON || lru == LRU_INACTIVE_FILE)
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1236,8 +1273,6 @@ static void shrink_active_list(unsigned 
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct pagevec pvec;
-	enum lru_list lru;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	lru_add_drain();
@@ -1254,6 +1289,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1295,8 +1331,6 @@ static void shrink_active_list(unsigned 
 	/*
 	 * Move pages back to the lru list.
 	 */
-	pagevec_init(&pvec, 1);
-
 	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
@@ -1306,57 +1340,12 @@ static void shrink_active_list(unsigned 
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active,
+						LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive,
+						LRU_BASE   + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
-	if (buffer_heads_over_limit)
-		pagevec_strip(&pvec);
-	pagevec_release(&pvec);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)

-- 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-16  9:00   ` Wu Fengguang
@ 2009-05-17  1:24     ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2009-05-17  1:24 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, LKML, Rik van Riel, Christoph Lameter,
	KOSAKI Motohiro, hannes, peterz, tytso, linux-mm, elladan,
	npiggin

On Sat, May 16, 2009 at 6:00 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> The "move pages to active list" and "move pages to inactive list"
> code blocks are mostly identical and can be served by a function.
>
> Thanks to Andrew Morton for pointing this out.
>
> Note that buffer_heads_over_limit check will also be carried out
> for re-activated pages, which is slightly different from pre-2.6.28
> kernels. Also, Rik's "vmscan: evict use-once pages first" patch
> could totally stop scans of active list when memory pressure is low.

To clarify: the active file list. Otherwise it looks good to me.
Thanks for your great effort to enhance the VM. :)

> CC: Rik van Riel <riel@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-16  9:00   ` Wu Fengguang
@ 2009-05-16 14:35     ` Rik van Riel
  -1 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2009-05-16 14:35 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, LKML, Christoph Lameter, KOSAKI Motohiro, hannes,
	peterz, tytso, linux-mm, elladan, npiggin, minchan.kim

Wu Fengguang wrote:
> The "move pages to active list" and "move pages to inactive list"
> code blocks are mostly identical and can be served by a function.
> 
> Thanks to Andrew Morton for pointing this out.
> 
> Note that buffer_heads_over_limit check will also be carried out
> for re-activated pages, which is slightly different from pre-2.6.28
> kernels. Also, Rik's "vmscan: evict use-once pages first" patch
> could totally stop scans of active list when memory pressure is low.
> So the net effect could be, the number of buffer heads is now more
> likely to grow large.
> 
> CC: Rik van Riel <riel@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-16 13:39     ` Johannes Weiner
@ 2009-05-16 13:47       ` Wu Fengguang
  -1 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-16 13:47 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, LKML, Rik van Riel, Christoph Lameter,
	KOSAKI Motohiro, peterz, tytso, linux-mm, elladan, npiggin,
	minchan.kim

On Sat, May 16, 2009 at 09:39:50PM +0800, Johannes Weiner wrote:
> On Sat, May 16, 2009 at 05:00:08PM +0800, Wu Fengguang wrote:
> > The "move pages to active list" and "move pages to inactive list"
> > code blocks are mostly identical and can be served by a function.
> > 
> > Thanks to Andrew Morton for pointing this out.
> > 
> > Note that buffer_heads_over_limit check will also be carried out
> > for re-activated pages, which is slightly different from pre-2.6.28
> > kernels. Also, Rik's "vmscan: evict use-once pages first" patch
> > could totally stop scans of active list when memory pressure is low.
> > So the net effect could be, the number of buffer heads is now more
> > likely to grow large.
> 
> I don't think that this could be harmful.  We just preserve the buffer
> mappings of what we consider the working set and with low memory
> pressure, as you say, this set is not big.
> 
> As to stripping of reactivated pages: the only pages we re-activate
> for now are those VM_EXEC mapped ones.  Since we don't expect IO from
> or to these pages, removing the buffer mappings in case they grow too
> large should be okay, I guess.

Agreed - and good analysis, thanks!

Fengguang

> > CC: Rik van Riel <riel@redhat.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> 
> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-16  9:00   ` Wu Fengguang
@ 2009-05-16 13:39     ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2009-05-16 13:39 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, LKML, Rik van Riel, Christoph Lameter,
	KOSAKI Motohiro, peterz, tytso, linux-mm, elladan, npiggin,
	minchan.kim

On Sat, May 16, 2009 at 05:00:08PM +0800, Wu Fengguang wrote:
> The "move pages to active list" and "move pages to inactive list"
> code blocks are mostly identical and can be served by a function.
> 
> Thanks to Andrew Morton for pointing this out.
> 
> Note that buffer_heads_over_limit check will also be carried out
> for re-activated pages, which is slightly different from pre-2.6.28
> kernels. Also, Rik's "vmscan: evict use-once pages first" patch
> could totally stop scans of active list when memory pressure is low.
> So the net effect could be, the number of buffer heads is now more
> likely to grow large.

I don't think that this could be harmful.  We just preserve the buffer
mappings of what we consider the working set and with low memory
pressure, as you say, this set is not big.

As to stripping of reactivated pages: the only pages we re-activate
for now are those VM_EXEC mapped ones.  Since we don't expect IO from
or to these pages, removing the buffer mappings in case they grow too
large should be okay, I guess.

> CC: Rik van Riel <riel@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list()
  2009-05-16  9:00 [PATCH 0/3] make mapped executable pages the first class citizen Wu Fengguang
@ 2009-05-16  9:00   ` Wu Fengguang
  0 siblings, 0 replies; 30+ messages in thread
From: Wu Fengguang @ 2009-05-16  9:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Rik van Riel, Wu Fengguang, Christoph Lameter,
	KOSAKI Motohiro, hannes, peterz, tytso, linux-mm, elladan,
	npiggin, minchan.kim

[-- Attachment #1: mm-vmscan-reduce-code.patch --]
[-- Type: text/plain, Size: 4799 bytes --]

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Note that buffer_heads_over_limit check will also be carried out
for re-activated pages, which is slightly different from pre-2.6.28
kernels. Also, Rik's "vmscan: evict use-once pages first" patch
could totally stop scans of active list when memory pressure is low.
So the net effect could be, the number of buffer heads is now more
likely to grow large.

CC: Rik van Riel <riel@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   95 ++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 53 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,43 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct pagevec pvec;
+	struct page *page;
+
+	pagevec_init(&pvec, 1);
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (lru == LRU_INACTIVE_ANON || lru == LRU_INACTIVE_FILE)
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+
+		if (!pagevec_add(&pvec, page) || list_empty(list)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (lru == LRU_INACTIVE_ANON || lru == LRU_INACTIVE_FILE)
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1236,8 +1273,6 @@ static void shrink_active_list(unsigned 
 	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
-	struct pagevec pvec;
-	enum lru_list lru;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 
 	lru_add_drain();
@@ -1254,6 +1289,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1295,8 +1331,6 @@ static void shrink_active_list(unsigned 
 	/*
 	 * Move pages back to the lru list.
 	 */
-	pagevec_init(&pvec, 1);
-
 	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated.
@@ -1305,57 +1339,12 @@ static void shrink_active_list(unsigned 
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active,
+						LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive,
+						LRU_BASE   + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
-	if (buffer_heads_over_limit)
-		pagevec_strip(&pvec);
-	pagevec_release(&pvec);
 }
 
 static int inactive_anon_is_low_global(struct zone *zone)

-- 
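
(A subtlety worth noting: the old loops flushed their leftover
pagevec only after the final spin_unlock_irq() at the end of
shrink_active_list(), while the helper flushes inside the loop via
the "|| list_empty(list)" condition, so the last partial pagevec is
stripped and released before the function returns.  A hypothetical
rendering of that control flow, for illustration only -- move_one(),
pvec_is_full() and flush_pvec() are made-up names, not kernel
functions:)

	while (!list_empty(list)) {
		move_one(list, &pvec);	/* take one page off the list */
		/*
		 * Flush when the vector fills up OR the input is
		 * drained; the second clause guarantees no pages stay
		 * pinned in pvec when the loop exits.
		 */
		if (pvec_is_full(&pvec) || list_empty(list))
			flush_pvec(&pvec);
	}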




end of thread, newest message: 2009-07-10 16:50 UTC

Thread overview: 30+ messages
2009-06-08  9:10 [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Wu Fengguang
2009-06-08  9:10 ` [PATCH 1/3] vmscan: report vm_flags in page_referenced() Wu Fengguang
2009-06-08  9:10 ` [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Wu Fengguang
2009-06-08 15:34   ` Christoph Lameter
2009-06-08 17:30     ` Nai Xia
2009-06-09  3:28     ` Wu Fengguang
2009-06-08  9:10 ` [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Wu Fengguang
2009-07-10  7:24 ` [PATCH 0/3] make mapped executable pages the first class citizen (with test cases) Nai Xia
2009-07-10  8:34   ` Wu Fengguang
2009-07-10 16:50     ` Nai Xia
  -- strict thread matches above, loose matches on Subject: below --
2009-05-17  2:23 [PATCH 0/3] make mapped executable pages the first class citizen Wu Fengguang
2009-05-17  2:23 ` [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Wu Fengguang
2009-05-17  2:23   ` Wu Fengguang
2009-05-18  9:16   ` Wu Fengguang
2009-05-18  9:16     ` Wu Fengguang
2009-05-19  2:43     ` Wu Fengguang
2009-05-19  2:43       ` Wu Fengguang
2009-05-19 10:18       ` Johannes Weiner
2009-05-19 10:18         ` Johannes Weiner
2009-05-19 10:32         ` Wu Fengguang
2009-05-19 10:32           ` Wu Fengguang
2009-05-16  9:00 [PATCH 0/3] make mapped executable pages the first class citizen Wu Fengguang
2009-05-16  9:00 ` [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Wu Fengguang
2009-05-16  9:00   ` Wu Fengguang
2009-05-16 13:39   ` Johannes Weiner
2009-05-16 13:39     ` Johannes Weiner
2009-05-16 13:47     ` Wu Fengguang
2009-05-16 13:47       ` Wu Fengguang
2009-05-16 14:35   ` Rik van Riel
2009-05-16 14:35     ` Rik van Riel
2009-05-17  1:24   ` Minchan Kim
2009-05-17  1:24     ` Minchan Kim
