linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/4] VM split active lists
@ 2004-03-09  5:32 Nick Piggin
  2004-03-09  5:33 ` [RFC][PATCH 1/4] vm-lrutopage-cleanup Nick Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:32 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management

Hi,
Background: there are a number of problems in the 2.6 page reclaim
algorithms. Thankfully, most of them were simple oversights or small
bugs, the worst of which Andrew Morton and I have fixes for in his
-mm tree; being mostly simple and obviously correct, they will
hopefully be included in 2.6.5.

With these fixes, 2.6 swapping performance (the area I'm focusing on)
is very much improved. Unfortunately there is another more complex
patch in limbo that improves performance by an additional 10%. It is
Nikita's dont-rotate-active-list.

The improvement comes from better ordering of mapped pages on the
active list. Now I'd like to fix this problem and get this 10%.
However, dont-rotate-active-list is pretty ugly, to put it nicely.

OK, the theory is that mapped pagecache pages are worth more than
unmapped pages. This is a good theory because mapped pages will
usually have far more random access patterns, so pagein *and* pageout
will be much less efficient. Also, applications are probably written
to cope better with blocking in read() than with faulting on a random
code / anon memory page. So a factor of >= 16 wouldn't be out of the
question.

Now the basic problem is that we have these two classes of pages on
one (the active) list, and we attempt to place different scanning
semantics on each class. This is done with the reclaim_mapped logic
(sketched below for reference). Now I won't be too disparaging of
reclaim_mapped because I think Andrew crea^W^W^W^W it somehow more or
less works, but it has a couple of problems.

* Difficult to trace: relies on some saved state from earlier in time.
* difficult to control: relies on inner workings (eg "priority").
  mapped vs unmapped scanning behaviour is derived basically by black
  magic.
* not-quite-right semantics: mapped pages are infinitely preferable
  to unmapped pages until something goes click and then they are worth
  about half as much.
* These semantics mean that in low memory pressure (before the click),
  truly inactive mapped pages will never be reclaimed. Probably they
  should be, to increase the resident working set.
* Also, a significant number of mapped pages can be passed over
  without doing any real work.
* This causes list position information to be lost (which is where
  that 10% comes from).
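
For reference, the heuristic being criticised here (removed wholesale
by patch 3/4 below) boils down to roughly the following, with
everything except vm_swappiness coming from reclaim-internal state:

	distress = 100 >> zone->prev_priority;	/* how much trouble reclaim is in */
	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
	swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
	if (swap_tendency >= 100)
		reclaim_mapped = 1;	/* the "click": mapped pages become reclaimable */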

Now I have an alternative which hopefully solves all these problems,
with less complexity than dont-rotate-active-list (which only solves
the last one): split the active list into active_mapped and
active_unmapped lists. Pages are moved between them lazily at scan
time, and they needn't be totally accurate.

You then simply put 16 (or whatever) times as much pressure on the
unmapped list as on the mapped list. This number can be the tunable
(instead of swappiness).
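
As a rough illustration of how that pressure split works out (a
standalone sketch only, not the patch's actual shrink_zone()
arithmetic; "cost" just stands in for whatever the tunable ends up
being called):

	#include <stdio.h>

	/*
	 * Split a scan budget between the two active lists so that the
	 * unmapped list sees `cost` times the scan pressure (pages scanned
	 * per page on the list) of the mapped list, i.e. a mapped page is
	 * treated as being worth `cost` unmapped pages.  Overflow is
	 * ignored; this is a sketch.
	 */
	static void split_pressure(unsigned long nr_mapped, unsigned long nr_unmapped,
				   unsigned long budget, unsigned long cost,
				   unsigned long *scan_mapped, unsigned long *scan_unmapped)
	{
		unsigned long weight = nr_mapped + cost * nr_unmapped;

		if (!weight)
			weight = 1;
		*scan_mapped = budget * nr_mapped / weight;
		*scan_unmapped = budget - *scan_mapped;
	}

	int main(void)
	{
		unsigned long m, u;

		/* 10000 pages on each active list, scan budget 1024, cost 16 */
		split_pressure(10000, 10000, 1024, 16, &m, &u);
		printf("scan %lu mapped, %lu unmapped\n", m, u);	/* 60 vs 964 */
		return 0;
	}

With cost=1 both lists are scanned evenly (the "swappy" behaviour the
current implementation has, mentioned below); cost=16 gives the factor
discussed above.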

I have an implementation which compiles, boots, and survives a -j8
kbuild. Probably still has a few problems though. Couple of things: it
presently just puts even pressure on both lists, so it is swappy
(trivial to fix). It also gives unmapped pages the full two level
(active+inactive) system because it was just easier to do it that way.
Don't know if this would be good or bad.

The patches go like this:
1/4: vm-lrutopage-cleanup
Cleanup from Nikita's dont-rotate-active-list patch.

2/4: vm-nofixed-active-list
Generalise active list scanning to scan different lists.

3/4: vm-no-reclaim_mapped
Kill reclaim_mapped and its merry men.

4/4: vm-mapped-x-active-lists
Split the active list into mapped and unmapped pages.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 1/4] vm-lrutopage-cleanup
  2004-03-09  5:32 [RFC][PATCH 0/4] VM split active lists Nick Piggin
@ 2004-03-09  5:33 ` Nick Piggin
  2004-03-09  5:33   ` [RFC][PATCH 2/4] vm-nofixed-active-list Nick Piggin
  2004-03-09  5:34 ` [RFC][PATCH 3/4] vm-no-reclaim_mapped Nick Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:33 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: vm-lrutopage-cleanup.patch --]
[-- Type: text/x-patch, Size: 3369 bytes --]


Cleanup from Nikita's dont-rotate-active-list patch.


 linux-2.6-npiggin/mm/vmscan.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff -puN mm/vmscan.c~vm-lrutopage-cleanup mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-lrutopage-cleanup	2004-03-09 13:51:29.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-03-09 16:31:28.000000000 +1100
@@ -45,14 +45,15 @@
 int vm_swappiness = 60;
 static long total_memory;
 
+#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
+
 #ifdef ARCH_HAS_PREFETCH
 #define prefetch_prev_lru_page(_page, _base, _field)			\
 	do {								\
 		if ((_page)->lru.prev != _base) {			\
 			struct page *prev;				\
 									\
-			prev = list_entry(_page->lru.prev,		\
-					struct page, lru);		\
+			prev = lru_to_page(&(_page->lru));		\
 			prefetch(&prev->_field);			\
 		}							\
 	} while (0)
@@ -66,8 +67,7 @@ static long total_memory;
 		if ((_page)->lru.prev != _base) {			\
 			struct page *prev;				\
 									\
-			prev = list_entry(_page->lru.prev,		\
-					struct page, lru);		\
+			prev = lru_to_page(&(_page->lru));			\
 			prefetchw(&prev->_field);			\
 		}							\
 	} while (0)
@@ -262,7 +262,7 @@ shrink_list(struct list_head *page_list,
 		int may_enter_fs;
 		int referenced;
 
-		page = list_entry(page_list->prev, struct page, lru);
+		page = lru_to_page(page_list);
 		list_del(&page->lru);
 
 		if (TestSetPageLocked(page))
@@ -496,8 +496,7 @@ shrink_cache(struct zone *zone, unsigned
 
 		while (nr_scan++ < SWAP_CLUSTER_MAX &&
 				!list_empty(&zone->inactive_list)) {
-			page = list_entry(zone->inactive_list.prev,
-						struct page, lru);
+			page = lru_to_page(&zone->inactive_list);
 
 			prefetchw_prev_lru_page(page,
 						&zone->inactive_list, flags);
@@ -542,7 +541,7 @@ shrink_cache(struct zone *zone, unsigned
 		 * Put back any unfreeable pages.
 		 */
 		while (!list_empty(&page_list)) {
-			page = list_entry(page_list.prev, struct page, lru);
+			page = lru_to_page(&page_list);
 			if (TestSetPageLRU(page))
 				BUG();
 			list_del(&page->lru);
@@ -601,7 +600,7 @@ refill_inactive_zone(struct zone *zone, 
 	pgmoved = 0;
 	spin_lock_irq(&zone->lru_lock);
 	while (nr_pages && !list_empty(&zone->active_list)) {
-		page = list_entry(zone->active_list.prev, struct page, lru);
+		page = lru_to_page(&zone->active_list);
 		prefetchw_prev_lru_page(page, &zone->active_list, flags);
 		if (!TestClearPageLRU(page))
 			BUG();
@@ -652,7 +651,7 @@ refill_inactive_zone(struct zone *zone, 
 		reclaim_mapped = 1;
 
 	while (!list_empty(&l_hold)) {
-		page = list_entry(l_hold.prev, struct page, lru);
+		page = lru_to_page(&l_hold);
 		list_del(&page->lru);
 		if (page_mapped(page)) {
 			if (!reclaim_mapped) {
@@ -683,7 +682,7 @@ refill_inactive_zone(struct zone *zone, 
 	pgmoved = 0;
 	spin_lock_irq(&zone->lru_lock);
 	while (!list_empty(&l_inactive)) {
-		page = list_entry(l_inactive.prev, struct page, lru);
+		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
 		if (TestSetPageLRU(page))
 			BUG();
@@ -712,7 +711,7 @@ refill_inactive_zone(struct zone *zone, 
 
 	pgmoved = 0;
 	while (!list_empty(&l_active)) {
-		page = list_entry(l_active.prev, struct page, lru);
+		page = lru_to_page(&l_active);
 		prefetchw_prev_lru_page(page, &l_active, flags);
 		if (TestSetPageLRU(page))
 			BUG();

_

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 2/4] vm-nofixed-active-list
  2004-03-09  5:33 ` [RFC][PATCH 1/4] vm-lrutopage-cleanup Nick Piggin
@ 2004-03-09  5:33   ` Nick Piggin
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:33 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: vm-nofixed-active-list.patch --]
[-- Type: text/x-patch, Size: 2709 bytes --]


Generalise active list scanning to scan different lists.



 linux-2.6-npiggin/mm/vmscan.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff -puN mm/vmscan.c~vm-nofixed-active-list mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-nofixed-active-list	2004-03-09 13:57:23.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-03-09 16:31:27.000000000 +1100
@@ -579,13 +579,12 @@ done:
  * The downside is that we have to touch page->count against each page.
  * But we had to alter page->flags anyway.
  */
-static void
-refill_inactive_zone(struct zone *zone, const int nr_pages_in,
-			struct page_state *ps)
+static void shrink_active_list(struct zone *zone, struct list_head *list,
+				const int nr_scan, struct page_state *ps)
 {
 	int pgmoved;
 	int pgdeactivate = 0;
-	int nr_pages = nr_pages_in;
+	int nr_pages = nr_scan;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_inactive);	/* Pages to go onto the inactive_list */
 	LIST_HEAD(l_active);	/* Pages to go onto the active_list */
@@ -599,16 +598,16 @@ refill_inactive_zone(struct zone *zone, 
 	lru_add_drain();
 	pgmoved = 0;
 	spin_lock_irq(&zone->lru_lock);
-	while (nr_pages && !list_empty(&zone->active_list)) {
-		page = lru_to_page(&zone->active_list);
-		prefetchw_prev_lru_page(page, &zone->active_list, flags);
+	while (nr_pages && !list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
 		if (!TestClearPageLRU(page))
 			BUG();
 		list_del(&page->lru);
 		if (page_count(page) == 0) {
 			/* It is currently in pagevec_release() */
 			SetPageLRU(page);
-			list_add(&page->lru, &zone->active_list);
+			list_add(&page->lru, list);
 		} else {
 			page_cache_get(page);
 			list_add(&page->lru, &l_hold);
@@ -716,7 +715,7 @@ refill_inactive_zone(struct zone *zone, 
 		if (TestSetPageLRU(page))
 			BUG();
 		BUG_ON(!PageActive(page));
-		list_move(&page->lru, &zone->active_list);
+		list_move(&page->lru, list);
 		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
 			zone->nr_active += pgmoved;
@@ -730,7 +729,7 @@ refill_inactive_zone(struct zone *zone, 
 	spin_unlock_irq(&zone->lru_lock);
 	pagevec_release(&pvec);
 
-	mod_page_state_zone(zone, pgrefill, nr_pages_in - nr_pages);
+	mod_page_state_zone(zone, pgrefill, nr_scan - nr_pages);
 	mod_page_state(pgdeactivate, pgdeactivate);
 }
 
@@ -762,7 +761,7 @@ shrink_zone(struct zone *zone, int max_s
 	count = atomic_read(&zone->nr_scan_active);
 	if (count >= SWAP_CLUSTER_MAX) {
 		atomic_set(&zone->nr_scan_active, 0);
-		refill_inactive_zone(zone, count, ps);
+		shrink_active_list(zone, &zone->active_list, count, ps);
 	}
 
 	atomic_add(max_scan, &zone->nr_scan_inactive);

_

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 3/4] vm-no-reclaim_mapped
  2004-03-09  5:32 [RFC][PATCH 0/4] VM split active lists Nick Piggin
  2004-03-09  5:33 ` [RFC][PATCH 1/4] vm-lrutopage-cleanup Nick Piggin
@ 2004-03-09  5:34 ` Nick Piggin
  2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
  2004-03-10  5:10 ` [RFC][PATCH 0/4] VM split active lists Nick Piggin
  3 siblings, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:34 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: vm-no-reclaim_mapped.patch --]
[-- Type: text/x-patch, Size: 6146 bytes --]


Kill reclaim_mapped and its merry men.


 linux-2.6-npiggin/include/linux/mmzone.h |   19 -------
 linux-2.6-npiggin/mm/page_alloc.c        |    2 
 linux-2.6-npiggin/mm/vmscan.c            |   76 +++----------------------------
 3 files changed, 8 insertions(+), 89 deletions(-)

diff -puN mm/vmscan.c~vm-no-reclaim_mapped mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-no-reclaim_mapped	2004-03-09 14:05:03.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-03-09 16:31:21.000000000 +1100
@@ -590,10 +590,6 @@ static void shrink_active_list(struct zo
 	LIST_HEAD(l_active);	/* Pages to go onto the active_list */
 	struct page *page;
 	struct pagevec pvec;
-	int reclaim_mapped = 0;
-	long mapped_ratio;
-	long distress;
-	long swap_tendency;
 
 	lru_add_drain();
 	pgmoved = 0;
@@ -618,59 +614,23 @@ static void shrink_active_list(struct zo
 	zone->nr_active -= pgmoved;
 	spin_unlock_irq(&zone->lru_lock);
 
-	/*
-	 * `distress' is a measure of how much trouble we're having reclaiming
-	 * pages.  0 -> no problems.  100 -> great trouble.
-	 */
-	distress = 100 >> zone->prev_priority;
-
-	/*
-	 * The point of this algorithm is to decide when to start reclaiming
-	 * mapped memory instead of just pagecache.  Work out how much memory
-	 * is mapped.
-	 */
-	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
-
-	/*
-	 * Now decide how much we really want to unmap some pages.  The mapped
-	 * ratio is downgraded - just because there's a lot of mapped memory
-	 * doesn't necessarily mean that page reclaim isn't succeeding.
-	 *
-	 * The distress ratio is important - we don't want to start going oom.
-	 *
-	 * A 100% value of vm_swappiness overrides this algorithm altogether.
-	 */
-	swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-
-	/*
-	 * Now use this metric to decide whether to start moving mapped memory
-	 * onto the inactive list.
-	 */
-	if (swap_tendency >= 100)
-		reclaim_mapped = 1;
-
 	while (!list_empty(&l_hold)) {
 		page = lru_to_page(&l_hold);
 		list_del(&page->lru);
-		if (page_mapped(page)) {
-			if (!reclaim_mapped) {
-				list_add(&page->lru, &l_active);
-				continue;
-			}
-			pte_chain_lock(page);
-			if (page_referenced(page)) {
-				pte_chain_unlock(page);
-				list_add(&page->lru, &l_active);
-				continue;
-			}
+		pte_chain_lock(page);
+		if (page_referenced(page)) {
 			pte_chain_unlock(page);
+			list_add(&page->lru, &l_active);
+			continue;
 		}
+		pte_chain_unlock(page);
+
 		/*
 		 * FIXME: need to consider page_count(page) here if/when we
 		 * reap orphaned pages via the LRU (Daniel's locking stuff)
 		 */
-		if (total_swap_pages == 0 && !page->mapping &&
-						!PagePrivate(page)) {
+		if (unlikely(total_swap_pages == 0 && !page->mapping &&
+						!PagePrivate(page))) {
 			list_add(&page->lru, &l_active);
 			continue;
 		}
@@ -800,9 +760,6 @@ shrink_caches(struct zone **zones, int p
 		struct zone *zone = zones[i];
 		int max_scan;
 
-		if (zone->free_pages < zone->pages_high)
-			zone->temp_priority = priority;
-
 		if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
 
@@ -840,9 +797,6 @@ int try_to_free_pages(struct zone **zone
 
 	inc_page_state(allocstall);
 
-	for (i = 0; zones[i] != 0; i++)
-		zones[i]->temp_priority = DEF_PRIORITY;
-
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		int total_scanned = 0;
 		struct page_state ps;
@@ -875,8 +829,6 @@ int try_to_free_pages(struct zone **zone
 	if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY))
 		out_of_memory();
 out:
-	for (i = 0; zones[i] != 0; i++)
-		zones[i]->prev_priority = zones[i]->temp_priority;
 	return ret;
 }
 
@@ -914,12 +866,6 @@ static int balance_pgdat(pg_data_t *pgda
 
 	inc_page_state(pageoutrun);
 
-	for (i = 0; i < pgdat->nr_zones; i++) {
-		struct zone *zone = pgdat->node_zones + i;
-
-		zone->temp_priority = DEF_PRIORITY;
-	}
-
 	for (priority = DEF_PRIORITY; priority; priority--) {
 		int all_zones_ok = 1;
 		int pages_scanned = 0;
@@ -970,7 +916,6 @@ scan:
 				if (zone->free_pages <= zone->pages_high)
 					all_zones_ok = 0;
 			}
-			zone->temp_priority = priority;
 			max_scan = zone->nr_inactive >> priority;
 			reclaimed = shrink_zone(zone, max_scan, GFP_KERNEL,
 					&total_scanned, ps);
@@ -996,11 +941,6 @@ scan:
 			blk_congestion_wait(WRITE, HZ/10);
 	}
 out:
-	for (i = 0; i < pgdat->nr_zones; i++) {
-		struct zone *zone = pgdat->node_zones + i;
-
-		zone->prev_priority = zone->temp_priority;
-	}
 	return nr_pages - to_free;
 }
 
diff -puN include/linux/mmzone.h~vm-no-reclaim_mapped include/linux/mmzone.h
--- linux-2.6/include/linux/mmzone.h~vm-no-reclaim_mapped	2004-03-09 14:14:39.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mmzone.h	2004-03-09 16:31:21.000000000 +1100
@@ -86,25 +86,6 @@ struct zone {
 	ZONE_PADDING(_pad2_)
 
 	/*
-	 * prev_priority holds the scanning priority for this zone.  It is
-	 * defined as the scanning priority at which we achieved our reclaim
-	 * target at the previous try_to_free_pages() or balance_pgdat()
-	 * invokation.
-	 *
-	 * We use prev_priority as a measure of how much stress page reclaim is
-	 * under - it drives the swappiness decision: whether to unmap mapped
-	 * pages.
-	 *
-	 * temp_priority is used to remember the scanning priority at which
-	 * this zone was successfully refilled to free_pages == pages_high.
-	 *
-	 * Access to both these fields is quite racy even on uniprocessor.  But
-	 * it is expected to average out OK.
-	 */
-	int temp_priority;
-	int prev_priority;
-
-	/*
 	 * free areas of different sizes
 	 */
 	struct free_area	free_area[MAX_ORDER];
diff -puN mm/page_alloc.c~vm-no-reclaim_mapped mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-no-reclaim_mapped	2004-03-09 14:15:00.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-03-09 16:31:21.000000000 +1100
@@ -1408,8 +1408,6 @@ static void __init free_area_init_core(s
 		zone->zone_pgdat = pgdat;
 		zone->free_pages = 0;
 
-		zone->temp_priority = zone->prev_priority = DEF_PRIORITY;
-
 		/*
 		 * The per-cpu-pages pools are set to around 1000th of the
 		 * size of the zone.  But no more than 1/4 of a meg - there's

_

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  5:32 [RFC][PATCH 0/4] VM split active lists Nick Piggin
  2004-03-09  5:33 ` [RFC][PATCH 1/4] vm-lrutopage-cleanup Nick Piggin
  2004-03-09  5:34 ` [RFC][PATCH 3/4] vm-no-reclaim_mapped Nick Piggin
@ 2004-03-09  5:35 ` Nick Piggin
  2004-03-09  5:39   ` Nick Piggin
                     ` (2 more replies)
  2004-03-10  5:10 ` [RFC][PATCH 0/4] VM split active lists Nick Piggin
  3 siblings, 3 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:35 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: vm-mapped-x-active-lists.patch --]
[-- Type: text/x-patch, Size: 16316 bytes --]


Split the active list into mapped and unmapped pages.


 linux-2.6-npiggin/include/linux/mm_inline.h  |   33 +++++++++--
 linux-2.6-npiggin/include/linux/mmzone.h     |    9 ++-
 linux-2.6-npiggin/include/linux/page-flags.h |   50 ++++++++++--------
 linux-2.6-npiggin/mm/page_alloc.c            |   24 +++++---
 linux-2.6-npiggin/mm/swap.c                  |   35 +++++++++---
 linux-2.6-npiggin/mm/vmscan.c                |   75 +++++++++++++++++++--------
 6 files changed, 156 insertions(+), 70 deletions(-)

diff -puN include/linux/mmzone.h~vm-mapped-x-active-lists include/linux/mmzone.h
--- linux-2.6/include/linux/mmzone.h~vm-mapped-x-active-lists	2004-03-09 14:16:26.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mmzone.h	2004-03-09 14:27:31.000000000 +1100
@@ -74,11 +74,14 @@ struct zone {
 	ZONE_PADDING(_pad1_)
 
 	spinlock_t		lru_lock;	
-	struct list_head	active_list;
+	struct list_head	active_mapped_list;
+	struct list_head	active_unmapped_list;
 	struct list_head	inactive_list;
-	atomic_t		nr_scan_active;
+	atomic_t		nr_scan_active_mapped;
+	atomic_t		nr_scan_active_unmapped;
 	atomic_t		nr_scan_inactive;
-	unsigned long		nr_active;
+	unsigned long		nr_active_mapped;
+	unsigned long		nr_active_unmapped;
 	unsigned long		nr_inactive;
 	int			all_unreclaimable; /* All pages pinned */
 	unsigned long		pages_scanned;	   /* since last reclaim */
diff -puN include/linux/mm_inline.h~vm-mapped-x-active-lists include/linux/mm_inline.h
--- linux-2.6/include/linux/mm_inline.h~vm-mapped-x-active-lists	2004-03-09 14:16:48.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm_inline.h	2004-03-09 14:59:35.000000000 +1100
@@ -1,9 +1,16 @@
 
 static inline void
-add_page_to_active_list(struct zone *zone, struct page *page)
+add_page_to_active_mapped_list(struct zone *zone, struct page *page)
 {
-	list_add(&page->lru, &zone->active_list);
-	zone->nr_active++;
+	list_add(&page->lru, &zone->active_mapped_list);
+	zone->nr_active_mapped++;
+}
+
+static inline void
+add_page_to_active_unmapped_list(struct zone *zone, struct page *page)
+{
+	list_add(&page->lru, &zone->active_unmapped_list);
+	zone->nr_active_unmapped++;
 }
 
 static inline void
@@ -14,10 +21,17 @@ add_page_to_inactive_list(struct zone *z
 }
 
 static inline void
-del_page_from_active_list(struct zone *zone, struct page *page)
+del_page_from_active_mapped_list(struct zone *zone, struct page *page)
+{
+	list_del(&page->lru);
+	zone->nr_active_mapped--;
+}
+
+static inline void
+del_page_from_active_unmapped_list(struct zone *zone, struct page *page)
 {
 	list_del(&page->lru);
-	zone->nr_active--;
+	zone->nr_active_unmapped--;
 }
 
 static inline void
@@ -31,9 +45,12 @@ static inline void
 del_page_from_lru(struct zone *zone, struct page *page)
 {
 	list_del(&page->lru);
-	if (PageActive(page)) {
-		ClearPageActive(page);
-		zone->nr_active--;
+	if (PageActiveMapped(page)) {
+		ClearPageActiveMapped(page);
+		zone->nr_active_mapped--;
+	} else if (PageActiveUnmapped(page)) {
+		ClearPageActiveUnmapped(page);
+		zone->nr_active_unmapped--;
 	} else {
 		zone->nr_inactive--;
 	}
diff -puN mm/page_alloc.c~vm-mapped-x-active-lists mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-mapped-x-active-lists	2004-03-09 14:18:44.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-03-09 14:58:32.000000000 +1100
@@ -81,7 +81,7 @@ static void bad_page(const char *functio
 	page->flags &= ~(1 << PG_private	|
 			1 << PG_locked	|
 			1 << PG_lru	|
-			1 << PG_active	|
+			1 << PG_active_mapped	|
 			1 << PG_dirty	|
 			1 << PG_writeback);
 	set_page_count(page, 0);
@@ -217,7 +217,8 @@ static inline void free_pages_check(cons
 			1 << PG_lru	|
 			1 << PG_private |
 			1 << PG_locked	|
-			1 << PG_active	|
+			1 << PG_active_mapped	|
+			1 << PG_active_unmapped	|
 			1 << PG_reclaim	|
 			1 << PG_slab	|
 			1 << PG_writeback )))
@@ -324,7 +325,8 @@ static void prep_new_page(struct page *p
 			1 << PG_private	|
 			1 << PG_locked	|
 			1 << PG_lru	|
-			1 << PG_active	|
+			1 << PG_active_mapped	|
+			1 << PG_active_unmapped	|
 			1 << PG_dirty	|
 			1 << PG_reclaim	|
 			1 << PG_writeback )))
@@ -818,7 +820,8 @@ unsigned int nr_used_zone_pages(void)
 	struct zone *zone;
 
 	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
+		pages += zone->nr_active_mapped + zone->nr_active_unmapped
+			+ zone->nr_inactive;
 
 	return pages;
 }
@@ -955,7 +958,7 @@ void get_zone_counts(unsigned long *acti
 	*inactive = 0;
 	*free = 0;
 	for_each_zone(zone) {
-		*active += zone->nr_active;
+		*active += zone->nr_active_mapped + zone->nr_active_unmapped;
 		*inactive += zone->nr_inactive;
 		*free += zone->free_pages;
 	}
@@ -1068,7 +1071,7 @@ void show_free_areas(void)
 			K(zone->pages_min),
 			K(zone->pages_low),
 			K(zone->pages_high),
-			K(zone->nr_active),
+			K(zone->nr_active_mapped + zone->nr_active_unmapped),
 			K(zone->nr_inactive),
 			K(zone->present_pages)
 			);
@@ -1441,11 +1444,14 @@ static void __init free_area_init_core(s
 		}
 		printk("  %s zone: %lu pages, LIFO batch:%lu\n",
 				zone_names[j], realsize, batch);
-		INIT_LIST_HEAD(&zone->active_list);
+		INIT_LIST_HEAD(&zone->active_mapped_list);
+		INIT_LIST_HEAD(&zone->active_unmapped_list);
 		INIT_LIST_HEAD(&zone->inactive_list);
-		atomic_set(&zone->nr_scan_active, 0);
+		atomic_set(&zone->nr_scan_active_mapped, 0);
+		atomic_set(&zone->nr_scan_active_unmapped, 0);
 		atomic_set(&zone->nr_scan_inactive, 0);
-		zone->nr_active = 0;
+		zone->nr_active_mapped = 0;
+		zone->nr_active_unmapped = 0;
 		zone->nr_inactive = 0;
 		if (!size)
 			continue;
diff -puN mm/vmscan.c~vm-mapped-x-active-lists mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-mapped-x-active-lists	2004-03-09 14:20:02.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-03-09 15:20:44.000000000 +1100
@@ -272,7 +272,7 @@ shrink_list(struct list_head *page_list,
 		if (page_mapped(page) || PageSwapCache(page))
 			(*nr_scanned)++;
 
-		BUG_ON(PageActive(page));
+		BUG_ON(PageActiveMapped(page) || PageActiveUnmapped(page));
 
 		if (PageWriteback(page))
 			goto keep_locked;
@@ -450,7 +450,10 @@ free_it:
 		continue;
 
 activate_locked:
-		SetPageActive(page);
+		if (page_mapped(page))
+			SetPageActiveMapped(page);
+		else
+			SetPageActiveUnmapped(page);
 		pgactivate++;
 keep_locked:
 		unlock_page(page);
@@ -545,8 +548,10 @@ shrink_cache(struct zone *zone, unsigned
 			if (TestSetPageLRU(page))
 				BUG();
 			list_del(&page->lru);
-			if (PageActive(page))
-				add_page_to_active_list(zone, page);
+			if (PageActiveMapped(page))
+				add_page_to_active_mapped_list(zone, page);
+			else if (PageActiveUnmapped(page))
+				add_page_to_active_unmapped_list(zone, page);
 			else
 				add_page_to_inactive_list(zone, page);
 			if (!pagevec_add(&pvec, page)) {
@@ -580,9 +585,10 @@ done:
  * But we had to alter page->flags anyway.
  */
 static void shrink_active_list(struct zone *zone, struct list_head *list,
-				const int nr_scan, struct page_state *ps)
+		unsigned long *list_count, const int nr_scan,
+		struct page_state *ps)
 {
-	int pgmoved;
+	int pgmoved, pgmoved_unmapped;
 	int pgdeactivate = 0;
 	int nr_pages = nr_scan;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
@@ -611,7 +617,7 @@ static void shrink_active_list(struct zo
 		}
 		nr_pages--;
 	}
-	zone->nr_active -= pgmoved;
+	*list_count -= pgmoved;
 	spin_unlock_irq(&zone->lru_lock);
 
 	while (!list_empty(&l_hold)) {
@@ -645,7 +651,8 @@ static void shrink_active_list(struct zo
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
 		if (TestSetPageLRU(page))
 			BUG();
-		if (!TestClearPageActive(page))
+		if (!TestClearPageActiveMapped(page)
+				&& !TestClearPageActiveUnmapped(page))
 			BUG();
 		list_move(&page->lru, &zone->inactive_list);
 		pgmoved++;
@@ -669,23 +676,37 @@ static void shrink_active_list(struct zo
 	}
 
 	pgmoved = 0;
+	pgmoved_unmapped = 0;
 	while (!list_empty(&l_active)) {
 		page = lru_to_page(&l_active);
 		prefetchw_prev_lru_page(page, &l_active, flags);
 		if (TestSetPageLRU(page))
 			BUG();
-		BUG_ON(!PageActive(page));
-		list_move(&page->lru, list);
-		pgmoved++;
+		if(!TestClearPageActiveMapped(page)
+				&& !TestClearPageActiveUnmapped(page))
+			BUG();
+		if (page_mapped(page)) {
+			SetPageActiveMapped(page);
+			list_move(&page->lru, &zone->active_mapped_list);
+			pgmoved++;
+		} else {
+			SetPageActiveUnmapped(page);
+			list_move(&page->lru, &zone->active_unmapped_list);
+			pgmoved_unmapped++;
+		}
+			
 		if (!pagevec_add(&pvec, page)) {
-			zone->nr_active += pgmoved;
+			zone->nr_active_mapped += pgmoved;
 			pgmoved = 0;
+			zone->nr_active_unmapped += pgmoved_unmapped;
+			pgmoved_unmapped = 0;
 			spin_unlock_irq(&zone->lru_lock);
 			__pagevec_release(&pvec);
 			spin_lock_irq(&zone->lru_lock);
 		}
 	}
-	zone->nr_active += pgmoved;
+	zone->nr_active_mapped += pgmoved;
+	zone->nr_active_unmapped += pgmoved_unmapped;
 	spin_unlock_irq(&zone->lru_lock);
 	pagevec_release(&pvec);
 
@@ -702,6 +723,8 @@ shrink_zone(struct zone *zone, int max_s
 		int *total_scanned, struct page_state *ps)
 {
 	unsigned long ratio;
+	unsigned long long mapped_ratio;
+	unsigned long nr_active;
 	int count;
 
 	/*
@@ -714,14 +737,27 @@ shrink_zone(struct zone *zone, int max_s
 	 * just to make sure that the kernel will slowly sift through the
 	 * active list.
 	 */
-	ratio = (unsigned long)SWAP_CLUSTER_MAX * zone->nr_active /
-				((zone->nr_inactive | 1) * 2);
+	nr_active = zone->nr_active_mapped + zone->nr_active_unmapped;
+	ratio = (unsigned long)SWAP_CLUSTER_MAX * nr_active /
+				(zone->nr_inactive * 2 + 1);
+	mapped_ratio = (unsigned long long)ratio * nr_active;
+	do_div(mapped_ratio, zone->nr_active_mapped+1);
+
+	ratio = ratio - mapped_ratio;
+	atomic_add(ratio+1, &zone->nr_scan_active_unmapped);
+	count = atomic_read(&zone->nr_scan_active_unmapped);
+	if (count >= SWAP_CLUSTER_MAX) {
+		atomic_set(&zone->nr_scan_active_unmapped, 0);
+		shrink_active_list(zone, &zone->active_unmapped_list,
+					&zone->nr_active_unmapped, count, ps);
+	}
 
-	atomic_add(ratio+1, &zone->nr_scan_active);
-	count = atomic_read(&zone->nr_scan_active);
+	atomic_add(mapped_ratio+1, &zone->nr_scan_active_mapped);
+	count = atomic_read(&zone->nr_scan_active_mapped);
 	if (count >= SWAP_CLUSTER_MAX) {
-		atomic_set(&zone->nr_scan_active, 0);
-		shrink_active_list(zone, &zone->active_list, count, ps);
+		atomic_set(&zone->nr_scan_active_mapped, 0);
+		shrink_active_list(zone, &zone->active_mapped_list,
+					&zone->nr_active_mapped, count, ps);
 	}
 
 	atomic_add(max_scan, &zone->nr_scan_inactive);
@@ -793,7 +829,6 @@ int try_to_free_pages(struct zone **zone
 	int ret = 0;
 	int nr_reclaimed = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
-	int i;
 
 	inc_page_state(allocstall);
 
diff -puN include/linux/page-flags.h~vm-mapped-x-active-lists include/linux/page-flags.h
--- linux-2.6/include/linux/page-flags.h~vm-mapped-x-active-lists	2004-03-09 14:28:57.000000000 +1100
+++ linux-2.6-npiggin/include/linux/page-flags.h	2004-03-09 15:01:04.000000000 +1100
@@ -58,23 +58,25 @@
 
 #define PG_dirty	 	 4
 #define PG_lru			 5
-#define PG_active		 6
-#define PG_slab			 7	/* slab debug (Suparna wants this) */
+#define PG_active_mapped	 6
+#define PG_active_unmapped	 7
 
-#define PG_highmem		 8
-#define PG_checked		 9	/* kill me in 2.5.<early>. */
-#define PG_arch_1		10
-#define PG_reserved		11
-
-#define PG_private		12	/* Has something at ->private */
-#define PG_writeback		13	/* Page is under writeback */
-#define PG_nosave		14	/* Used for system suspend/resume */
-#define PG_chainlock		15	/* lock bit for ->pte_chain */
-
-#define PG_direct		16	/* ->pte_chain points directly at pte */
-#define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
-#define PG_reclaim		18	/* To be reclaimed asap */
-#define PG_compound		19	/* Part of a compound page */
+#define PG_slab			 8	/* slab debug (Suparna wants this) */
+#define PG_highmem		 9
+#define PG_checked		10	/* kill me in 2.5.<early>. */
+#define PG_arch_1		11
+
+#define PG_reserved		12
+#define PG_private		13	/* Has something at ->private */
+#define PG_writeback		14	/* Page is under writeback */
+#define PG_nosave		15	/* Used for system suspend/resume */
+
+#define PG_chainlock		16	/* lock bit for ->pte_chain */
+#define PG_direct		17	/* ->pte_chain points directly at pte */
+#define PG_mappedtodisk		18	/* Has blocks allocated on-disk */
+#define PG_reclaim		19	/* To be reclaimed asap */
+
+#define PG_compound		20	/* Part of a compound page */
 
 
 /*
@@ -211,11 +213,17 @@ extern void get_full_page_state(struct p
 #define TestSetPageLRU(page)	test_and_set_bit(PG_lru, &(page)->flags)
 #define TestClearPageLRU(page)	test_and_clear_bit(PG_lru, &(page)->flags)
 
-#define PageActive(page)	test_bit(PG_active, &(page)->flags)
-#define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
-#define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
-#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
-#define TestSetPageActive(page) test_and_set_bit(PG_active, &(page)->flags)
+#define PageActiveMapped(page)		test_bit(PG_active_mapped, &(page)->flags)
+#define SetPageActiveMapped(page)	set_bit(PG_active_mapped, &(page)->flags)
+#define ClearPageActiveMapped(page)	clear_bit(PG_active_mapped, &(page)->flags)
+#define TestClearPageActiveMapped(page) test_and_clear_bit(PG_active_mapped, &(page)->flags)
+#define TestSetPageActiveMapped(page) test_and_set_bit(PG_active_mapped, &(page)->flags)
+
+#define PageActiveUnmapped(page)	test_bit(PG_active_unmapped, &(page)->flags)
+#define SetPageActiveUnmapped(page)	set_bit(PG_active_unmapped, &(page)->flags)
+#define ClearPageActiveUnmapped(page)	clear_bit(PG_active_unmapped, &(page)->flags)
+#define TestClearPageActiveUnmapped(page) test_and_clear_bit(PG_active_unmapped, &(page)->flags)
+#define TestSetPageActiveUnmapped(page) test_and_set_bit(PG_active_unmapped, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define SetPageSlab(page)	set_bit(PG_slab, &(page)->flags)
diff -puN mm/swap.c~vm-mapped-x-active-lists mm/swap.c
--- linux-2.6/mm/swap.c~vm-mapped-x-active-lists	2004-03-09 14:33:07.000000000 +1100
+++ linux-2.6-npiggin/mm/swap.c	2004-03-09 15:00:35.000000000 +1100
@@ -58,14 +58,18 @@ int rotate_reclaimable_page(struct page 
 		return 1;
 	if (PageDirty(page))
 		return 1;
-	if (PageActive(page))
+	if (PageActiveMapped(page))
+		return 1;
+	if (PageActiveUnmapped(page))
 		return 1;
 	if (!PageLRU(page))
 		return 1;
 
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lru_lock, flags);
-	if (PageLRU(page) && !PageActive(page)) {
+	if (PageLRU(page)
+		&& !PageActiveMapped(page) && !PageActiveUnmapped(page)) {
+
 		list_del(&page->lru);
 		list_add_tail(&page->lru, &zone->inactive_list);
 		inc_page_state(pgrotated);
@@ -84,10 +88,18 @@ void fastcall activate_page(struct page 
 	struct zone *zone = page_zone(page);
 
 	spin_lock_irq(&zone->lru_lock);
-	if (PageLRU(page) && !PageActive(page)) {
+	if (PageLRU(page)
+		&& !PageActiveMapped(page) && !PageActiveUnmapped(page)) {
+
 		del_page_from_inactive_list(zone, page);
-		SetPageActive(page);
-		add_page_to_active_list(zone, page);
+
+		if (page_mapped(page)) {
+			SetPageActiveMapped(page);
+			add_page_to_active_mapped_list(zone, page);
+		} else {
+			SetPageActiveUnmapped(page);
+			add_page_to_active_unmapped_list(zone, page);
+		}
 		inc_page_state(pgactivate);
 	}
 	spin_unlock_irq(&zone->lru_lock);
@@ -102,7 +114,8 @@ void fastcall activate_page(struct page 
  */
 void fastcall mark_page_accessed(struct page *page)
 {
-	if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
+	if (!PageActiveMapped(page) && !PageActiveUnmapped(page)
+			&& PageReferenced(page) && PageLRU(page)) {
 		activate_page(page);
 		ClearPageReferenced(page);
 	} else if (!PageReferenced(page)) {
@@ -310,9 +323,13 @@ void __pagevec_lru_add_active(struct pag
 		}
 		if (TestSetPageLRU(page))
 			BUG();
-		if (TestSetPageActive(page))
-			BUG();
-		add_page_to_active_list(zone, page);
+		if (page_mapped(page)) {
+			SetPageActiveMapped(page);
+			add_page_to_active_mapped_list(zone, page);
+		} else {
+			SetPageActiveUnmapped(page);
+			add_page_to_active_unmapped_list(zone, page);
+		}
 	}
 	if (zone)
 		spin_unlock_irq(&zone->lru_lock);

_

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
@ 2004-03-09  5:39   ` Nick Piggin
  2004-03-09  5:47   ` Mike Fedyk
  2004-03-09 15:26   ` Marc-Christian Petersen
  2 siblings, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  5:39 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management



Nick Piggin wrote:

>
>@@ -714,14 +737,27 @@ shrink_zone(struct zone *zone, int max_s
> 	 * just to make sure that the kernel will slowly sift through the
> 	 * active list.
> 	 */
>-	ratio = (unsigned long)SWAP_CLUSTER_MAX * zone->nr_active /
>-				((zone->nr_inactive | 1) * 2);
>+	nr_active = zone->nr_active_mapped + zone->nr_active_unmapped;
>+	ratio = (unsigned long)SWAP_CLUSTER_MAX * nr_active /
>+				(zone->nr_inactive * 2 + 1);
>+	mapped_ratio = (unsigned long long)ratio * nr_active;
>+	do_div(mapped_ratio, zone->nr_active_mapped+1);
>

Just for information, this is where you would balance mapped vs unmapped
pages:    do_div(mapped_ratio, 16); /* mapped pages are worth 16 times 
more */

>+
>+	ratio = ratio - mapped_ratio;
>+	atomic_add(ratio+1, &zone->nr_scan_active_unmapped);
>+	count = atomic_read(&zone->nr_scan_active_unmapped);
>+	if (count >= SWAP_CLUSTER_MAX) {
>+		atomic_set(&zone->nr_scan_active_unmapped, 0);
>+		shrink_active_list(zone, &zone->active_unmapped_list,
>+					&zone->nr_active_unmapped, count, ps);
>+	}
> 
>-	atomic_add(ratio+1, &zone->nr_scan_active);
>-	count = atomic_read(&zone->nr_scan_active);
>+	atomic_add(mapped_ratio+1, &zone->nr_scan_active_mapped);
>+	count = atomic_read(&zone->nr_scan_active_mapped);
> 	if (count >= SWAP_CLUSTER_MAX) {
>-		atomic_set(&zone->nr_scan_active, 0);
>-		shrink_active_list(zone, &zone->active_list, count, ps);
>+		atomic_set(&zone->nr_scan_active_mapped, 0);
>+		shrink_active_list(zone, &zone->active_mapped_list,
>+					&zone->nr_active_mapped, count, ps);
> 	}
> 
> 	atomic_add(max_scan, &zone->nr_scan_inactive);
>
>  
>
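
Spelled out, the suggestion above amounts to something like this
(illustrative only, reusing the factor-of-16 example from the cover
letter; not a tested hunk):

	mapped_ratio = (unsigned long long)ratio * nr_active;
	do_div(mapped_ratio, zone->nr_active_mapped + 1);
	do_div(mapped_ratio, 16);	/* mapped pages are worth 16 times more */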


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
  2004-03-09  5:39   ` Nick Piggin
@ 2004-03-09  5:47   ` Mike Fedyk
  2004-03-09  6:06     ` Nick Piggin
  2004-03-09 15:26   ` Marc-Christian Petersen
  2 siblings, 1 reply; 17+ messages in thread
From: Mike Fedyk @ 2004-03-09  5:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Linux Memory Management

Nick Piggin wrote:
> 
> 
> ------------------------------------------------------------------------
> 
> 
> Split the active list into mapped and unmapped pages.

This looks similar to Rik's Active and Active-anon lists in 2.4-rmap.

Also, how does this interact with Andrea's VM work?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  5:47   ` Mike Fedyk
@ 2004-03-09  6:06     ` Nick Piggin
  2004-03-09  7:02       ` William Lee Irwin III
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  6:06 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel, Linux Memory Management



Mike Fedyk wrote:

> Nick Piggin wrote:
>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>> Split the active list into mapped and unmapped pages.
>
>
> This looks similar to Rik's Active and Active-anon lists in 2.4-rmap.
>

Oh? I haven't looked at 2.4-rmap for a while. Well I guess that gives
it more credibility, thanks.

> Also, how does this interact with Andrea's VM work?
>

Not sure to be honest, I haven't looked at it :\. I'm not really
sure if the rmap mitigation direction is just a holdover until
page clustering or intended as a permanent feature...

Either way, I trust its proponents will take the onus for regressions.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  6:06     ` Nick Piggin
@ 2004-03-09  7:02       ` William Lee Irwin III
  2004-03-09  7:23         ` Nick Piggin
  0 siblings, 1 reply; 17+ messages in thread
From: William Lee Irwin III @ 2004-03-09  7:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mike Fedyk, linux-kernel, Linux Memory Management

On Tue, Mar 09, 2004 at 05:06:37PM +1100, Nick Piggin wrote:
> Not sure to be honest, I haven't looked at it :\. I'm not really
> sure if the rmap mitigation direction is just a holdover until
> page clustering or intended as a permanent feature...
> Either way, I trust its proponents will take the onus for regressions.

Actually, anobjrmap does wonderful things wrt. liberating pgcl
internals from some very frustrating complications having to do with
assumptions of a 1:1 correspondence between pte pages and struct pages,
so I would regard work in the direction of anobjrmap as useful to
advance the state of page clustering regardless of its rmap mitigation
overtones.  The "partial" objrmap is actually insufficient to clean up
this assumption, and introduces new failure modes I don't like (which
it is in fact not necessary to do; aa's code is very close to doing the
partial-but-insufficient-for-pgcl objrmap properly apart from trying to
allocate more pte_chains than necessary and not falling back to the vma
lists for linear/nonlinear mapping mixtures). The current port has some
code to deal with this I'm extremely eager to dump as soon as things
such as anobjrmap etc. make it possible, if they're merged.

Current efforts are now a background/spare time affair centering around
non-i386 architectures and driver audits.


-- wli

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  7:02       ` William Lee Irwin III
@ 2004-03-09  7:23         ` Nick Piggin
  2004-03-09  7:37           ` William Lee Irwin III
  2004-03-09  9:24           ` William Lee Irwin III
  0 siblings, 2 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-09  7:23 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Mike Fedyk, linux-kernel, Linux Memory Management



William Lee Irwin III wrote:

>On Tue, Mar 09, 2004 at 05:06:37PM +1100, Nick Piggin wrote:
>
>>Not sure to be honest, I haven't looked at it :\. I'm not really
>>sure if the rmap mitigation direction is just a holdover until
>>page clustering or intended as a permanent feature...
>>Either way, I trust its proponents will take the onus for regressions.
>>
>
>Actually, anobjrmap does wonderful things wrt. liberating pgcl
>internals from some very frustrating complications having to do with
>assumptions of a 1:1 correspondence between pte pages and struct pages,
>so I would regard work in the direction of anobjrmap as useful to
>advance the state of page clustering regardless of its rmap mitigation
>overtones.  The "partial" objrmap is actually insufficient to clean up
>this assumption, and introduces new failure modes I don't like (which
>it is in fact not necessary to do; aa's code is very close to doing the
>partial-but-insufficient-for-pgcl objrmap properly apart from trying to
>allocate more pte_chains than necessary and not falling back to the vma
>lists for linear/nonlinear mapping mixtures). The current port has some
>code to deal with this I'm extremely eager to dump as soon as things
>such as anobjrmap etc. make it possible, if they're merged.
>
>Current efforts are now a background/spare time affair centering around
>non-i386 architectures and driver audits.
>

OK. I had just noticed that the people complaining about rmap most
are the ones using 4K page size (x86-64 uses 4K, doesn't it?). Not
that this fact means it is OK to ignore their problem, but I thought
maybe pgcl might solve it in a more general way.

I wonder how much you gain with objrmap / anobjrmap on say a 64K page
architecture?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  7:23         ` Nick Piggin
@ 2004-03-09  7:37           ` William Lee Irwin III
  2004-03-09  9:24           ` William Lee Irwin III
  1 sibling, 0 replies; 17+ messages in thread
From: William Lee Irwin III @ 2004-03-09  7:37 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mike Fedyk, linux-kernel, Linux Memory Management

William Lee Irwin III wrote:
>> Current efforts are now a background/spare time affair centering around
>> non-i386 architectures and driver audits.

On Tue, Mar 09, 2004 at 06:23:53PM +1100, Nick Piggin wrote:
> OK. I had just noticed that the people complaining about rmap most
> are the ones using 4K page size (x86-64 uses 4K, doesn't it?). Not
> that this fact means it is OK to ignore them problem, but I thought
> maybe pgcl might solve it in a more general way.
> I wonder how much you gain with objrmap / anobjrmap on say a 64K page
> architecture?

pgcl doesn't reduce userspace's mapping granularity. The current
implementation has the same pte_chain overhead as mainline for the same
virtualspace mapped. It's unclear how feasible it is to reduce this
overhead, though various proposals have gone around. I've ignored the
potential pte_chain reduction issue entirely in favor of concentrating
on more basic correctness and functionality. The removal of the 1:1 pte
page : struct page assumption is the vastly more important aspect of
anobjrmap in relation to pgcl, since removing that assumption would
remove a significant piece of complexity.

-- wli

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  7:23         ` Nick Piggin
  2004-03-09  7:37           ` William Lee Irwin III
@ 2004-03-09  9:24           ` William Lee Irwin III
  1 sibling, 0 replies; 17+ messages in thread
From: William Lee Irwin III @ 2004-03-09  9:24 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mike Fedyk, linux-kernel, Linux Memory Management

On Tue, Mar 09, 2004 at 06:23:53PM +1100, Nick Piggin wrote:
> OK. I had just noticed that the people complaining about rmap most
> are the ones using 4K page size (x86-64 uses 4K, doesn't it?). Not
> that this fact means it is OK to ignore them problem, but I thought
> maybe pgcl might solve it in a more general way.

There is something to be gained in terms of general cache and memory
footprint of non-reclamation-oriented operations. The sad thing is
that many of the arguments presented in favor of these object-based
physical-to-virtual resolution methods are largely for what I'd call
the wrong reasons. Kernel compiles are not a realistic workload. fork()
is used in real applications that are forking servers, and those are
what should be instrumented for the performance argument. Cache and
memory conservation are also legitimate concerns, which are being
expressed in ways that pollute them with the stigma of highmem.


On Tue, Mar 09, 2004 at 06:23:53PM +1100, Nick Piggin wrote:
> I wonder how much you gain with objrmap / anobjrmap on say a 64K page
> architecture?

The gains I spoke of earlier are completely in terms of implementation
mechanics and unrelated to concerns such as performance. Essentially,
ptes are expected to be of some size, and it's desirable that they
remain of those sizes and not "widened" artificially lest we incur more
fragmentation. The pte_chain-based physical-to-virtual resolution
algorithm utilized the struct page tracking a pte page, which is unique
in mainline, to shove information used by the physical-to-virtual
resolution algorithm into. This gets rather ugly when the struct page
corresponds to multiple pte pages. pte pages are already grossly
underutilized (ca. 20%) with stock 4K pte pages; jacking them up to
64KB/etc. worsens space utilization, has larger latencies associated
with bitblitting the things, and is just plain ugly to implement.

anobjrmap OTOH removes this dependency on the struct page tracking a
4K pte page. 4K (or otherwise sub-PAGE_SIZE) blocks of memory for ptes
may be freely used without incurring implementation complexity or the
other disadvantages above. The partial objrmap doesn't remove this
dependency on a unique struct page tracking a pte page, retaining it
for the cases of anonymous and nonlinearly-mapped pagecache pages.


-- wli

P.S.: The best word I could come up with for leaf radix tree nodes
	of pagetables was "pte page". This term is not meant to imply
	they are of size PAGE_SIZE.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
  2004-03-09  5:39   ` Nick Piggin
  2004-03-09  5:47   ` Mike Fedyk
@ 2004-03-09 15:26   ` Marc-Christian Petersen
  2004-03-09 15:42     ` Nikita Danilov
  2004-03-10  2:49     ` Nick Piggin
  2 siblings, 2 replies; 17+ messages in thread
From: Marc-Christian Petersen @ 2004-03-09 15:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nick Piggin, Linux Memory Management

[-- Attachment #1: Type: text/plain, Size: 135 bytes --]

On Tuesday 09 March 2004 06:35, Nick Piggin wrote:

Hi Nick,

seems the following patch is required ontop of your patches?

ciao, Marc

[-- Attachment #2: 002_03-vm-mapped-x-active-lists-1-fix.patch --]
[-- Type: text/x-diff, Size: 618 bytes --]

--- old/arch/i386/mm/hugetlbpage.c	2004-03-09 14:57:42.000000000 +0100
+++ new/arch/i386/mm/hugetlbpage.c	2004-03-09 15:36:15.000000000 +0100
@@ -411,8 +411,8 @@ static void update_and_free_page(struct 
 	htlbzone_pages--;
 	for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
 		map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
-				1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
-				1 << PG_private | 1<< PG_writeback);
+				1 << PG_dirty | 1 << PG_active_mapped | 1 << PG_active_unapped |
+				1 << PG_reserved | 1 << PG_private | 1<< PG_writeback);
 		set_page_count(map, 0);
 		map++;
 	}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09 15:26   ` Marc-Christian Petersen
@ 2004-03-09 15:42     ` Nikita Danilov
  2004-03-10  2:49     ` Nick Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Nikita Danilov @ 2004-03-09 15:42 UTC (permalink / raw)
  To: Marc-Christian Petersen
  Cc: linux-kernel, Nick Piggin, Linux Memory Management

Marc-Christian Petersen writes:
 > On Tuesday 09 March 2004 06:35, Nick Piggin wrote:
 > 
 > Hi Nick,
 > 
 > seems the following patch is required ontop of your patches?
 > 
 > ciao, Marc
 > --- old/arch/i386/mm/hugetlbpage.c	2004-03-09 14:57:42.000000000 +0100
 > +++ new/arch/i386/mm/hugetlbpage.c	2004-03-09 15:36:15.000000000 +0100
 > @@ -411,8 +411,8 @@ static void update_and_free_page(struct 
 >  	htlbzone_pages--;
 >  	for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) {
 >  		map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
 > -				1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
 > -				1 << PG_private | 1<< PG_writeback);
 > +				1 << PG_dirty | 1 << PG_active_mapped | 1 << PG_active_unapped |

PG_active_unapped?

 > +				1 << PG_reserved | 1 << PG_private | 1<< PG_writeback);
 >  		set_page_count(map, 0);
 >  		map++;
 >  	}

Nikita.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 4/4] vm-mapped-x-active-lists
  2004-03-09 15:26   ` Marc-Christian Petersen
  2004-03-09 15:42     ` Nikita Danilov
@ 2004-03-10  2:49     ` Nick Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Nick Piggin @ 2004-03-10  2:49 UTC (permalink / raw)
  To: Marc-Christian Petersen; +Cc: linux-kernel, Linux Memory Management



Marc-Christian Petersen wrote:

>On Tuesday 09 March 2004 06:35, Nick Piggin wrote:
>
>Hi Nick,
>
>seems the following patch is required ontop of your patches?
>
>

Hi Marc,
Yep thanks for that one. You're right of course, minus the typo.
It's funny, I made the same one in about 3 other places.

Nick


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] VM split active lists
  2004-03-09  5:32 [RFC][PATCH 0/4] VM split active lists Nick Piggin
                   ` (2 preceding siblings ...)
  2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
@ 2004-03-10  5:10 ` Nick Piggin
  2004-03-12  9:58   ` Hans Reiser
  3 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2004-03-10  5:10 UTC (permalink / raw)
  To: linux-kernel, Linux Memory Management



Nick Piggin wrote:

>
> OK, the theory is that mapped pagecache pages are worth more than
> unmapped pages. This is a good theory because mapped pages will
> usually have far more random access patterns, so pagein *and* pageout
> will be much less efficient. Also, applications are probably coded to
> be more suited to blocking in read() than a random code / anon memory
> page. So a factor of >= 16 wouldn't be out of the question.
>

Just a followup - there is a small but significant bug in patch
#4/4. In shrink_zone, mapped_ratio should be divided by
nr_active_unmapped. I have this fixed, hugepage compile problems
fixed, and a mapped_page_cost tunable in place of swappiness. So
anyone interested in testing should please ask me for my latest
patch.

I'm getting some preliminary numbers now. They're pretty good;
it looks like they should be similar to dont-rotate-active-list,
which isn't too surprising.

Interestingly, mapped_page_cost of 8 is close to optimal for
swapping-kbuild throughput. Values of 4 and 16 are both worse.
mapped_page_cost is in units of unmapped page cost. Maybe it is
just me, but I find this scheme more meaningful, and it provides
more control than swappiness.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] VM split active lists
  2004-03-10  5:10 ` [RFC][PATCH 0/4] VM split active lists Nick Piggin
@ 2004-03-12  9:58   ` Hans Reiser
  0 siblings, 0 replies; 17+ messages in thread
From: Hans Reiser @ 2004-03-12  9:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Linux Memory Management

I didn't review the code carefully, but it seems like a 
reasonable/better design overall.  Thanks for it.

-- 
Hans



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2004-03-12  9:58 UTC | newest]

Thread overview: 17+ messages
2004-03-09  5:32 [RFC][PATCH 0/4] VM split active lists Nick Piggin
2004-03-09  5:33 ` [RFC][PATCH 1/4] vm-lrutopage-cleanup Nick Piggin
2004-03-09  5:33   ` [RFC][PATCH 2/4] vm-nofixed-active-list Nick Piggin
2004-03-09  5:34 ` [RFC][PATCH 3/4] vm-no-reclaim_mapped Nick Piggin
2004-03-09  5:35 ` [RFC][PATCH 4/4] vm-mapped-x-active-lists Nick Piggin
2004-03-09  5:39   ` Nick Piggin
2004-03-09  5:47   ` Mike Fedyk
2004-03-09  6:06     ` Nick Piggin
2004-03-09  7:02       ` William Lee Irwin III
2004-03-09  7:23         ` Nick Piggin
2004-03-09  7:37           ` William Lee Irwin III
2004-03-09  9:24           ` William Lee Irwin III
2004-03-09 15:26   ` Marc-Christian Petersen
2004-03-09 15:42     ` Nikita Danilov
2004-03-10  2:49     ` Nick Piggin
2004-03-10  5:10 ` [RFC][PATCH 0/4] VM split active lists Nick Piggin
2004-03-12  9:58   ` Hans Reiser
