* [PATCH 0/5] Lumpy Reclaim V4
@ 2007-02-27 19:33 ` Andy Whitcroft
  0 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman

Following this email are five patches which represent the current
state of the lumpy reclaim patches, collectively lumpy v4.  This
patch kit is designed as a complete drop-in replacement for the
lumpy patches in 2.6.20-mm2.  The stack is split out to show the
incremental changes in this version.  Andrew, please replace your
current lumpy stack with this one; you may prefer to fold this kit
into a single patch, lumpy-v4.

Comparative testing between lumpy-v3 and lumpy-v4 generally shows a
small improvement, coming from the improved matching of pages taken
from the end of the active/inactive list (patch 2 in this series).

I have taken the lumpy-v2 patches and fixes as found in
2.6.20-rc6-mm2 and folded them back into a single patch (collectively
lumpy v3), updating attribution.  On top of this are four patches
which represent the updates mainly coming from the detailed review
comments from Andrew Morton:

lumpy-reclaim-v3: folded back base, lumpy-v3,

lumpy-isolate_lru_pages-wants-to-specifically-take-active-or-inactive-pages:
  ensure we take pages of the expected type from the end of
  the active/inactive lists.  This is both a performance and
  correctness fix,

lumpy-ensure-that-we-compare-PageActive-and-active-safely: ensure
  comparisons between PageActive() and coded booleans are safe
  should PageActive() not return 1/0,

lumpy-update-commentry-on-subtle-comparisons-and-rounding-assumptions:
  update the code commentary to explain the subtle exit conditions, and

lumpy-only-check-for-valid-pages-when-holes-are-present:
  remove expensive check for invalid pages within MAX_ORDER blocks
  where those cannot exist.

Against: 2.6.20-mm2

-apw


* [PATCH 1/5] Lumpy Reclaim V3
  2007-02-27 19:33 ` Andy Whitcroft
@ 2007-02-27 19:34   ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman


When we are out of memory of a suitable size we enter reclaim.
The current reclaim algorithm targets pages in LRU order, which
is great for fairness but highly unsuitable if you desire pages at
higher orders.  To get pages of higher order we must shoot down a
very high proportion of memory; >95% in a lot of cases.

This patch set adds a lumpy reclaim algorithm to the allocator.
It targets groups of pages at the specified order anchored at the
end of the active and inactive lists.  This encourages groups of
pages at the requested orders to move from active to inactive,
and active to free lists.  This behaviour is only triggered out of
direct reclaim when higher order pages have been requested.
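
To make that targeting concrete, here is a minimal sketch of the pfn
arithmetic used by isolate_lru_pages() below (illustrative only; the
numbers are made up).  For order = 3 and a tag page at pfn 0x12345,
the scanned block covers pfns 0x12340 to 0x12347:

	unsigned long page_pfn = page_to_pfn(page);
	/* round the tag page down to the order-aligned block base */
	unsigned long pfn = page_pfn & ~((1 << order) - 1);
	/* one past the last pfn in the block */
	unsigned long end_pfn = pfn + (1 << order);

	for (; pfn < end_pfn; pfn++) {
		/* consider pfn_to_page(pfn) for isolation as well */
	}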

This patch set is particularly effective when utilised with
an anti-fragmentation scheme which groups pages of similar
reclaimability together.

This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch
which forms the foundation.

[akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
[bunk@stusta.de: static declarations for internal functions]

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
diff --git a/fs/buffer.c b/fs/buffer.c
index 1b69d61..19d1d2c 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -363,7 +363,7 @@ static void free_more_memory(void)
 	for_each_online_pgdat(pgdat) {
 		zones = pgdat->node_zonelists[gfp_zone(GFP_NOFS)].zones;
 		if (*zones)
-			try_to_free_pages(zones, GFP_NOFS);
+			try_to_free_pages(zones, 0, GFP_NOFS);
 	}
 }
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index eae023a..382a30b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -190,7 +190,8 @@ extern int rotate_reclaimable_page(struct page *page);
 extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
-extern unsigned long try_to_free_pages(struct zone **, gfp_t);
+extern unsigned long try_to_free_pages(struct zone **zones, int order,
+					gfp_t gfp_mask);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96451c5..887e435 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1415,7 +1415,7 @@ nofail_alloc:
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	did_some_progress = try_to_free_pages(zonelist->zones, gfp_mask);
+	did_some_progress = try_to_free_pages(zonelist->zones, order, gfp_mask);
 
 	p->reclaim_state = NULL;
 	p->flags &= ~PF_MEMALLOC;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2c8f197..f15ffcb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -68,6 +68,8 @@ struct scan_control {
 	int swappiness;
 
 	int all_unreclaimable;
+
+	int order;
 };
 
 /*
@@ -617,6 +619,36 @@ keep:
 }
 
 /*
+ * Attempt to remove the specified page from its LRU.  Only take this page
+ * if it is of the appropriate PageActive status.  Pages which are being
+ * freed elsewhere are also ignored.
+ *
+ * page:	page to consider
+ * active:	active/inactive flag only take pages of this type
+ *
+ * returns 0 on success, -ve errno on failure.
+ */
+static int __isolate_lru_page(struct page *page, int active)
+{
+	int ret = -EINVAL;
+
+	if (PageLRU(page) && (PageActive(page) == active)) {
+		ret = -EBUSY;
+		if (likely(get_page_unless_zero(page))) {
+			/*
+			 * Be careful not to clear PageLRU until after we're
+			 * sure the page is not being freed elsewhere -- the
+			 * page release code relies on it.
+			 */
+			ClearPageLRU(page);
+			ret = 0;
+		}
+	}
+
+	return ret;
+}
+
+/*
  * zone->lru_lock is heavily contended.  Some of the functions that
  * shrink the lists perform better by taking out a batch of pages
  * and working on them outside the LRU lock.
@@ -630,38 +662,83 @@ keep:
  * @src:	The LRU list to pull pages off.
  * @dst:	The temp list to put pages on to.
  * @scanned:	The number of pages that were scanned.
+ * @order:	The caller's attempted allocation order
  *
  * returns how many pages were moved onto *@dst.
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct list_head *src, struct list_head *dst,
-		unsigned long *scanned)
+		unsigned long *scanned, int order)
 {
 	unsigned long nr_taken = 0;
-	struct page *page;
 	unsigned long scan;
 
 	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
-		struct list_head *target;
+		struct page *page;
+		unsigned long pfn;
+		unsigned long end_pfn;
+		unsigned long page_pfn;
+		int active;
+		int zone_id;
+
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
 
 		VM_BUG_ON(!PageLRU(page));
 
-		list_del(&page->lru);
-		target = src;
-		if (likely(get_page_unless_zero(page))) {
-			/*
-			 * Be careful not to clear PageLRU until after we're
-			 * sure the page is not being freed elsewhere -- the
-			 * page release code relies on it.
-			 */
-			ClearPageLRU(page);
-			target = dst;
+		active = PageActive(page);
+		switch (__isolate_lru_page(page, active)) {
+		case 0:
+			list_move(&page->lru, dst);
 			nr_taken++;
-		} /* else it is being freed elsewhere */
+			break;
+
+		case -EBUSY:
+			/* else it is being freed elsewhere */
+			list_move(&page->lru, src);
+			continue;
+
+		default:
+			BUG();
+		}
+
+		if (!order)
+			continue;
+
+		/*
+		 * Attempt to take all pages in the order aligned region
+		 * surrounding the tag page.  Only take those pages of
+		 * the same active state as that tag page.
+		 */
+		zone_id = page_zone_id(page);
+		page_pfn = page_to_pfn(page);
+		pfn = page_pfn & ~((1 << order) - 1);
+		end_pfn = pfn + (1 << order);
+		for (; pfn < end_pfn; pfn++) {
+			struct page *cursor_page;
+
+			if (unlikely(pfn == page_pfn))
+				continue;
+			if (unlikely(!pfn_valid(pfn)))
+				break;
 
-		list_add(&page->lru, target);
+			cursor_page = pfn_to_page(pfn);
+			if (unlikely(page_zone_id(cursor_page) != zone_id))
+				continue;
+			scan++;
+			switch (__isolate_lru_page(cursor_page, active)) {
+			case 0:
+				list_move(&cursor_page->lru, dst);
+				nr_taken++;
+				break;
+
+			case -EBUSY:
+				/* else it is being freed elsewhere */
+				list_move(&cursor_page->lru, src);
+			default:
+				break;
+			}
+		}
 	}
 
 	*scanned = scan;
@@ -692,7 +769,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 
 		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
 					     &zone->inactive_list,
-					     &page_list, &nr_scan);
+					     &page_list, &nr_scan, sc->order);
 		__mod_zone_page_state(zone, NR_INACTIVE, -nr_taken);
 		zone->pages_scanned += nr_scan;
 		zone->total_scanned += nr_scan;
@@ -839,7 +916,7 @@ force_reclaim_mapped:
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
-				    &l_hold, &pgscanned);
+				    &l_hold, &pgscanned, sc->order);
 	zone->pages_scanned += pgscanned;
 	__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
 	spin_unlock_irq(&zone->lru_lock);
@@ -1030,7 +1107,7 @@ static unsigned long shrink_zones(int priority, struct zone **zones,
  * holds filesystem locks which prevent writeout this might not work, and the
  * allocation attempt will fail.
  */
-unsigned long try_to_free_pages(struct zone **zones, gfp_t gfp_mask)
+unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
 {
 	int plugdepth;
 	int priority;
@@ -1046,6 +1123,7 @@ unsigned long try_to_free_pages(struct zone **zones, gfp_t gfp_mask)
 		.swap_cluster_max = SWAP_CLUSTER_MAX,
 		.may_swap = 1,
 		.swappiness = vm_swappiness,
+		.order = order,
 	};
 
 	delay_swap_prefetch();


* [PATCH 2/5] lumpy: isolate_lru_pages wants to specifically take active or inactive pages
  2007-02-27 19:33 ` Andy Whitcroft
@ 2007-02-27 19:34   ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman


The caller of isolate_lru_pages specifically knows whether it wants
to take inactive or active pages.  Currently we take the state of
the LRU page at hand and use that to scan for matching pages in the
order-sized block.  If that page is transiting we can scan for the
wrong type.  The caller knows what it wants and should be telling
us.  Pass in the required active/inactive state and match against
that.

Note that, now that we pass the expected active state when scanning
the active/inactive lists, we may find mismatching target pages:
pages which are in the process of changing state.  This is no longer
an error and we should simply ignore them.
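
As a usage sketch, the two call sites updated below now pass the
expected state explicitly (0 for the inactive list, 1 for the active
list):

	nr_taken = isolate_lru_pages(sc->swap_cluster_max,
			     &zone->inactive_list,
			     &page_list, &nr_scan, sc->order, 0);

	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
				    &l_hold, &pgscanned, sc->order, 1);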

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f15ffcb..b878d54 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -663,12 +663,13 @@ static int __isolate_lru_page(struct page *page, int active)
  * @dst:	The temp list to put pages on to.
  * @scanned:	The number of pages that were scanned.
  * @order:	The caller's attempted allocation order
+ * @active:	The caller's trying to obtain active or inactive pages
  *
  * returns how many pages were moved onto *@dst.
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct list_head *src, struct list_head *dst,
-		unsigned long *scanned, int order)
+		unsigned long *scanned, int order, int active)
 {
 	unsigned long nr_taken = 0;
 	unsigned long scan;
@@ -678,7 +679,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		unsigned long pfn;
 		unsigned long end_pfn;
 		unsigned long page_pfn;
-		int active;
 		int zone_id;
 
 		page = lru_to_page(src);
@@ -686,20 +686,16 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 		VM_BUG_ON(!PageLRU(page));
 
-		active = PageActive(page);
 		switch (__isolate_lru_page(page, active)) {
 		case 0:
 			list_move(&page->lru, dst);
 			nr_taken++;
 			break;
 
-		case -EBUSY:
-			/* else it is being freed elsewhere */
+		default:
+			/* page is being freed, or is a missmatch */
 			list_move(&page->lru, src);
 			continue;
-
-		default:
-			BUG();
 		}
 
 		if (!order)
@@ -768,8 +764,8 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 		unsigned long nr_freed;
 
 		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
-					     &zone->inactive_list,
-					     &page_list, &nr_scan, sc->order);
+				     &zone->inactive_list,
+				     &page_list, &nr_scan, sc->order, 0);
 		__mod_zone_page_state(zone, NR_INACTIVE, -nr_taken);
 		zone->pages_scanned += nr_scan;
 		zone->total_scanned += nr_scan;
@@ -916,7 +912,7 @@ force_reclaim_mapped:
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
-				    &l_hold, &pgscanned, sc->order);
+				    &l_hold, &pgscanned, sc->order, 1);
 	zone->pages_scanned += pgscanned;
 	__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
 	spin_unlock_irq(&zone->lru_lock);


* [PATCH 3/5] lumpy: ensure that we compare PageActive and active safely
  2007-02-27 19:33 ` Andy Whitcroft
@ 2007-02-27 19:35   ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman


Now that we are passing in a boolean active flag we need to
ensure that the result of PageActive(page) is comparable
to that boolean.
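
A minimal userspace illustration of the failure mode being guarded
against (the flag value is made up; the point is only that a "true"
result need not be exactly 1):

	#include <assert.h>

	int main(void)
	{
		int active = 1;		/* caller wants active pages */
		int page_active = 0x40;	/* hypothetical non-0/1 "true" result */

		assert(!(page_active == active));	/* naive comparison misfires */
		assert(!page_active == !active);	/* normalised comparison holds */
		return 0;
	}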

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b878d54..2bfad79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -632,7 +632,12 @@ static int __isolate_lru_page(struct page *page, int active)
 {
 	int ret = -EINVAL;
 
-	if (PageLRU(page) && (PageActive(page) == active)) {
+	/*
+	 * When checking the active state, we need to be sure we are
+	 * dealing with comparible boolean values.  Take the logical not
+	 * of each.
+	 */
+	if (PageLRU(page) && (!PageActive(page) == !active)) {
 		ret = -EBUSY;
 		if (likely(get_page_unless_zero(page))) {
 			/*


* [PATCH 4/5] lumpy: update commentry on subtle comparisons and rounding assumptions
  2007-02-27 19:33 ` Andy Whitcroft
@ 2007-02-27 19:35   ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman


We have a number of subtle comparisons when scanning a block, and
we make use of a lot of buddy mem_map guarantees.  Add commentary
about each.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2bfad79..bef7e92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -709,7 +709,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		/*
 		 * Attempt to take all pages in the order aligned region
 		 * surrounding the tag page.  Only take those pages of
-		 * the same active state as that tag page.
+		 * the same active state as that tag page.  We may safely
+		 * round the target page pfn down to the requested order
+		 * as the mem_map is guarenteed valid out to MAX_ORDER,
+		 * where that page is in a different zone we will detect
+		 * it from its zone id and abort this block scan.
 		 */
 		zone_id = page_zone_id(page);
 		page_pfn = page_to_pfn(page);
@@ -718,12 +722,15 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		for (; pfn < end_pfn; pfn++) {
 			struct page *cursor_page;
 
+			/* The target page is in the block, ignore it. */
 			if (unlikely(pfn == page_pfn))
 				continue;
+			/* Avoid holes within the zone. */
 			if (unlikely(!pfn_valid(pfn)))
 				break;
 
 			cursor_page = pfn_to_page(pfn);
+			/* Check that we have not crossed a zone boundary. */
 			if (unlikely(page_zone_id(cursor_page) != zone_id))
 				continue;
 			scan++;


* [PATCH 5/5] lumpy: only check for valid pages when holes are present
  2007-02-27 19:33 ` Andy Whitcroft
@ 2007-02-27 19:36   ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-02-27 19:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, Andy Whitcroft, Mel Gorman


We only need to check that each page is valid with pfn_valid when
we are on an architecture which has holes within zones.  Make this
check conditional.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bef7e92..f249ad7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -725,9 +725,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			/* The target page is in the block, ignore it. */
 			if (unlikely(pfn == page_pfn))
 				continue;
+#ifdef CONFIG_HOLES_IN_ZONE
 			/* Avoid holes within the zone. */
 			if (unlikely(!pfn_valid(pfn)))
 				break;
+#endif
 
 			cursor_page = pfn_to_page(pfn);
 			/* Check that we have not crossed a zone boundary. */


* Re: [PATCH 1/5] Lumpy Reclaim V3
  2007-02-27 19:34   ` Andy Whitcroft
@ 2007-02-28 18:14     ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2007-02-28 18:14 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman

On Tue, 27 Feb 2007, Andy Whitcroft wrote:

> +static int __isolate_lru_page(struct page *page, int active)
> +{
> +	int ret = -EINVAL;
> +
> +	if (PageLRU(page) && (PageActive(page) == active)) {
> +		ret = -EBUSY;
> +		if (likely(get_page_unless_zero(page))) {
> +			/*
> +			 * Be careful not to clear PageLRU until after we're
> +			 * sure the page is not being freed elsewhere -- the
> +			 * page release code relies on it.
> +			 */
> +			ClearPageLRU(page);
> +			ret = 0;

Is that really necessary?  PageLRU is clear when a page is freed, right?
And clearing PageLRU requires the zone->lru_lock since we have to move it
off the LRU.

> -			ClearPageLRU(page);
> -			target = dst;
> +		active = PageActive(page);

Why are we saving the active state? Page cannot be moved between LRUs 
while we hold the lru lock anyways.


* Re: [PATCH 2/5] lumpy: isolate_lru_pages wants to specifically take active or inactive pages
  2007-02-27 19:34   ` Andy Whitcroft
@ 2007-02-28 18:17     ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2007-02-28 18:17 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman

On Tue, 27 Feb 2007, Andy Whitcroft wrote:

> The caller of isolate_lru_pages specifically knows whether it wants
> to take either inactive or active pages.  Currently we take the
> state of the LRU page at hand and use that to scan for matching
> pages in the order sized block.  If that page is transiting we
> can scan for the wrong type.  The caller knows what they want and
> should be telling us.  Pass in the required active/inactive state
> and match against that.

The page cannot be transiting since we hold the lru lock?


* Re: [PATCH 1/5] Lumpy Reclaim V3
  2007-02-28 18:14     ` Christoph Lameter
@ 2007-03-02 15:45       ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-03-02 15:45 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman

Christoph Lameter wrote:
> On Tue, 27 Feb 2007, Andy Whitcroft wrote:
> 
>> +static int __isolate_lru_page(struct page *page, int active)
>> +{
>> +	int ret = -EINVAL;
>> +
>> +	if (PageLRU(page) && (PageActive(page) == active)) {
>> +		ret = -EBUSY;
>> +		if (likely(get_page_unless_zero(page))) {
>> +			/*
>> +			 * Be careful not to clear PageLRU until after we're
>> +			 * sure the page is not being freed elsewhere -- the
>> +			 * page release code relies on it.
>> +			 */
>> +			ClearPageLRU(page);
>> +			ret = 0;
> 
> Is that really necessary? PageLRU is clear when a page is freed right? 
> And clearing PageLRU requires the zone->lru_lock since we have to move it 
> off the LRU.

Although PageLRU is stable while we hold the zone->lru_lock, we cannot
take the page off the LRU unless we can take a reference to it,
preventing it from being released.  The page release code relies on
the caller who takes the reference count to zero having exclusive
access to the page in order to release it.  To prevent a race with
put_page/__page_cache_release we cannot touch the page once its count
drops to zero, which occurs before PageLRU is cleared.  PageLRU is
sampled outside the zone->lru_lock in that path to avoid taking the
lock when it is not required.
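
Roughly, the ordering being defended looks like this (a sketch of the
two paths as I understand the 2.6.20-era code, not exact kernel
source):

	/* Reclaim side, under zone->lru_lock: */
	if (!get_page_unless_zero(page))
		return -EBUSY;		/* count already zero: being freed */
	ClearPageLRU(page);		/* safe: our reference pins the page */

	/* Release side (release_pages/__page_cache_release), roughly: */
	if (put_page_testzero(page)) {
		if (PageLRU(page)) {	/* sampled without zone->lru_lock */
			/* take zone->lru_lock, clear PageLRU, unlink from LRU */
		}
		/* free the page */
	}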

> 
>> -			ClearPageLRU(page);
>> -			target = dst;
>> +		active = PageActive(page);
> 
> Why are we saving the active state? Page cannot be moved between LRUs 
> while we hold the lru lock anyways.

This is used later in the function for comparison against all of the
other pages in the 'order'-sized area rooted about the target page.  It
is merely an optimisation.

-apw



* Re: [PATCH 2/5] lumpy: isolate_lru_pages wants to specifically take active or inactive pages
  2007-02-28 18:17     ` Christoph Lameter
@ 2007-03-02 15:57       ` Andy Whitcroft
  -1 siblings, 0 replies; 20+ messages in thread
From: Andy Whitcroft @ 2007-03-02 15:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman

Christoph Lameter wrote:
> On Tue, 27 Feb 2007, Andy Whitcroft wrote:
> 
>> The caller of isolate_lru_pages specifically knows whether it wants
>> to take either inactive or active pages.  Currently we take the
>> state of the LRU page at hand and use that to scan for matching
>> pages in the order sized block.  If that page is transiting we
>> can scan for the wrong type.  The caller knows what they want and
>> should be telling us.  Pass in the required active/inactive state
>> and match against that.
> 
> The page cannot be transiting since we hold the lru lock?

As you say it should be gated by lru_lock and we should not expect to
see pages with the wrong type on the list.  I would swear that I was
seeing pages on the wrong list there for a bit in testing and mistakenly
thought they were in transition.  A quick review at least says thats
false.  So I'll reinstate the BUG() and retest to see if I am smoking
crack or there is a bigger bug out there.

-apw

