* [PATCH 0/3] Reduce searching in the page allocator fast-path
@ 2009-08-28  8:44 ` Mel Gorman
  0 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
	Linux Kernel Mailing List, Mel Gorman

The following two patches remove searching in the page allocator fast-path
by maintaining multiple free lists in the per-cpu structure. At the time the
search was introduced, enlarging the per-cpu structure would have wasted a
lot of memory because per-cpu structures were statically allocated at
compile-time. This is no longer the case.
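
To make the idea concrete before the diffs, here is a minimal sketch of the
before-and-after fast-path lookup (illustration only, assuming the usual
kernel list helpers; the helper names below are made up and the real code is
in patch 1):

/* Before: O(n) search of a single mixed PCP list for the wanted type */
static struct page *pcp_search(struct list_head *list, int migratetype)
{
	struct page *page;

	list_for_each_entry(page, list, lru)
		if (page_private(page) == migratetype)
			return page;
	return NULL;
}

/* After: O(1) lookup, one PCP list per migratetype */
static struct page *pcp_lookup(struct per_cpu_pages *pcp, int migratetype)
{
	struct list_head *list = &pcp->lists[migratetype];

	return list_empty(list) ? NULL : list_entry(list->next, struct page, lru);
}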

The patches are as follows. They are based on mmotm-2009-08-27.

Patch 1 adds multiple lists to struct per_cpu_pages, one per
	migratetype that can be stored on the PCP lists.

Patch 2 notes that the PCP drain path checks empty lists multiple times. The
	patch reduces the number of checks by maintaining a count that is
	incremented as empty lists are encountered. Lists containing pages
	then free multiple pages in batch.

The patches were tested with kernbench, netperf udp/tcp, hackbench and
sysbench.  The netperf tests were not bound to any CPU in particular and
were run enough times that, with 99% confidence, the reported results are
within 1% of the estimated mean. sysbench was run with a postgres database
in the background using read-only tests. Similar to netperf, it was run
multiple times so that, with 99% confidence, its results are within 1%.
The patches were tested on x86, x86-64 and ppc64 as follows:

x86:	Intel Pentium D 3GHz with 8G RAM (no-brand machine)
	kernbench	- No significant difference, variance well within noise
	netperf-udp	- 1.34% to 2.28% gain
	netperf-tcp	- 0.45% to 1.22% gain
	hackbench	- Small variances, very close to noise
	sysbench	- Very small gains

x86-64:	AMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)
	kernbench	- No significant difference, variance well within noise
	netperf-udp	- 1.83% to 10.42% gains
	netperf-tcp	- Not conclusive until buffer >= PAGE_SIZE
				4096	+15.83%
				8192	+ 0.34% (not significant)
				16384	+ 1%
	hackbench	- Small gains, very close to noise
	sysbench	- 0.79% to 1.6% gain

ppc64:	PPC970MP 2.5GHz with 10GB RAM (a Terrasoft Powerstation)
	kernbench	- No significant difference, variance well within noise
	netperf-udp	- 2-3% gain for almost all buffer sizes tested
	netperf-tcp	- losses on small buffers, gains on larger buffers,
			  possibly indicating some bad caching effect.
	hackbench	- No significant difference
	sysbench	- 2-4% gain
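
The "within 1% at 99% confidence" criterion above amounts to a convergence
check along the following lines (a rough userspace sketch, not the actual
test harness; it assumes a normal approximation and uses the 99% z-value,
and the sample values are hypothetical):

#include <math.h>
#include <stdio.h>

/* Return 1 if the 99% CI half-width is within 1% of the sample mean */
static int converged(const double *samples, int n)
{
	double sum = 0.0, sumsq = 0.0, mean, var, half_width;
	int i;

	for (i = 0; i < n; i++)
		sum += samples[i];
	mean = sum / n;

	for (i = 0; i < n; i++)
		sumsq += (samples[i] - mean) * (samples[i] - mean);
	var = sumsq / (n - 1);			/* sample variance */

	half_width = 2.576 * sqrt(var / n);	/* z for 99% confidence */
	return half_width <= 0.01 * mean;
}

int main(void)
{
	/* hypothetical throughput samples from repeated netperf runs */
	double mbps[] = { 1012.4, 1009.8, 1011.1, 1010.3, 1012.0, 1011.6 };

	printf("converged: %s\n", converged(mbps, 6) ? "yes" : "no");
	return 0;
}

Runs are repeated until the check passes, at which point the mean is
reported.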

 include/linux/mmzone.h |    5 ++-
 mm/page_alloc.c        |  119 +++++++++++++++++++++++++++--------------------
 2 files changed, 72 insertions(+), 52 deletions(-)


* [PATCH 1/2] page-allocator: Split per-cpu list into one-list-per-migrate-type
  2009-08-28  8:44 ` Mel Gorman
@ 2009-08-28  8:44   ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
	Linux Kernel Mailing List, Mel Gorman

Currently the per-cpu page allocator searches the PCP list for pages of the
correct migrate-type to reduce the possibility of pages being inappropriately
placed from a fragmentation perspective. This search is potentially expensive
in a fast-path and undesirable. Splitting the per-cpu list into multiple
lists increases the size of a per-cpu structure and this was potentially
a major problem at the time the search was introduced. This problem has
been mitigated now that only the necessary number of structures is allocated
for the running system.

This patch replaces a list search in the per-cpu allocator with one list per
migrate type. The potential snag with this approach is bulk freeing of
pages. Pages are freed round-robin by migrate type, which has little bearing
on the cache hotness of the page, and empty lists are potentially checked
repeatedly in the event the majority of PCP pages are of one type.
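
As a standalone illustration of that snag (a simplified model with made-up
names, not the code in the diff below), the bulk free rotates through the
migratetypes and keeps re-testing the same empty lists when one type holds
the majority of the pages:

#define MIGRATE_PCPTYPES 3	/* unmovable, reclaimable, movable */

/* Model the PCP lists as per-type counts; count must not exceed the total */
static void drain_round_robin(int pages[MIGRATE_PCPTYPES], int count)
{
	int migratetype = 0;

	while (count--) {
		/* Advance to the next non-empty "list", testing empties on the way */
		do {
			migratetype = (migratetype + 1) % MIGRATE_PCPTYPES;
		} while (pages[migratetype] == 0);

		pages[migratetype]--;	/* free one page of this type */
	}
}

If every page on the PCP is of one type, the two empty lists are re-tested
for every page freed after the first, which is what the next patch reduces.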

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Nick Piggin <npiggin@suse.de>
---
 include/linux/mmzone.h |    5 ++-
 mm/page_alloc.c        |  106 ++++++++++++++++++++++++++---------------------
 2 files changed, 63 insertions(+), 48 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 008cdcd..045348f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -38,6 +38,7 @@
 #define MIGRATE_UNMOVABLE     0
 #define MIGRATE_RECLAIMABLE   1
 #define MIGRATE_MOVABLE       2
+#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
 #define MIGRATE_RESERVE       3
 #define MIGRATE_ISOLATE       4 /* can't allocate from here */
 #define MIGRATE_TYPES         5
@@ -169,7 +170,9 @@ struct per_cpu_pages {
 	int count;		/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
-	struct list_head list;	/* the list of pages */
+
+	/* Lists of pages, one per migrate type stored on the pcp-lists */
+	struct list_head lists[MIGRATE_PCPTYPES];
 };
 
 struct per_cpu_pageset {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ac3afe1..65eedb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
 }
 
 /*
- * Frees a list of pages. 
+ * Frees a number of pages from the PCP lists
  * Assumes all pages on list are in same zone, and of same order.
  * count is the number of pages to free.
  *
@@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
  */
-static void free_pages_bulk(struct zone *zone, int count,
-					struct list_head *list, int order)
+static void free_pcppages_bulk(struct zone *zone, int count,
+					struct per_cpu_pages *pcp)
 {
+	int migratetype = 0;
+
 	spin_lock(&zone->lock);
 	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
 	zone->pages_scanned = 0;
 
-	__mod_zone_page_state(zone, NR_FREE_PAGES, count << order);
+	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
 	while (count--) {
 		struct page *page;
+		struct list_head *list;
+
+		/*
+		 * Remove pages from lists in a round-robin fashion. This spinning
+		 * around potentially empty lists is bloody awful, alternatives that
+		 * don't suck are welcome
+		 */
+		do {
+			if (++migratetype == MIGRATE_PCPTYPES)
+				migratetype = 0;
+			list = &pcp->lists[migratetype];
+		} while (list_empty(list));
 
-		VM_BUG_ON(list_empty(list));
 		page = list_entry(list->prev, struct page, lru);
 		/* have to delete it as __free_one_page list manipulates */
 		list_del(&page->lru);
-		trace_mm_page_pcpu_drain(page, order, page_private(page));
-		__free_one_page(page, zone, order, page_private(page));
+		trace_mm_page_pcpu_drain(page, 0, migratetype);
+		__free_one_page(page, zone, 0, migratetype);
 	}
 	spin_unlock(&zone->lock);
 }
@@ -974,7 +987,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 		to_drain = pcp->batch;
 	else
 		to_drain = pcp->count;
-	free_pages_bulk(zone, to_drain, &pcp->list, 0);
+	free_pcppages_bulk(zone, to_drain, pcp);
 	pcp->count -= to_drain;
 	local_irq_restore(flags);
 }
@@ -1000,7 +1013,7 @@ static void drain_pages(unsigned int cpu)
 
 		pcp = &pset->pcp;
 		local_irq_save(flags);
-		free_pages_bulk(zone, pcp->count, &pcp->list, 0);
+		free_pcppages_bulk(zone, pcp->count, pcp);
 		pcp->count = 0;
 		local_irq_restore(flags);
 	}
@@ -1066,6 +1079,7 @@ static void free_hot_cold_page(struct page *page, int cold)
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
+	int migratetype;
 	int wasMlocked = __TestClearPageMlocked(page);
 
 	kmemcheck_free_shadow(page, 0);
@@ -1083,21 +1097,39 @@ static void free_hot_cold_page(struct page *page, int cold)
 	kernel_map_pages(page, 1, 0);
 
 	pcp = &zone_pcp(zone, get_cpu())->pcp;
-	set_page_private(page, get_pageblock_migratetype(page));
+	migratetype = get_pageblock_migratetype(page);
+	set_page_private(page, migratetype);
 	local_irq_save(flags);
 	if (unlikely(wasMlocked))
 		free_page_mlock(page);
 	__count_vm_event(PGFREE);
 
+	/*
+	 * We only track unmovable, reclaimable and movable on pcp lists.
+	 * Free ISOLATE pages back to the allocator because they are being
+	 * offlined but treat RESERVE as movable pages so we can get those
+	 * areas back if necessary. Otherwise, we may have to free
+	 * excessively into the page allocator
+	 */
+	if (migratetype >= MIGRATE_PCPTYPES) {
+		if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+			free_one_page(zone, page, 0, migratetype);
+			goto out;
+		}
+		migratetype = MIGRATE_MOVABLE;
+	}
+
 	if (cold)
-		list_add_tail(&page->lru, &pcp->list);
+		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
-		list_add(&page->lru, &pcp->list);
+		list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
-		free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
+		free_pcppages_bulk(zone, pcp->batch, pcp);
 		pcp->count -= pcp->batch;
 	}
+
+out:
 	local_irq_restore(flags);
 	put_cpu();
 }
@@ -1155,46 +1187,24 @@ again:
 	cpu  = get_cpu();
 	if (likely(order == 0)) {
 		struct per_cpu_pages *pcp;
+		struct list_head *list;
 
 		pcp = &zone_pcp(zone, cpu)->pcp;
+		list = &pcp->lists[migratetype];
 		local_irq_save(flags);
-		if (!pcp->count) {
-			pcp->count = rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list,
-					migratetype, cold);
-			if (unlikely(!pcp->count))
-				goto failed;
-		}
-
-		/* Find a page of the appropriate migrate type */
-		if (cold) {
-			list_for_each_entry_reverse(page, &pcp->list, lru)
-				if (page_private(page) == migratetype)
-					break;
-		} else {
-			list_for_each_entry(page, &pcp->list, lru)
-				if (page_private(page) == migratetype)
-					break;
-		}
-
-		/* Allocate more to the pcp list if necessary */
-		if (unlikely(&page->lru == &pcp->list)) {
-			int get_one_page = 0;
-
+		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, &pcp->list,
+					pcp->batch, list,
 					migratetype, cold);
-			list_for_each_entry(page, &pcp->list, lru) {
-				if (get_pageblock_migratetype(page) !=
-					    MIGRATE_ISOLATE) {
-					get_one_page = 1;
-					break;
-				}
-			}
-			if (!get_one_page)
+			if (unlikely(list_empty(list)))
 				goto failed;
 		}
 
+		if (cold)
+			page = list_entry(list->prev, struct page, lru);
+		else
+			page = list_entry(list->next, struct page, lru);
+
 		list_del(&page->lru);
 		pcp->count--;
 	} else {
@@ -3045,6 +3055,7 @@ static int zone_batchsize(struct zone *zone)
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
+	int migratetype;
 
 	memset(p, 0, sizeof(*p));
 
@@ -3052,7 +3063,8 @@ static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 	pcp->count = 0;
 	pcp->high = 6 * batch;
 	pcp->batch = max(1UL, 1 * batch);
-	INIT_LIST_HEAD(&pcp->list);
+	for (migratetype = 0; migratetype < MIGRATE_PCPTYPES; migratetype++)
+		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
 /*
@@ -3244,7 +3256,7 @@ static int __zone_pcp_update(void *data)
 		pcp = &pset->pcp;
 
 		local_irq_save(flags);
-		free_pages_bulk(zone, pcp->count, &pcp->list, 0);
+		free_pcppages_bulk(zone, pcp->count, pcp);
 		setup_pageset(pset, batch);
 		local_irq_restore(flags);
 	}
-- 
1.6.3.3


* [PATCH 2/2] page-allocator: Maintain rolling count of pages to free from the PCP
  2009-08-28  8:44 ` Mel Gorman
@ 2009-08-28  8:44   ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
	Linux Kernel Mailing List, Mel Gorman

When freeing pages round-robin from the PCP lists, empty lists may be
encountered. In the event one of the lists has more pages than another,
there may be numerous checks for list_empty(), which is undesirable. This
patch maintains a count of pages to free which is incremented when empty
lists are encountered. The intention is that more pages will then be freed
from fuller lists than from the emptier ones, reducing the number of
empty-list checks in the free path.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |   23 ++++++++++++++---------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65eedb5..9b86977 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -536,32 +536,37 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 					struct per_cpu_pages *pcp)
 {
 	int migratetype = 0;
+	int batch_free = 0;
 
 	spin_lock(&zone->lock);
 	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
 	zone->pages_scanned = 0;
 
 	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
-	while (count--) {
+	while (count) {
 		struct page *page;
 		struct list_head *list;
 
 		/*
-		 * Remove pages from lists in a round-robin fashion. This spinning
-		 * around potentially empty lists is bloody awful, alternatives that
-		 * don't suck are welcome
+		 * Remove pages from lists in a round-robin fashion. A batch_free
+		 * count is maintained that is incremented when an empty list is
+		 * encountered. This is so more pages are freed off fuller lists
+		 * instead of spinning excessively around empty lists
 		 */
 		do {
+			batch_free++;
 			if (++migratetype == MIGRATE_PCPTYPES)
 				migratetype = 0;
 			list = &pcp->lists[migratetype];
 		} while (list_empty(list));
 
-		page = list_entry(list->prev, struct page, lru);
-		/* have to delete it as __free_one_page list manipulates */
-		list_del(&page->lru);
-		trace_mm_page_pcpu_drain(page, 0, migratetype);
-		__free_one_page(page, zone, 0, migratetype);
+		do {
+			page = list_entry(list->prev, struct page, lru);
+			/* must delete as __free_one_page list manipulates */
+			list_del(&page->lru);
+			__free_one_page(page, zone, 0, migratetype);
+			trace_mm_page_pcpu_drain(page, 0, migratetype);
+		} while (--count && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
 }
-- 
1.6.3.3


* Re: [PATCH 0/3] Reduce searching in the page allocator fast-path
  2009-08-28  8:44 ` Mel Gorman
@ 2009-08-28  8:47   ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28  8:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, Christoph Lameter, Nick Piggin,
	Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 09:44:25AM +0100, Mel Gorman wrote:
> The following two patches remove searching in the page allocator fast-path

Gack, and of course the subject should have been PATCH 0/2

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [PATCH 1/2] page-allocator: Split per-cpu list into one-list-per-migrate-type
  2009-08-28  8:44   ` Mel Gorman
@ 2009-08-28 11:52     ` Minchan Kim
  -1 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-08-28 11:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

Hi, Mel. 

On Fri, 28 Aug 2009 09:44:26 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> Currently the per-cpu page allocator searches the PCP list for pages of the
> correct migrate-type to reduce the possibility of pages being inappropriate
> placed from a fragmentation perspective. This search is potentially expensive
> in a fast-path and undesirable. Splitting the per-cpu list into multiple
> lists increases the size of a per-cpu structure and this was potentially
> a major problem at the time the search was introduced. These problem has
> been mitigated as now only the necessary number of structures is allocated
> for the running system.
> 
> This patch replaces a list search in the per-cpu allocator with one list per
> migrate type. The potential snag with this approach is when bulk freeing
> pages. We round-robin free pages based on migrate type which has little
> bearing on the cache hotness of the page and potentially checks empty lists
> repeatedly in the event the majority of PCP pages are of one type.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: Nick Piggin <npiggin@suse.de>
> ---
>  include/linux/mmzone.h |    5 ++-
>  mm/page_alloc.c        |  106 ++++++++++++++++++++++++++---------------------
>  2 files changed, 63 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 008cdcd..045348f 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -38,6 +38,7 @@
>  #define MIGRATE_UNMOVABLE     0
>  #define MIGRATE_RECLAIMABLE   1
>  #define MIGRATE_MOVABLE       2
> +#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
>  #define MIGRATE_RESERVE       3
>  #define MIGRATE_ISOLATE       4 /* can't allocate from here */
>  #define MIGRATE_TYPES         5
> @@ -169,7 +170,9 @@ struct per_cpu_pages {
>  	int count;		/* number of pages in the list */
>  	int high;		/* high watermark, emptying needed */
>  	int batch;		/* chunk size for buddy add/remove */
> -	struct list_head list;	/* the list of pages */
> +
> +	/* Lists of pages, one per migrate type stored on the pcp-lists */
> +	struct list_head lists[MIGRATE_PCPTYPES];
>  };
>  
>  struct per_cpu_pageset {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ac3afe1..65eedb5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
>  }
>  
>  /*
> - * Frees a list of pages. 
> + * Frees a number of pages from the PCP lists
>   * Assumes all pages on list are in same zone, and of same order.
>   * count is the number of pages to free.
>   *
> @@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
>   * pinned" detection logic.
>   */
> -static void free_pages_bulk(struct zone *zone, int count,
> -					struct list_head *list, int order)
> +static void free_pcppages_bulk(struct zone *zone, int count,
> +					struct per_cpu_pages *pcp)
>  {
> +	int migratetype = 0;
> +

How about caching the last successful migratetype
with 'per_cpu_pages->last_alloc_type'?
I think it could prevent a little spinning on empty lists.

>  	spin_lock(&zone->lock);
>  	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
>  	zone->pages_scanned = 0;
>  
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, count << order);
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
>  	while (count--) {
>  		struct page *page;
> +		struct list_head *list;
> +
> +		/*
> +		 * Remove pages from lists in a round-robin fashion. This spinning
> +		 * around potentially empty lists is bloody awful, alternatives that
> +		 * don't suck are welcome
> +		 */
> +		do {
> +			if (++migratetype == MIGRATE_PCPTYPES)
> +				migratetype = 0;
> +			list = &pcp->lists[migratetype];
> +		} while (list_empty(list));
>  
> -		VM_BUG_ON(list_empty(list));
>  		page = list_entry(list->prev, struct page, lru);
>  		/* have to delete it as __free_one_page list manipulates */
>  		list_del(&page->lru);
> -		trace_mm_page_pcpu_drain(page, order, page_private(page));
> -		__free_one_page(page, zone, order, page_private(page));
> +		trace_mm_page_pcpu_drain(page, 0, migratetype);
> +		__free_one_page(page, zone, 0, migratetype);
>  	}
>  	spin_unlock(&zone->lock);
>  }



-- 
Kind regards,
Minchan Kim

* Re: [PATCH 1/2] page-allocator: Split per-cpu list into  one-list-per-migrate-type
  2009-08-28 11:52     ` Minchan Kim
@ 2009-08-28 12:00       ` Minchan Kim
  -1 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-08-28 12:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 8:52 PM, Minchan Kim<minchan.kim@gmail.com> wrote:
> Hi, Mel.
>
> On Fri, 28 Aug 2009 09:44:26 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
>> Currently the per-cpu page allocator searches the PCP list for pages of the
>> correct migrate-type to reduce the possibility of pages being inappropriate
>> placed from a fragmentation perspective. This search is potentially expensive
>> in a fast-path and undesirable. Splitting the per-cpu list into multiple
>> lists increases the size of a per-cpu structure and this was potentially
>> a major problem at the time the search was introduced. These problem has
>> been mitigated as now only the necessary number of structures is allocated
>> for the running system.
>>
>> This patch replaces a list search in the per-cpu allocator with one list per
>> migrate type. The potential snag with this approach is when bulk freeing
>> pages. We round-robin free pages based on migrate type which has little
>> bearing on the cache hotness of the page and potentially checks empty lists
>> repeatedly in the event the majority of PCP pages are of one type.
>>
>> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>> Acked-by: Nick Piggin <npiggin@suse.de>
>> ---
>>  include/linux/mmzone.h |    5 ++-
>>  mm/page_alloc.c        |  106 ++++++++++++++++++++++++++---------------------
>>  2 files changed, 63 insertions(+), 48 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 008cdcd..045348f 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -38,6 +38,7 @@
>>  #define MIGRATE_UNMOVABLE     0
>>  #define MIGRATE_RECLAIMABLE   1
>>  #define MIGRATE_MOVABLE       2
>> +#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
>>  #define MIGRATE_RESERVE       3
>>  #define MIGRATE_ISOLATE       4 /* can't allocate from here */
>>  #define MIGRATE_TYPES         5
>> @@ -169,7 +170,9 @@ struct per_cpu_pages {
>>       int count;              /* number of pages in the list */
>>       int high;               /* high watermark, emptying needed */
>>       int batch;              /* chunk size for buddy add/remove */
>> -     struct list_head list;  /* the list of pages */
>> +
>> +     /* Lists of pages, one per migrate type stored on the pcp-lists */
>> +     struct list_head lists[MIGRATE_PCPTYPES];
>>  };
>>
>>  struct per_cpu_pageset {
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ac3afe1..65eedb5 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
>>  }
>>
>>  /*
>> - * Frees a list of pages.
>> + * Frees a number of pages from the PCP lists
>>   * Assumes all pages on list are in same zone, and of same order.
>>   * count is the number of pages to free.
>>   *
>> @@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
>>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
>>   * pinned" detection logic.
>>   */
>> -static void free_pages_bulk(struct zone *zone, int count,
>> -                                     struct list_head *list, int order)
>> +static void free_pcppages_bulk(struct zone *zone, int count,
>> +                                     struct per_cpu_pages *pcp)
>>  {
>> +     int migratetype = 0;
>> +
>
> How about caching the last sucess migratetype
> with 'per_cpu_pages->last_alloc_type'?
                                         ^^^^
                                         free
> I think it could prevent a litte spinning empty list.

Anyway, ignore me.
I didn't see your next patch.

-- 
Kind regards,
Minchan Kim

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to  free from the PCP
  2009-08-28  8:44   ` Mel Gorman
@ 2009-08-28 12:16     ` Pekka Enberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2009-08-28 12:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

Hi Mel,

On Fri, Aug 28, 2009 at 11:44 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> -               page = list_entry(list->prev, struct page, lru);
> -               /* have to delete it as __free_one_page list manipulates */
> -               list_del(&page->lru);
> -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> -               __free_one_page(page, zone, 0, migratetype);
> +               do {
> +                       page = list_entry(list->prev, struct page, lru);
> +                       /* must delete as __free_one_page list manipulates */
> +                       list_del(&page->lru);
> +                       __free_one_page(page, zone, 0, migratetype);
> +                       trace_mm_page_pcpu_drain(page, 0, migratetype);

This calls trace_mm_page_pcpu_drain() *after* __free_one_page(). It's
probably not a good idea as __free_one_page() can alter the struct
page in various ways.

> +               } while (--count && --batch_free && !list_empty(list));
>        }
>        spin_unlock(&zone->lock);
>  }

* Re: [PATCH 1/2] page-allocator: Split per-cpu list into one-list-per-migrate-type
  2009-08-28 12:00       ` Minchan Kim
@ 2009-08-28 12:56         ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28 12:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 09:00:25PM +0900, Minchan Kim wrote:
> On Fri, Aug 28, 2009 at 8:52 PM, Minchan Kim<minchan.kim@gmail.com> wrote:
> > Hi, Mel.
> >
> > On Fri, 28 Aug 2009 09:44:26 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> >> Currently the per-cpu page allocator searches the PCP list for pages of the
> >> correct migrate-type to reduce the possibility of pages being inappropriate
> >> placed from a fragmentation perspective. This search is potentially expensive
> >> in a fast-path and undesirable. Splitting the per-cpu list into multiple
> >> lists increases the size of a per-cpu structure and this was potentially
> >> a major problem at the time the search was introduced. These problem has
> >> been mitigated as now only the necessary number of structures is allocated
> >> for the running system.
> >>
> >> This patch replaces a list search in the per-cpu allocator with one list per
> >> migrate type. The potential snag with this approach is when bulk freeing
> >> pages. We round-robin free pages based on migrate type which has little
> >> bearing on the cache hotness of the page and potentially checks empty lists
> >> repeatedly in the event the majority of PCP pages are of one type.
> >>
> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> >> Acked-by: Nick Piggin <npiggin@suse.de>
> >> ---
> >>  include/linux/mmzone.h |    5 ++-
> >>  mm/page_alloc.c        |  106 ++++++++++++++++++++++++++---------------------
> >>  2 files changed, 63 insertions(+), 48 deletions(-)
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 008cdcd..045348f 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -38,6 +38,7 @@
> >>  #define MIGRATE_UNMOVABLE     0
> >>  #define MIGRATE_RECLAIMABLE   1
> >>  #define MIGRATE_MOVABLE       2
> >> +#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
> >>  #define MIGRATE_RESERVE       3
> >>  #define MIGRATE_ISOLATE       4 /* can't allocate from here */
> >>  #define MIGRATE_TYPES         5
> >> @@ -169,7 +170,9 @@ struct per_cpu_pages {
> >>       int count;              /* number of pages in the list */
> >>       int high;               /* high watermark, emptying needed */
> >>       int batch;              /* chunk size for buddy add/remove */
> >> -     struct list_head list;  /* the list of pages */
> >> +
> >> +     /* Lists of pages, one per migrate type stored on the pcp-lists */
> >> +     struct list_head lists[MIGRATE_PCPTYPES];
> >>  };
> >>
> >>  struct per_cpu_pageset {
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index ac3afe1..65eedb5 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -522,7 +522,7 @@ static inline int free_pages_check(struct page *page)
> >>  }
> >>
> >>  /*
> >> - * Frees a list of pages.
> >> + * Frees a number of pages from the PCP lists
> >>   * Assumes all pages on list are in same zone, and of same order.
> >>   * count is the number of pages to free.
> >>   *
> >> @@ -532,23 +532,36 @@ static inline int free_pages_check(struct page *page)
> >>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
> >>   * pinned" detection logic.
> >>   */
> >> -static void free_pages_bulk(struct zone *zone, int count,
> >> -                                     struct list_head *list, int order)
> >> +static void free_pcppages_bulk(struct zone *zone, int count,
> >> +                                     struct per_cpu_pages *pcp)
> >>  {
> >> +     int migratetype = 0;
> >> +
> >
> > How about caching the last sucess migratetype
> > with 'per_cpu_pages->last_alloc_type'?
>                                          ^^^^
>                                          free
> > I think it could prevent a litte spinning empty list.
> 
> Anyway, Ignore me.
> I didn't see your next patch.
> 

Nah, it's a reasonable suggestion. Patch 2 was one effort to reduce
spinning but the comment was in patch 1 in case someone thought of
something better. I tried what you suggested before but it didn't work
out. For any sort of workload that varies the type of allocation (very
frequent), it didn't reduce spinning significantly.
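
A rough sketch of the kind of caching being discussed, for anyone following
along: the last_free_type field, the simplified types and the loop below are
made up purely for illustration and are not part of the posted patches (nor
necessarily what was actually tried):

#define MIGRATE_PCPTYPES	3

/* Simplified stand-in for struct per_cpu_pages: npages[] replaces the
 * real lists, last_free_type is the hypothetical cached type */
struct pcp_sketch {
	int npages[MIGRATE_PCPTYPES];
	int last_free_type;
};

/*
 * Drain 'count' pages, starting the round-robin at the type we last
 * freed from, on the theory that it is the list most likely to still
 * hold pages. The caller must guarantee that 'count' pages really are
 * on the lists, as free_pcppages_bulk() does, or this spins forever.
 */
static void drain_sketch(struct pcp_sketch *pcp, int count)
{
	int migratetype = pcp->last_free_type;

	while (count--) {
		while (pcp->npages[migratetype] == 0)
			if (++migratetype == MIGRATE_PCPTYPES)
				migratetype = 0;

		pcp->npages[migratetype]--;	/* stands in for __free_one_page() */
	}

	pcp->last_free_type = migratetype;
}

As noted above, for workloads that mix allocation types this kind of cache
did not reduce the spinning significantly, presumably because the cached
type goes stale almost immediately.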

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to free from the PCP
  2009-08-28 12:16     ` Pekka Enberg
@ 2009-08-28 12:57       ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28 12:57 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 03:16:34PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Fri, Aug 28, 2009 at 11:44 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> > -               page = list_entry(list->prev, struct page, lru);
> > -               /* have to delete it as __free_one_page list manipulates */
> > -               list_del(&page->lru);
> > -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> > -               __free_one_page(page, zone, 0, migratetype);
> > +               do {
> > +                       page = list_entry(list->prev, struct page, lru);
> > +                       /* must delete as __free_one_page list manipulates */
> > +                       list_del(&page->lru);
> > +                       __free_one_page(page, zone, 0, migratetype);
> > +                       trace_mm_page_pcpu_drain(page, 0, migratetype);
> 
> This calls trace_mm_page_pcpu_drain() *after* __free_one_page(). It's
> probably not a good idea as __free_one_page() can alter the struct
> page in various ways.
> 

While true, does it alter the struct page in any way that matters?

> 
> > +               } while (--count && --batch_free && !list_empty(list));
> >        }
> >        spin_unlock(&zone->lock);
> >  }
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to free from the PCP
  2009-08-28 12:57       ` Mel Gorman
@ 2009-08-28 13:02         ` Pekka Enberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2009-08-28 13:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

Hi Mel,

On Fri, 2009-08-28 at 13:57 +0100, Mel Gorman wrote:
> On Fri, Aug 28, 2009 at 03:16:34PM +0300, Pekka Enberg wrote:
> > Hi Mel,
> > 
> > On Fri, Aug 28, 2009 at 11:44 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> > > -               page = list_entry(list->prev, struct page, lru);
> > > -               /* have to delete it as __free_one_page list manipulates */
> > > -               list_del(&page->lru);
> > > -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> > > -               __free_one_page(page, zone, 0, migratetype);
> > > +               do {
> > > +                       page = list_entry(list->prev, struct page, lru);
> > > +                       /* must delete as __free_one_page list manipulates */
> > > +                       list_del(&page->lru);
> > > +                       __free_one_page(page, zone, 0, migratetype);
> > > +                       trace_mm_page_pcpu_drain(page, 0, migratetype);
> > 
> > This calls trace_mm_page_pcpu_drain() *after* __free_one_page(). It's
> > probably not a good idea as __free_one_page() can alter the struct
> > page in various ways.
> > 
> 
> While true, does it alter the struct page in any way that matters?

Page flags and order are probably interesting for tracing?

			Pekka


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to free from the PCP
  2009-08-28 13:02         ` Pekka Enberg
@ 2009-08-28 13:36           ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-28 13:36 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 04:02:44PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Fri, 2009-08-28 at 13:57 +0100, Mel Gorman wrote:
> > On Fri, Aug 28, 2009 at 03:16:34PM +0300, Pekka Enberg wrote:
> > > Hi Mel,
> > > 
> > > On Fri, Aug 28, 2009 at 11:44 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> > > > -               page = list_entry(list->prev, struct page, lru);
> > > > -               /* have to delete it as __free_one_page list manipulates */
> > > > -               list_del(&page->lru);
> > > > -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> > > > -               __free_one_page(page, zone, 0, migratetype);
> > > > +               do {
> > > > +                       page = list_entry(list->prev, struct page, lru);
> > > > +                       /* must delete as __free_one_page list manipulates */
> > > > +                       list_del(&page->lru);
> > > > +                       __free_one_page(page, zone, 0, migratetype);
> > > > +                       trace_mm_page_pcpu_drain(page, 0, migratetype);
> > > 
> > > This calls trace_mm_page_pcpu_drain() *after* __free_one_page(). It's
> > > probably not a good idea as __free_one_page() can alter the struct
> > > page in various ways.
> > > 
> > 
> > While true, does it alter the struct page in any way that matters?
> 
> Page flags and order are probably interesting for tracing?
> 

This is PCPU draining. The flags are already clear of any values of interest
and the order is always 0. I can follow up with a fix-patch that reverses it just
in case, but I don't think it makes a major difference?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to  free from the PCP
  2009-08-28 13:36           ` Mel Gorman
@ 2009-08-28 13:43             ` Pekka Enberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Pekka Enberg @ 2009-08-28 13:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 4:36 PM, Mel Gorman<mel@csn.ul.ie> wrote:
> This is PCPU draining. The flags are already clear of any values of interest
> and the order is always 0. I can follow up a fix-patch that reverses it just
> in case but I don't think it makes a major difference?

Aha, OK. Probably not worth it then.

                             Pekka

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] page-allocator: Split per-cpu list into  one-list-per-migrate-type
  2009-08-28 12:56         ` Mel Gorman
@ 2009-08-28 13:46           ` Minchan Kim
  -1 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-08-28 13:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 9:56 PM, Mel Gorman<mel@csn.ul.ie> wrote:
> On Fri, Aug 28, 2009 at 09:00:25PM +0900, Minchan Kim wrote:
>> On Fri, Aug 28, 2009 at 8:52 PM, Minchan Kim<minchan.kim@gmail.com> wrote:
>> > Hi, Mel.
>> >
>> > On Fri, 28 Aug 2009 09:44:26 +0100
>> > Mel Gorman <mel@csn.ul.ie> wrote:
>> >
>> >> Currently the per-cpu page allocator searches the PCP list for pages of the
>> >> correct migrate-type to reduce the possibility of pages being inappropriately
>> >> placed from a fragmentation perspective. This search is potentially expensive
>> >> in a fast-path and undesirable. Splitting the per-cpu list into multiple
>> >> lists increases the size of a per-cpu structure and this was potentially
>> >> a major problem at the time the search was introduced. This problem has
>> >> been mitigated as now only the necessary number of structures is allocated
>> >> for the running system.
>> >>
>> >> This patch replaces a list search in the per-cpu allocator with one list per
>> >> migrate type. The potential snag with this approach is when bulk freeing
>> >> pages. We round-robin free pages based on migrate type which has little
>> >> bearing on the cache hotness of the page and potentially checks empty lists
>> >> repeatedly in the event the majority of PCP pages are of one type.
>> >>
>> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>> >> Acked-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

>> >>   */
>> >> -static void free_pages_bulk(struct zone *zone, int count,
>> >> -                                     struct list_head *list, int order)
>> >> +static void free_pcppages_bulk(struct zone *zone, int count,
>> >> +                                     struct per_cpu_pages *pcp)
>> >>  {
>> >> +     int migratetype = 0;
>> >> +
>> >
>> > How about caching the last successful migratetype
>> > with 'per_cpu_pages->last_alloc_type'?
>>                                          ^^^^
>>                                          free
>> > I think it could prevent a little spinning on empty lists.
>>
>> Anyway, ignore me.
>> I didn't see your next patch.
>>
>
> Nah, it's a reasonable suggestion. Patch 2 was one effort to reduce
> spinning but the comment was in patch 1 in case someone thought of
> something better. I tried what you suggested before but it didn't work
> out. For any sort of workload that varies the type of allocation (very
> frequent), it didn't reduce spinning significantly.

Thanks for the good information.

> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to  free from the PCP
  2009-08-28  8:44   ` Mel Gorman
@ 2009-08-28 13:49     ` Minchan Kim
  -1 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-08-28 13:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Fri, Aug 28, 2009 at 5:44 PM, Mel Gorman<mel@csn.ul.ie> wrote:
> When round-robin freeing pages from the PCP lists, empty lists may be
> encountered. In the event one of the lists has more pages than another,
> there may be numerous checks for list_empty() which is undesirable. This
> patch maintains a count of pages to free which is incremented when empty
> lists are encountered. The intention is that more pages will then be freed
> from fuller lists than the empty ones reducing the number of empty list
> checks in the free path.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

I like this idea. :)
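
To make the rolling count concrete, below is a small stand-alone userspace
sketch of the drain loop from this patch. The list sizes and the number of
pages to free are made-up example data; npages[] stands in for the real
pcp->lists[] and simple decrements stand in for __free_one_page():

#include <stdio.h>

#define MIGRATE_PCPTYPES	3

int main(void)
{
	int npages[MIGRATE_PCPTYPES] = { 6, 0, 2 };	/* example PCP state */
	int count = 8;			/* pages we were asked to free */
	int migratetype = 0;
	int batch_free = 0;
	int checks = 0;

	while (count) {
		/* round-robin to the next non-empty list, growing batch_free
		 * for every list that had to be inspected */
		do {
			batch_free++;
			if (++migratetype == MIGRATE_PCPTYPES)
				migratetype = 0;
			checks++;		/* stands in for list_empty() */
		} while (npages[migratetype] == 0);

		/* free up to batch_free pages from this (fuller) list */
		do {
			npages[migratetype]--;
		} while (--count && --batch_free && npages[migratetype]);
	}

	printf("%d emptiness checks\n", checks);
	return 0;
}

With this example data the eight pages are freed after nine emptiness checks;
freeing one page per round-robin pass, as the code did before this patch,
takes eighteen.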

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to  free from the PCP
  2009-08-28  8:44   ` Mel Gorman
@ 2009-08-28 15:04     ` Minchan Kim
  -1 siblings, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2009-08-28 15:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

Hi, Mel.

On Fri, Aug 28, 2009 at 5:44 PM, Mel Gorman<mel@csn.ul.ie> wrote:
> When round-robin freeing pages from the PCP lists, empty lists may be
> encountered. In the event one of the lists has more pages than another,
> there may be numerous checks for list_empty() which is undesirable. This
> patch maintains a count of pages to free which is incremented when empty
> lists are encountered. The intention is that more pages will then be freed
> from fuller lists than the empty ones reducing the number of empty list
> checks in the free path.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  mm/page_alloc.c |   23 ++++++++++++++---------
>  1 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 65eedb5..9b86977 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -536,32 +536,37 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>                                        struct per_cpu_pages *pcp)
>  {
>        int migratetype = 0;
> +       int batch_free = 0;
>
>        spin_lock(&zone->lock);
>        zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
>        zone->pages_scanned = 0;
>
>        __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> -       while (count--) {
> +       while (count) {
>                struct page *page;
>                struct list_head *list;
>
>                /*
> -                * Remove pages from lists in a round-robin fashion. This spinning
> -                * around potentially empty lists is bloody awful, alternatives that
> -                * don't suck are welcome
> +                * Remove pages from lists in a round-robin fashion. A batch_free
> +                * count is maintained that is incremented when an empty list is
> +                * encountered. This is so more pages are freed off fuller lists
> +                * instead of spinning excessively around empty lists
>                 */
>                do {
> +                       batch_free++;
>                        if (++migratetype == MIGRATE_PCPTYPES)
>                                migratetype = 0;
>                        list = &pcp->lists[migratetype];
>                } while (list_empty(list));

How about increasing the weight by batch_free?

batch_free = 1 << (batch_free - 1);

It's assumed that if batch_free is big, several contiguous lists were
empty. Refilling several empty lists is likely to take longer than
refilling one, so I think this could decrease the spinning on empty
lists a little more.
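
A minimal sketch of where that weighting would sit, in a simplified userspace
style (illustration only, not a proposed patch; the posted code also lets
batch_free carry over between passes, which is ignored here):

#define MIGRATE_PCPTYPES	3

/*
 * Round-robin to the next non-empty list and turn the number of lists
 * inspected (1..MIGRATE_PCPTYPES) into a batch of 1, 2 or 4 pages, as
 * suggested above. At least one list must be non-empty, which the real
 * caller guarantees because it only runs with count > 0.
 */
static int pick_batch(const int npages[MIGRATE_PCPTYPES], int *migratetype)
{
	int inspected = 0;

	do {
		inspected++;
		if (++*migratetype == MIGRATE_PCPTYPES)
			*migratetype = 0;
	} while (npages[*migratetype] == 0);

	return 1 << (inspected - 1);	/* the weighted batch_free */
}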

>
> -               page = list_entry(list->prev, struct page, lru);
> -               /* have to delete it as __free_one_page list manipulates */
> -               list_del(&page->lru);
> -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> -               __free_one_page(page, zone, 0, migratetype);
> +               do {
> +                       page = list_entry(list->prev, struct page, lru);
> +                       /* must delete as __free_one_page list manipulates */
> +                       list_del(&page->lru);
> +                       __free_one_page(page, zone, 0, migratetype);
> +                       trace_mm_page_pcpu_drain(page, 0, migratetype);
> +               } while (--count && --batch_free && !list_empty(list));
>        }
>        spin_unlock(&zone->lock);
>  }
> --
> 1.6.3.3
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] page-allocator: Maintain rolling count of pages to free from the PCP
  2009-08-28 15:04     ` Minchan Kim
@ 2009-08-31 12:11       ` Mel Gorman
  -1 siblings, 0 replies; 32+ messages in thread
From: Mel Gorman @ 2009-08-31 12:11 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux Memory Management List, Christoph Lameter,
	Nick Piggin, Linux Kernel Mailing List

On Sat, Aug 29, 2009 at 12:04:48AM +0900, Minchan Kim wrote:
> 
> On Fri, Aug 28, 2009 at 5:44 PM, Mel Gorman<mel@csn.ul.ie> wrote:
> > When round-robin freeing pages from the PCP lists, empty lists may be
> > encountered. In the event one of the lists has more pages than another,
> > there may be numerous checks for list_empty() which is undesirable. This
> > patch maintains a count of pages to free which is incremented when empty
> > lists are encountered. The intention is that more pages will then be freed
> > from fuller lists than the empty ones reducing the number of empty list
> > checks in the free path.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  mm/page_alloc.c |   23 ++++++++++++++---------
> >  1 files changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 65eedb5..9b86977 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -536,32 +536,37 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >                                        struct per_cpu_pages *pcp)
> >  {
> >        int migratetype = 0;
> > +       int batch_free = 0;
> >
> >        spin_lock(&zone->lock);
> >        zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
> >        zone->pages_scanned = 0;
> >
> >        __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> > -       while (count--) {
> > +       while (count) {
> >                struct page *page;
> >                struct list_head *list;
> >
> >                /*
> > -                * Remove pages from lists in a round-robin fashion. This spinning
> > -                * around potentially empty lists is bloody awful, alternatives that
> > -                * don't suck are welcome
> > +                * Remove pages from lists in a round-robin fashion. A batch_free
> > +                * count is maintained that is incremented when an empty list is
> > +                * encountered. This is so more pages are freed off fuller lists
> > +                * instead of spinning excessively around empty lists
> >                 */
> >                do {
> > +                       batch_free++;
> >                        if (++migratetype == MIGRATE_PCPTYPES)
> >                                migratetype = 0;
> >                        list = &pcp->lists[migratetype];
> >                } while (list_empty(list));
> 
> How about increasing the weight by batch_free?
> 
> batch_free = 1 << (batch_free - 1);
> 
> It's assumed that if batch_free is big, several contiguous lists were
> empty. Refilling several empty lists is likely to take longer than
> refilling one, so I think this could decrease the spinning on empty
> lists a little more.
> 

Maybe. As it is, the list spinning is not too bad and significant
amounts of time are no longer being spent in there. I've made a note to
follow up and investigate your suggestion to see if it works out and if
it makes a difference.

Thanks

> >
> > -               page = list_entry(list->prev, struct page, lru);
> > -               /* have to delete it as __free_one_page list manipulates */
> > -               list_del(&page->lru);
> > -               trace_mm_page_pcpu_drain(page, 0, migratetype);
> > -               __free_one_page(page, zone, 0, migratetype);
> > +               do {
> > +                       page = list_entry(list->prev, struct page, lru);
> > +                       /* must delete as __free_one_page list manipulates */
> > +                       list_del(&page->lru);
> > +                       __free_one_page(page, zone, 0, migratetype);
> > +                       trace_mm_page_pcpu_drain(page, 0, migratetype);
> > +               } while (--count && --batch_free && !list_empty(list));
> >        }
> >        spin_unlock(&zone->lock);
> >  }
> > --
> > 1.6.3.3
> >
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 32+ messages in thread
