* [PATCH 0/7] Memory Compaction v2
@ 2007-06-18  9:28 Mel Gorman
  2007-06-18  9:28 ` [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches Mel Gorman
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:28 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter

This is V2 of the memory compaction patches. They depend on the two starting
patches from the memory hot-remove patchset, which I've included here as the
first patch. All comments are welcome; the patches should be in a state
useful for wider testing.

Changelog since V1
o Bug fix when checking if a given node ID is valid or not
o Using latest patch from Kame-san to compact memory in-kernel
o Added trigger for direct compaction instead of direct reclaim
o Obey watermarks in split_pagebuddy_pages()
o Do not call lru_add_drain_all() frequently

The patchset implements memory compaction for the page allocator reducing
external fragmentation so that free memory exists as fewer, but larger
contiguous blocks. Instead of being a full defragmentation solution,
this focuses exclusively on pages that are movable via the page migration
mechanism.

The compaction mechanism operates within a zone and moves movable pages
towards the higher PFNs. Grouping pages by mobility already biases the
location of unmovable pages towards the lower addresses, so the two
strategies work in conjunction.

A full compaction run involves two scanners operating within a zone - a
migration scanner and a free scanner. The migration scanner starts at the
beginning of a zone, finds all movable pages within one
pageblock_nr_pages-sized area and isolates them on a migratepages list. The
free scanner begins at the end of the zone and searches on a per-area basis
for enough free pages to migrate all the pages on the migratepages list. As
each area is respectively migrated or exhausted of free pages, the scanners
are advanced one area. A compaction run completes within a zone when the
two scanners meet.
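
In outline, a compaction run boils down to the following loop. This is a
condensed sketch of compact_zone() from patch 5, with the accounting and
early-exit checks simplified:

	cc->migrate_pfn = zone->zone_start_pfn;
	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;

	while (cc->free_pfn > cc->migrate_pfn) {
		/* Isolate movable pages from one area at migrate_pfn */
		isolate_migratepages(zone, cc);

		/* Isolate enough free pages from areas near free_pfn */
		if (cc->nr_freepages < cc->nr_migratepages)
			isolate_freepages(zone, cc);

		/* Migrate the isolated pages to the isolated free pages */
		migrate_pages(&cc->migratepages, compaction_alloc,
						(unsigned long)cc);
	}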

This is what /proc/buddyinfo looks like before and after a compaction run
(each column is the count of free blocks at that order, from 0 up to 10).

mel@arnold:~/results$ cat before-buddyinfo.txt 
Node 0, zone      DMA    150     33      6      4      2      1      1      1      1      0      0 
Node 0, zone   Normal   7901   3005   2205   1511    758    245     34      3      0      1      0 

mel@arnold:~/results$ cat after-buddyinfo.txt 
Node 0, zone      DMA    150     33      6      4      2      1      1      1      1      0      0 
Node 0, zone   Normal   1900   1187    609    325    228    178    110     32      6      4     24 

Memory compaction may be triggered explicitly by writing a node number to
/proc/sys/vm/compact_node. When a process fails to allocate a high-order
page, it may also compact memory directly in an attempt to satisfy the
allocation. Explicit compaction does not finish until the two scanners
meet, whereas direct compaction ends as soon as a suitable page becomes
available.

The first patch is a rollup from the memory hot-remove patchset. The two
patches after that are changes to page migration. The second patch allows
CONFIG_MIGRATION to be set without CONFIG_NUMA. The third patch allows
LRU pages to be isolated in batches instead of acquiring and releasing
the LRU lock for each page.

The fourth patch exports some metrics on external fragmentation which
are relevant to memory compaction. The fifth patch is what implements
memory compaction for a single zone. The sixth patch enables a node to be
compacted explicitly by writing to a special file in /proc and the final
patch implements direct compaction.

This version of the patchset should be usable on all machines and I
consider it ready for testing. It's passed tests here on x86, x86_64 and
ppc64 machines.

Here are some outstanding items on a TODO list in
no particular order.

o Have split_pagebuddy_order make blocks MOVABLE when the free page order
  is greater than pageblock_order
o Avoid racing with other allocators during direct compaction by taking the
  page the moment it becomes free
o Implement compaction_debug boot-time option like slub_debug
o Implement compaction_disable boot-time option just in case
o Investigate using debugfs as the manual compaction trigger instead of proc

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
@ 2007-06-18  9:28 ` Mel Gorman
  2007-06-18 16:56   ` Christoph Lameter
  2007-06-18  9:29 ` [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA Mel Gorman
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:28 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


This is a rollup of two patches from KAMEZAWA Hiroyuki. A slightly later
version exists, but this is the one I tested with; it checks page_mapped()
with the RCU lock held.

Patch 1 is "page migration by kernel v5."
Patch 2 is "isolate lru page race fix."

Changelog V5->V6
 - removed dummy_vma and uses rcu_read_lock().

Usually, migrate_pages(page,,) is called with mm->mmap_sem held via a
system call. (mm here is the mm_struct which maps the migration target
page.) This semaphore helps avoid some race conditions.

However, if we want to migrate a page from within the kernel, we have to
avoid these races ourselves. This patch adds checks for the following
race conditions.

1. A page which is not mapped can be the target of migration, so we have
   to check page_mapped() before calling try_to_unmap().

2. The anon_vma can be freed while the page is unmapped, but page->mapping
   remains as it was. Once page->mapcount drops to 0, we can no longer
   trust page->mapping. So, rcu_read_lock() is used to ensure the anon_vma
   pointed to by page->mapping is not freed during migration.
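
For race 2, a hypothetical interleaving looks like this (illustrative
only, not taken from the patch):

	migration (CPU A)                  munmap/exit (CPU B)
	-----------------                  -------------------
	page is isolated, PageAnon set
	                                   last pte zapped, mapcount -> 0
	                                   anon_vma freed
	try_to_unmap(page)
	  -> dereferences page->mapping,
	     which now points at freed
	     memory

Holding rcu_read_lock() across this window keeps the anon_vma valid
until migration has finished with it.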

release_pages() in mm/swap.c drops page_count() to 0 without clearing
the PageLRU flag. This means isolate_lru_page() can see a page where
PageLRU() is set but page_count(page) == 0. This is a bug: get_page()
would be called on a page with a zero count. Using get_page_unless_zero()
closes this window.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 migrate.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-clean/mm/migrate.c linux-2.6.22-rc4-mm2-005_migrationkernel/mm/migrate.c
--- linux-2.6.22-rc4-mm2-clean/mm/migrate.c	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-005_migrationkernel/mm/migrate.c	2007-06-15 16:25:31.000000000 +0100
@@ -49,9 +49,8 @@ int isolate_lru_page(struct page *page, 
 		struct zone *zone = page_zone(page);
 
 		spin_lock_irq(&zone->lru_lock);
-		if (PageLRU(page)) {
+		if (PageLRU(page) && get_page_unless_zero(page)) {
 			ret = 0;
-			get_page(page);
 			ClearPageLRU(page);
 			if (PageActive(page))
 				del_page_from_active_list(zone, page);
@@ -612,6 +611,7 @@ static int unmap_and_move(new_page_t get
 	int rc = 0;
 	int *result = NULL;
 	struct page *newpage = get_new_page(page, private, &result);
+	int rcu_locked = 0;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
 			goto unlock;
 		wait_on_page_writeback(page);
 	}
-
+	/* anon_vma should not be freed while migration. */
+	if (PageAnon(page)) {
+		rcu_read_lock();
+		rcu_locked = 1;
+	}
 	/*
 	 * Establish migration ptes or remove ptes
 	 */
-	try_to_unmap(page, 1);
 	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page);
+		goto unlock;
+
+	try_to_unmap(page, 1);
+	rc = move_to_new_page(newpage, page);
 
 	if (rc)
 		remove_migration_ptes(page, page);
 
 unlock:
+	if (rcu_locked)
+		rcu_read_unlock();
+
 	unlock_page(page);
 
 	if (rc != -EAGAIN) {


* [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
  2007-06-18  9:28 ` [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches Mel Gorman
@ 2007-06-18  9:29 ` Mel Gorman
  2007-06-18 17:04   ` Christoph Lameter
  2007-06-18  9:29 ` [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page() Mel Gorman
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
user of migration today and as this system call is only meaningful on NUMA,
it makes sense. However, memory compaction will operate within a zone and is
useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
to be used in all memory models. To preserve existing behaviour, move_pages()
is only available when CONFIG_NUMA is set.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 include/linux/migrate.h |    6 +++---
 include/linux/mm.h      |    2 ++
 mm/Kconfig              |    1 -
 3 files changed, 5 insertions(+), 4 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/migrate.h linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h
--- linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/migrate.h	2007-06-05 01:57:25.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h	2007-06-15 16:25:37.000000000 +0100
@@ -7,7 +7,7 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
-#ifdef CONFIG_MIGRATION
+#ifdef CONFIG_NUMA
 /* Check if a vma is migratable */
 static inline int vma_migratable(struct vm_area_struct *vma)
 {
@@ -24,7 +24,9 @@ static inline int vma_migratable(struct 
 			return 0;
 	return 1;
 }
+#endif
 
+#ifdef CONFIG_MIGRATION
 extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
 extern int putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
@@ -39,8 +41,6 @@ extern int migrate_vmas(struct mm_struct
 		const nodemask_t *from, const nodemask_t *to,
 		unsigned long flags);
 #else
-static inline int vma_migratable(struct vm_area_struct *vma)
-					{ return 0; }
 
 static inline int isolate_lru_page(struct page *p, struct list_head *list)
 					{ return -ENOSYS; }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/mm.h linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/mm.h
--- linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/mm.h	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/mm.h	2007-06-15 16:25:37.000000000 +0100
@@ -241,6 +241,8 @@ struct vm_operations_struct {
 	int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
 	struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
 					unsigned long addr);
+#endif /* CONFIG_NUMA */
+#ifdef CONFIG_MIGRATION
 	int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
 		const nodemask_t *to, unsigned long flags);
 #endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/mm/Kconfig linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/Kconfig
--- linux-2.6.22-rc4-mm2-005_migrationkernel/mm/Kconfig	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/Kconfig	2007-06-15 16:25:37.000000000 +0100
@@ -145,7 +145,6 @@ config SPLIT_PTLOCK_CPUS
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful for


* [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page()
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
  2007-06-18  9:28 ` [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches Mel Gorman
  2007-06-18  9:29 ` [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA Mel Gorman
@ 2007-06-18  9:29 ` Mel Gorman
  2007-06-18 17:05   ` Christoph Lameter
  2007-06-18  9:29 ` [PATCH 4/7] Provide metrics on the extent of fragmentation in zones Mel Gorman
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


Migration uses isolate_lru_page() to isolate an LRU page. This acquires
the zone->lru_lock to safely remove the page and place it on a private
list. However, this prevents the caller from batching up the isolation of
multiple pages. This patch introduces locked_isolate_lru_page(), a version
for callers that already hold zone->lru_lock and are aware of the locking
requirements; a sketch of such a caller follows.
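
As an illustrative sketch only (start_pfn, end_pfn and zone are assumed,
and pfn validity checks are omitted; the real caller is the compaction
migrate scanner added in patch 5), a batching caller looks like this:

	LIST_HEAD(pagelist);
	unsigned long pfn;

	spin_lock_irq(&zone->lru_lock);
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		/* One lock acquisition covers the whole batch */
		locked_isolate_lru_page(zone, page, &pagelist);
	}
	spin_unlock_irq(&zone->lru_lock);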

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 include/linux/migrate.h |    8 +++++++-
 mm/migrate.c            |   36 +++++++++++++++++++++++++++---------
 2 files changed, 34 insertions(+), 10 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h linux-2.6.22-rc4-mm2-020_isolate_nolock/include/linux/migrate.h
--- linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h	2007-06-15 16:25:37.000000000 +0100
+++ linux-2.6.22-rc4-mm2-020_isolate_nolock/include/linux/migrate.h	2007-06-15 16:25:46.000000000 +0100
@@ -27,6 +27,8 @@ static inline int vma_migratable(struct 
 #endif
 
 #ifdef CONFIG_MIGRATION
+extern int locked_isolate_lru_page(struct zone *zone, struct page *p,
+						struct list_head *pagelist);
 extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
 extern int putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
@@ -41,7 +43,11 @@ extern int migrate_vmas(struct mm_struct
 		const nodemask_t *from, const nodemask_t *to,
 		unsigned long flags);
 #else
-
+static inline int locked_isolate_lru_page(struct zone *zone, struct page *p,
+						struct list_head *list)
+{
+	return -ENOSYS;
+}
 static inline int isolate_lru_page(struct page *p, struct list_head *list)
 					{ return -ENOSYS; }
 static inline int putback_lru_pages(struct list_head *l) { return 0; }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/migrate.c linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/migrate.c
--- linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/migrate.c	2007-06-15 16:25:31.000000000 +0100
+++ linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/migrate.c	2007-06-15 16:25:46.000000000 +0100
@@ -41,6 +41,32 @@
  *  -EBUSY: page not on LRU list
  *  0: page removed from LRU list and added to the specified list.
  */
+int locked_isolate_lru_page(struct zone *zone, struct page *page,
+						struct list_head *pagelist)
+{
+	int ret = -EBUSY;
+
+	if (PageLRU(page) && get_page_unless_zero(page)) {
+		ret = 0;
+		ClearPageLRU(page);
+		if (PageActive(page))
+			del_page_from_active_list(zone, page);
+		else
+			del_page_from_inactive_list(zone, page);
+		list_add_tail(&page->lru, pagelist);
+	}
+
+	return ret;
+}
+
+/*
+ * Acquire the zone->lru_lock and isolate one page from the LRU lists. If
+ * successful put it onto the indicated list with elevated page count.
+ *
+ * Result:
+ *  -EBUSY: page not on LRU list
+ *  0: page removed from LRU list and added to the specified list.
+ */
 int isolate_lru_page(struct page *page, struct list_head *pagelist)
 {
 	int ret = -EBUSY;
@@ -49,15 +75,7 @@ int isolate_lru_page(struct page *page, 
 		struct zone *zone = page_zone(page);
 
 		spin_lock_irq(&zone->lru_lock);
-		if (PageLRU(page) && get_page_unless_zero(page)) {
-			ret = 0;
-			ClearPageLRU(page);
-			if (PageActive(page))
-				del_page_from_active_list(zone, page);
-			else
-				del_page_from_inactive_list(zone, page);
-			list_add_tail(&page->lru, pagelist);
-		}
+		ret = locked_isolate_lru_page(zone, page, pagelist);
 		spin_unlock_irq(&zone->lru_lock);
 	}
 	return ret;


* [PATCH 4/7] Provide metrics on the extent of fragmentation in zones
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
                   ` (2 preceding siblings ...)
  2007-06-18  9:29 ` [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page() Mel Gorman
@ 2007-06-18  9:29 ` Mel Gorman
  2007-06-18 17:07   ` Christoph Lameter
  2007-06-18  9:30 ` [PATCH 5/7] Introduce a means of compacting memory within a zone Mel Gorman
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


It is useful to know the state of external fragmentation in the system
and whether allocation failures are due to low memory or external
fragmentation. This patch introduces two metrics for evaluating the state
of fragmentation and exports the information to /proc/pagetypeinfo. The
metrics will be used later to determine whether it is better to compact
memory or to reclaim directly for a high-order allocation to succeed. A
worked example of both metrics follows.
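
As a worked example with made-up numbers: suppose a zone has 1000 free
base pages, of which ten blocks are order-4 (16 pages each), and an
order-4 allocation is requested. The unusable free index is

	(1000 - (10 << 4)) * 100 / 1000 = 84

i.e. 84% of the free memory is unusable for this allocation. If instead
no order-4 or larger block exists at all and the free memory is spread
across 800 smaller blocks, the fragmentation index is

	100 - ((1000 / 16) * 100) / 800 = 93

(using integer division, as the code does). A value this close to 100
points at external fragmentation rather than low memory as the cause of
the failure.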

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 vmstat.c |  131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/vmstat.c linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/vmstat.c
--- linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/vmstat.c	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/vmstat.c	2007-06-15 16:25:55.000000000 +0100
@@ -625,6 +625,135 @@ static void pagetypeinfo_showmixedcount_
 #endif /* CONFIG_PAGE_OWNER */
 
 /*
+ * Calculate the number of free pages in a zone and how many contiguous
+ * pages are free and how many are large enough to satisfy an allocation of
+ * the target size
+ */
+void calculate_freepages(struct zone *zone, unsigned int target_order,
+				unsigned long *ret_freepages,
+				unsigned long *ret_areas_free,
+				unsigned long *ret_suitable_areas_free)
+{
+	unsigned int order;
+	unsigned long freepages;
+	unsigned long areas_free;
+	unsigned long suitable_areas_free;
+
+	freepages = areas_free = suitable_areas_free = 0;
+	for (order = 0; order < MAX_ORDER; order++) {
+		unsigned long order_areas_free;
+
+		/* Count number of free blocks */
+		order_areas_free = zone->free_area[order].nr_free;
+		areas_free += order_areas_free;
+
+		/* Count free base pages */
+		freepages += order_areas_free << order;
+
+		/* Count the number of target_order sized free blocks */
+		if (order >= target_order)
+			suitable_areas_free += order_areas_free <<
+							(order - target_order);
+	}
+
+	*ret_freepages = freepages;
+	*ret_areas_free = areas_free;
+	*ret_suitable_areas_free = suitable_areas_free;
+}
+
+/*
+ * Return an index indicating how much of the available free memory is
+ * unusable for an allocation of the requested size. A value towards 100
+ * implies that the majority of free memory is unusable and compaction
+ * may be required.
+ */
+int unusable_free_index(struct zone *zone, unsigned int target_order)
+{
+	unsigned long freepages, areas_free, suitable_areas_free;
+
+	calculate_freepages(zone, target_order,
+				&freepages, &areas_free, &suitable_areas_free);
+
+	/* No free memory is interpreted as all free memory is unusable */
+	if (freepages == 0)
+		return 100;
+
+	return ((freepages - (suitable_areas_free << target_order)) * 100) /
+								freepages;
+}
+
+/*
+ * Return the external fragmentation index for a zone. Values towards 100
+ * imply the allocation failure was due to external fragmentation. Values
+ * towards 0 imply the failure was due to lack of memory. The value is only
+ * useful when an allocation of the requested order would fail and it does
+ * not take into account pages free on the pcp list.
+ */
+int fragmentation_index(struct zone *zone, unsigned int target_order)
+{
+	unsigned long freepages, areas_free, suitable_areas_free;
+
+	calculate_freepages(zone, target_order,
+				&freepages, &areas_free, &suitable_areas_free);
+
+	/* An allocation succeeding implies this index has no meaning */
+	if (suitable_areas_free)
+		return -1;
+
+	return 100 - ((freepages / (1 << target_order)) * 100) / areas_free;
+}
+
+static void pagetypeinfo_showunusable_print(struct seq_file *m,
+					pg_data_t *pgdat, struct zone *zone)
+{
+	unsigned int order;
+
+	seq_printf(m, "Node %4d, zone %8s %19s",
+				pgdat->node_id,
+				zone->name, " ");
+	for (order = 0; order < MAX_ORDER; ++order)
+		seq_printf(m, "%6d ", unusable_free_index(zone, order));
+
+	seq_putc(m, '\n');
+}
+
+/* Print out percentage of unusable free memory at each order */
+static int pagetypeinfo_showunusable(struct seq_file *m, void *arg)
+{
+	pg_data_t *pgdat = (pg_data_t *)arg;
+
+	seq_printf(m, "\nPercentage unusable free memory at order\n");
+	walk_zones_in_node(m, pgdat, pagetypeinfo_showunusable_print);
+
+	return 0;
+}
+
+static void pagetypeinfo_showfragmentation_print(struct seq_file *m,
+					pg_data_t *pgdat, struct zone *zone)
+{
+	unsigned int order;
+
+	seq_printf(m, "Node %4d, zone %8s %19s",
+				pgdat->node_id,
+				zone->name, " ");
+	for (order = 0; order < MAX_ORDER; ++order)
+		seq_printf(m, "%6d ", fragmentation_index(zone, order));
+
+	seq_putc(m, '\n');
+}
+
+/* Print the fragmentation index at each order */
+static int pagetypeinfo_showfragmentation(struct seq_file *m, void *arg)
+{
+	pg_data_t *pgdat = (pg_data_t *)arg;
+
+	seq_printf(m, "\nFragmentation index\n");
+	walk_zones_in_node(m, pgdat, pagetypeinfo_showfragmentation_print);
+
+	return 0;
+}
+
+/*
  * Print out the number of pageblocks for each migratetype that contain pages
  * of other types. This gives an indication of how well fallbacks are being
  * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
@@ -656,6 +785,8 @@ static int pagetypeinfo_show(struct seq_
 	seq_printf(m, "Pages per block:  %lu\n", pageblock_nr_pages);
 	seq_putc(m, '\n');
 	pagetypeinfo_showfree(m, pgdat);
+	pagetypeinfo_showunusable(m, pgdat);
+	pagetypeinfo_showfragmentation(m, pgdat);
 	pagetypeinfo_showblockcount(m, pgdat);
 	pagetypeinfo_showmixedcount(m, pgdat);
 


* [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
                   ` (3 preceding siblings ...)
  2007-06-18  9:29 ` [PATCH 4/7] Provide metrics on the extent of fragmentation in zones Mel Gorman
@ 2007-06-18  9:30 ` Mel Gorman
  2007-06-18 17:18   ` Christoph Lameter
  2007-06-19 12:54   ` Yasunori Goto
  2007-06-18  9:30 ` [PATCH 6/7] Add /proc/sys/vm/compact_node for the explicit compaction of a node Mel Gorman
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:30 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


This patch is the core of the memory compaction mechanism. It compacts memory
in a zone such that movable pages are relocated towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone. The migration
scanner starts at the bottom of the zone and searches for all movable pages
within each area, isolating them onto a private list called migratelist.
The free scanner starts at the top of the zone and searches for suitable
areas, consuming the free pages within them and making them available to
the migration scanner. The pages isolated for migration are then migrated
to the newly isolated free pages.

Note that after this patch is applied there is still no means of triggering
a compaction run. Later patches will introduce the triggers, initially a
manual trigger.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 include/linux/compaction.h |    8 +
 include/linux/mm.h         |    1 
 mm/Makefile                |    2 
 mm/compaction.c            |  297 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c            |   38 +++++
 5 files changed, 345 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/compaction.h linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/compaction.h	2007-06-14 00:08:58.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h	2007-06-15 16:28:59.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _LINUX_COMPACTION_H
+#define _LINUX_COMPACTION_H
+
+/* Return values for compact_zone() */
+#define COMPACT_INCOMPLETE	0
+#define COMPACT_COMPLETE	1
+
+#endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/mm.h linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/mm.h
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/mm.h	2007-06-15 16:25:37.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/mm.h	2007-06-15 16:28:59.000000000 +0100
@@ -336,6 +336,7 @@ void put_page(struct page *page);
 void put_pages_list(struct list_head *pages);
 
 void split_page(struct page *page, unsigned int order);
+int split_free_page(struct page *page);
 
 /*
  * Compound pages have a destructor function.  Provide a
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/compaction.c linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/compaction.c	2007-06-14 00:08:58.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c	2007-06-15 16:28:59.000000000 +0100
@@ -0,0 +1,297 @@
+/*
+ * linux/mm/compaction.c
+ *
+ * Memory compaction for the reduction of external fragmentation
+ * Copyright IBM Corp. 2007 Mel Gorman <mel@csn.ul.ie>
+ */
+#include <linux/migrate.h>
+#include <linux/compaction.h>
+#include "internal.h"
+
+/*
+ * compact_control is used to track pages being migrated and the free pages
+ * they are being migrated to during memory compaction. The free_pfn starts
+ * at the end of a zone and migrate_pfn begins at the start. Movable pages
+ * are moved to the end of a zone during a compaction run and the run
+ * completes when free_pfn <= migrate_pfn
+ */
+struct compact_control {
+	struct list_head freepages;	/* List of free pages to migrate to */
+	struct list_head migratepages;	/* List of pages being migrated */
+	unsigned long nr_freepages;	/* Number of isolated free pages */
+	unsigned long nr_migratepages;	/* Number of pages to migrate */
+	unsigned long free_pfn;		/* isolate_freepages search base */
+	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+};
+
+static int release_freepages(struct zone *zone, struct list_head *freelist)
+{
+	struct page *page, *next;
+	int count = 0;
+
+	list_for_each_entry_safe(page, next, freelist, lru) {
+		list_del(&page->lru);
+		__free_page(page);
+		count++;
+	}
+
+	return count;
+}
+
+/* Isolate free pages onto a private freelist. Must hold zone->lock */
+static int isolate_freepages_block(struct zone *zone,
+				unsigned long blockpfn,
+				struct list_head *freelist)
+{
+	unsigned long zone_end_pfn, end_pfn;
+	int total_isolated = 0;
+
+	/* Get the last PFN we should scan for free pages at */
+	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	end_pfn = blockpfn + pageblock_nr_pages;
+	if (end_pfn > zone_end_pfn)
+		end_pfn = zone_end_pfn;
+
+	/* Isolate free pages. This assumes the block is valid */
+	for (; blockpfn < end_pfn; blockpfn++) {
+		struct page *page;
+		int isolated, i;
+
+		if (!pfn_valid_within(blockpfn))
+			continue;
+
+		page = pfn_to_page(blockpfn);
+		if (!PageBuddy(page))
+			continue;
+
+		/* Found a free page, break it into order-0 pages */
+		isolated = split_free_page(page);
+		total_isolated += isolated;
+		for (i = 0; i < isolated; i++) {
+			list_add(&page->lru, freelist);
+			page++;
+		}
+		blockpfn += isolated - 1;
+	}
+
+	return total_isolated;
+}
+
+/* Returns 1 if the page is within a block suitable for migration to */
+static int pageblock_migratable(struct page *page)
+{
+	/* If the page is a large free page, then allow migration */
+	if (PageBuddy(page) && page_order(page) >= pageblock_order)
+		return 1;
+
+	/* If the block is MIGRATE_MOVABLE, allow migration */
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+		return 1;
+
+	/* Otherwise skip the block */
+	return 0;
+}
+
+/*
+ * Based on information in the current compact_control, find blocks
+ * suitable for isolating free pages from
+ */
+static void isolate_freepages(struct zone *zone,
+				struct compact_control *cc)
+{
+	struct page *page;
+	unsigned long high_pfn, low_pfn, pfn;
+	int nr_freepages = cc->nr_freepages;
+	struct list_head *freelist = &cc->freepages;
+
+	pfn = cc->free_pfn;
+	low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+	high_pfn = low_pfn;
+
+	/*
+	 * Isolate free pages until enough are available to migrate the
+	 * pages on cc->migratepages. We stop searching if the migrate
+	 * and free page scanners meet or enough free pages are isolated.
+	 */
+	spin_lock_irq(&zone->lock);
+	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+					pfn -= pageblock_nr_pages) {
+		int isolated;
+
+		if (!pfn_valid(pfn))
+			continue;
+
+		/* Check for overlapping nodes/zones */
+		page = pfn_to_page(pfn);
+		if (page_zone(page) != zone)
+			continue;
+
+		/* Check the block is suitable for migration */
+		if (!pageblock_migratable(page))
+			continue;
+
+		/* Found a block suitable for isolating free pages from */
+		isolated = isolate_freepages_block(zone, pfn, freelist);
+		nr_freepages += isolated;
+
+		/*
+		 * Record the highest PFN we isolated pages from. When next
+		 * looking for free pages, the search will restart here as
+		 * page migration may have returned some pages to the allocator
+		 */
+		if (isolated)
+			high_pfn = max(high_pfn, pfn);
+	}
+	spin_unlock_irq(&zone->lock);
+
+	cc->free_pfn = high_pfn;
+	cc->nr_freepages = nr_freepages;
+}
+
+/*
+ * Isolate all pages that can be migrated from the block pointed to by
+ * the migrate scanner within compact_control. We migrate pages from
+ * all block-types as the intention is to have all movable pages towards
+ * the end of the zone.
+ */
+static int isolate_migratepages(struct zone *zone,
+					struct compact_control *cc)
+{
+	unsigned long high_pfn, low_pfn, end_pfn, start_pfn;
+	struct page *page;
+	int isolated = 0;
+	struct list_head *migratelist;
+
+	high_pfn = cc->free_pfn;
+	low_pfn = ALIGN(cc->migrate_pfn, pageblock_nr_pages);
+	migratelist = &cc->migratepages;
+
+	/* Do not scan outside zone boundaries */
+	if (low_pfn < zone->zone_start_pfn)
+		low_pfn = zone->zone_start_pfn;
+
+	/* Setup to scan one block but not past where we are migrating to */
+	end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+	if (end_pfn > high_pfn)
+		end_pfn = high_pfn;
+	start_pfn = low_pfn;
+
+	/* Time to isolate some pages for migration */
+	spin_lock_irq(&zone->lru_lock);
+	for (; low_pfn < end_pfn; low_pfn++) {
+		if (!pfn_valid_within(low_pfn))
+			continue;
+
+		/* Get the page and skip if free */
+		page = pfn_to_page(low_pfn);
+		if (PageBuddy(page)) {
+			low_pfn += (1 << page_order(page)) - 1;
+			continue;
+		}
+
+		/* Try isolate the page */
+		if (locked_isolate_lru_page(zone, page, migratelist) == 0)
+			isolated++;
+	}
+	spin_unlock_irq(&zone->lru_lock);
+
+	cc->migrate_pfn = end_pfn;
+	cc->nr_migratepages += isolated;
+	return isolated;
+}
+
+/*
+ * This is a migrate-callback that "allocates" freepages by taking pages
+ * from the isolated freelists in the block we are migrating to.
+ */
+static struct page *compaction_alloc(struct page *migratepage,
+					unsigned long data,
+					int **result)
+{
+	struct compact_control *cc = (struct compact_control *)data;
+	struct page *freepage;
+
+	VM_BUG_ON(cc == NULL);
+	if (list_empty(&cc->freepages))
+		return NULL;
+
+	freepage = list_entry(cc->freepages.next, struct page, lru);
+	list_del(&freepage->lru);
+	cc->nr_freepages--;
+
+#ifdef CONFIG_PAGE_OWNER
+	freepage->order = migratepage->order;
+	freepage->gfp_mask = migratepage->gfp_mask;
+	memcpy(freepage->trace, migratepage->trace, sizeof(freepage->trace));
+#endif
+
+	return freepage;
+}
+
+/*
+ * We cannot control nr_migratepages and nr_freepages fully when migration is
+ * running as migrate_pages() has no knowledge of compact_control. When
+ * migration is complete, we count the number of pages on the lists by hand.
+ */
+static void update_nr_listpages(struct compact_control *cc)
+{
+	int nr_migratepages = 0;
+	int nr_freepages = 0;
+	struct page *page;
+	list_for_each_entry(page, &cc->migratepages, lru)
+		nr_migratepages++;
+	list_for_each_entry(page, &cc->freepages, lru)
+		nr_freepages++;
+
+	cc->nr_migratepages = nr_migratepages;
+	cc->nr_freepages = nr_freepages;
+}
+
+static inline int compact_finished(struct zone *zone,
+						struct compact_control *cc)
+{
+	/* Compaction run completes if the migrate and free scanner meet */
+	if (cc->free_pfn <= cc->migrate_pfn)
+		return COMPACT_COMPLETE;
+
+	return COMPACT_INCOMPLETE;
+}
+
+static int compact_zone(struct zone *zone, struct compact_control *cc)
+{
+	int ret = COMPACT_INCOMPLETE;
+
+	/* Setup to move all movable pages to the end of the zone */
+	cc->migrate_pfn = zone->zone_start_pfn;
+	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
+	cc->free_pfn &= ~(pageblock_nr_pages-1);
+
+	for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
+		isolate_migratepages(zone, cc);
+
+		if (!cc->nr_migratepages)
+			continue;
+
+		/* Isolate free pages if necessary */
+		if (cc->nr_freepages < cc->nr_migratepages)
+			isolate_freepages(zone, cc);
+
+		/* Stop compacting if we cannot get enough free pages */
+		if (cc->nr_freepages < cc->nr_migratepages)
+			break;
+
+		migrate_pages(&cc->migratepages, compaction_alloc,
+							(unsigned long)cc);
+		update_nr_listpages(cc);
+	}
+
+	/* Release free pages and check accounting */
+	cc->nr_freepages -= release_freepages(zone, &cc->freepages);
+	WARN_ON(cc->nr_freepages != 0);
+
+	/* Release LRU pages not migrated */
+	if (!list_empty(&cc->migratepages))
+		putback_lru_pages(&cc->migratepages);
+
+	return ret;
+}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/Makefile linux-2.6.22-rc4-mm2-110_compact_zone/mm/Makefile
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/Makefile	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/Makefile	2007-06-15 16:28:59.000000000 +0100
@@ -27,7 +27,7 @@ obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
-obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_MIGRATION) += migrate.o compaction.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/page_alloc.c linux-2.6.22-rc4-mm2-110_compact_zone/mm/page_alloc.c
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/page_alloc.c	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/page_alloc.c	2007-06-15 16:28:59.000000000 +0100
@@ -1060,6 +1060,44 @@ void split_page(struct page *page, unsig
 		set_page_refcounted(page + i);
 }
 
+/* Similar to split_page except the page is already free */
+int split_free_page(struct page *page)
+{
+	int order;
+	struct zone *zone;
+
+	/* Should never happen but handle it anyway */
+	if (!page || !PageBuddy(page))
+		return 0;
+
+	zone = page_zone(page);
+	order = page_order(page);
+
+	/* Obey watermarks or the system could deadlock */
+	if (!zone_watermark_ok(zone, 0, zone->pages_low + (1 << order), 0, 0))
+		return 0;
+
+	/* Remove page from free list */
+	list_del(&page->lru);
+	zone->free_area[order].nr_free--;
+	rmv_page_order(page);
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+
+	/* Split into individual pages */
+	set_page_refcounted(page);
+	split_page(page, order);
+
+	/* Set the migratetype of the block if necessary */
+	if (order >= pageblock_order - 1 &&
+			get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+		struct page *endpage = page + (1 << order) - 1;
+		for (; page < endpage; page += pageblock_nr_pages)
+			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	}
+
+	return 1 << order;
+}
+
 /*
  * Really, prep_compound_page() should be called from __rmqueue_bulk().  But
  * we cheat by calling it from here, in the order > 0 path.  Saves a branch


* [PATCH 6/7] Add /proc/sys/vm/compact_node for the explicit compaction of a node
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
                   ` (4 preceding siblings ...)
  2007-06-18  9:30 ` [PATCH 5/7] Introduce a means of compacting memory within a zone Mel Gorman
@ 2007-06-18  9:30 ` Mel Gorman
  2007-06-18  9:30 ` [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails Mel Gorman
  2007-06-18 17:24 ` [PATCH 0/7] Memory Compaction v2 Christoph Lameter
  7 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:30 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


This patch adds a special file, /proc/sys/vm/compact_node. When a node
number is written to this file, each zone in that node is compacted.
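
For example, writing 0 triggers compaction of node 0; based on the
printk()s below, messages like these should then appear in the kernel
log:

	# echo 0 > /proc/sys/vm/compact_node
	# dmesg | tail -2
	Compacting memory in node 0
	Compaction of node 0 complete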

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
---

 include/linux/compaction.h |    7 +++++
 kernel/sysctl.c            |   13 +++++++++
 mm/compaction.c            |   54 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h	2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h	2007-06-15 16:29:08.000000000 +0100
@@ -5,4 +5,11 @@
 #define COMPACT_INCOMPLETE	0
 #define COMPACT_COMPLETE	1
 
+#ifdef CONFIG_MIGRATION
+
+extern int sysctl_compaction_handler(struct ctl_table *table, int write,
+				struct file *file, void __user *buffer,
+				size_t *length, loff_t *ppos);
+
+#endif /* CONFIG_MIGRATION */
 #endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/kernel/sysctl.c linux-2.6.22-rc4-mm2-115_compact_viaproc/kernel/sysctl.c
--- linux-2.6.22-rc4-mm2-110_compact_zone/kernel/sysctl.c	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/kernel/sysctl.c	2007-06-15 16:29:08.000000000 +0100
@@ -47,6 +47,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/acpi.h>
 #include <linux/reboot.h>
+#include <linux/compaction.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -77,6 +78,7 @@ extern int printk_ratelimit_jiffies;
 extern int printk_ratelimit_burst;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
+extern int sysctl_compact_node;
 extern int percpu_pagelist_fraction;
 extern int compat_log;
 extern int maps_protect;
@@ -858,6 +860,17 @@ static ctl_table vm_table[] = {
 		.proc_handler	= drop_caches_sysctl_handler,
 		.strategy	= &sysctl_intvec,
 	},
+#ifdef CONFIG_MIGRATION
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "compact_node",
+		.data		= &sysctl_compact_node,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= sysctl_compaction_handler,
+		.strategy	= &sysctl_intvec,
+	},
+#endif /* CONFIG_MIGRATION */
 	{
 		.ctl_name	= VM_MIN_FREE_KBYTES,
 		.procname	= "min_free_kbytes",
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c
--- linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c	2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c	2007-06-15 16:29:08.000000000 +0100
@@ -6,6 +6,8 @@
  */
 #include <linux/migrate.h>
 #include <linux/compaction.h>
+#include <linux/swap.h>
+#include <linux/sysctl.h>
 #include "internal.h"
 
 /*
@@ -295,3 +297,55 @@ static int compact_zone(struct zone *zon
 
 	return ret;
 }
+
+/* Compact all zones within a node */
+int compact_node(int nodeid)
+{
+	int zoneid;
+	pg_data_t *pgdat;
+	struct zone *zone;
+
+	if (nodeid < 0 || nodeid > nr_node_ids || !node_online(nodeid))
+		return -EINVAL;
+	pgdat = NODE_DATA(nodeid);
+
+	/* Flush pending updates to the LRU lists */
+	lru_add_drain_all();
+
+	printk(KERN_INFO "Compacting memory in node %d\n", nodeid);
+	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+		struct compact_control cc;
+
+		zone = &pgdat->node_zones[zoneid];
+		if (!populated_zone(zone))
+			continue;
+
+		cc.nr_freepages = 0;
+		cc.nr_migratepages = 0;
+		INIT_LIST_HEAD(&cc.freepages);
+		INIT_LIST_HEAD(&cc.migratepages);
+
+		compact_zone(zone, &cc);
+
+		VM_BUG_ON(!list_empty(&cc.freepages));
+		VM_BUG_ON(!list_empty(&cc.migratepages));
+	}
+	printk(KERN_INFO "Compaction of node %d complete\n", nodeid);
+
+	return 0;
+}
+
+/* This is global and fierce ugly but it's straight-forward */
+int sysctl_compact_node;
+
+/* This is the entry point for compacting nodes via /proc/sys/vm */
+int sysctl_compaction_handler(struct ctl_table *table, int write,
+			struct file *file, void __user *buffer,
+			size_t *length, loff_t *ppos)
+{
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (write)
+		return compact_node(sysctl_compact_node);
+
+	return 0;
+}


* [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
                   ` (5 preceding siblings ...)
  2007-06-18  9:30 ` [PATCH 6/7] Add /proc/sys/vm/compact_node for the explicit compaction of a node Mel Gorman
@ 2007-06-18  9:30 ` Mel Gorman
  2007-06-18 17:22   ` Christoph Lameter
  2007-06-21 12:28   ` Andrew Morton
  2007-06-18 17:24 ` [PATCH 0/7] Memory Compaction v2 Christoph Lameter
  7 siblings, 2 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-18  9:30 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Mel Gorman, kamezawa.hiroyu, clameter


Ordinarily when a high-order allocation fails, direct reclaim is entered to
free pages to satisfy the allocation. With this patch, the allocator first
determines whether the allocation failed due to external fragmentation
rather than low memory and, if so, the calling process compacts memory
until a suitable page is freed. Compaction by moving pages in memory is
considerably cheaper than paging out to disk and works where there are
locked pages or no swap. If compaction fails to free a page of a suitable
size, reclaim still occurs.

Direct compaction returns as soon as possible. As each block is compacted,
a check is made for whether a suitable page has been freed and, if so,
direct compaction returns.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---

 include/linux/compaction.h |   12 ++++
 include/linux/vmstat.h     |    1 
 mm/compaction.c            |  103 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c            |   21 ++++++++
 mm/vmstat.c                |    4 +
 5 files changed, 140 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h	2007-06-15 16:29:08.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/compaction.h	2007-06-15 16:29:20.000000000 +0100
@@ -1,15 +1,25 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
-/* Return values for compact_zone() */
+/* Return values for compact_zone() and try_to_compact_pages() */
 #define COMPACT_INCOMPLETE	0
 #define COMPACT_COMPLETE	1
+#define COMPACT_PARTIAL	2
 
 #ifdef CONFIG_MIGRATION
 
+extern int fragmentation_index(struct zone *zone, unsigned int target_order);
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
 				struct file *file, void __user *buffer,
 				size_t *length, loff_t *ppos);
+extern unsigned long try_to_compact_pages(struct zone **zones,
+						int order, gfp_t gfp_mask);
 
+#else
+static inline unsigned long try_to_compact_pages(struct zone **zones,
+						int order, gfp_t gfp_mask)
+{
+	return COMPACT_COMPLETE;
+}
 #endif /* CONFIG_MIGRATION */
 #endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/vmstat.h linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/vmstat.h
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/vmstat.h	2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/vmstat.h	2007-06-15 16:29:20.000000000 +0100
@@ -37,6 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 		FOR_ALL_ZONES(PGSCAN_DIRECT),
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
 		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+		COMPACTSTALL, COMPACTSUCCESS, COMPACTRACE,
 		NR_VM_EVENT_ITEMS
 };
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/compaction.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c	2007-06-15 16:29:08.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/compaction.c	2007-06-15 16:32:27.000000000 +0100
@@ -24,6 +24,8 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	int required_order;		/* order a direct compactor needs */
+	int mtype;			/* type of high-order page required */
 };
 
 static int release_freepages(struct zone *zone, struct list_head *freelist)
@@ -252,10 +254,29 @@ static void update_nr_listpages(struct c
 static inline int compact_finished(struct zone *zone,
 						struct compact_control *cc)
 {
+	int order;
+
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (cc->free_pfn <= cc->migrate_pfn)
 		return COMPACT_COMPLETE;
 
+	if (cc->required_order == -1)
+		return COMPACT_INCOMPLETE;
+
+	/* Check for page of the appropriate type when direct compacting */
+	for (order = cc->required_order; order < MAX_ORDER; order++) {
+		/*
+		 * If the current order is greater than pageblock_order, then
+		 * the block is eligible for allocation
+		 */
+		if (order >= pageblock_order && zone->free_area[order].nr_free)
+			return COMPACT_PARTIAL;
+
+		/* Otherwise check whether a page is free and of the right type */
+		if (!list_empty(&zone->free_area[order].free_list[cc->mtype]))
+			return COMPACT_PARTIAL;
+	}
+
 	return COMPACT_INCOMPLETE;
 }
 
@@ -298,6 +319,87 @@ static int compact_zone(struct zone *zon
 	return ret;
 }
 
+static inline unsigned long compact_zone_order(struct zone *zone,
+						int order, gfp_t gfp_mask)
+{
+	struct compact_control cc = {
+		.nr_freepages = 0,
+		.nr_migratepages = 0,
+		.required_order = order,
+		.mtype = allocflags_to_migratetype(gfp_mask),
+	};
+	INIT_LIST_HEAD(&cc.freepages);
+	INIT_LIST_HEAD(&cc.migratepages);
+
+	return compact_zone(zone, &cc);
+}
+
+/**
+ * try_to_compact_pages - Compact memory directly to satisfy a high-order allocation
+ * @zones: The zonelist used for the current allocation
+ * @order: The order of the current allocation
+ * @gfp_mask: The GFP mask of the current allocation
+ *
+ * This is the main entry point for direct page compaction.
+ *
+ * Returns 0 if compaction fails to free a page of the required size and type
+ * Returns non-zero on success
+ */
+unsigned long try_to_compact_pages(struct zone **zones,
+						int order, gfp_t gfp_mask)
+{
+	unsigned long watermark;
+	int may_enter_fs = gfp_mask & __GFP_FS;
+	int may_perform_io = gfp_mask & __GFP_IO;
+	int i;
+	int status = COMPACT_INCOMPLETE;
+
+	/* Check whether it is worth even starting compaction */
+	if (order == 0 || !may_enter_fs || !may_perform_io)
+		return status;
+
+	/* Flush pending updates to the LRU lists on the local CPU */
+	lru_add_drain();
+
+	/* Compact each zone in the list */
+	for (i = 0; zones[i] != NULL; i++) {
+		struct zone *zone = zones[i];
+		int fragindex;
+
+		/*
+		 * If watermarks are not met, compaction will not help.
+		 * Note that we check the watermarks at order-0 as we
+		 * are assuming some free pages will coalesce
+		 */
+		watermark = zone->pages_low + (1 << order);
+		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+			continue;
+
+		/*
+		 * fragmentation index determines if allocation failures are
+		 * due to low memory or external fragmentation
+		 *
+		 * index of -1 implies allocations would succeed
+		 * index < 50 implies alloc failure is due to lack of memory
+		 */
+		fragindex = fragmentation_index(zone, order);
+		if (fragindex < 50)
+			continue;
+
+		status = compact_zone_order(zone, order, gfp_mask);
+		if (status == COMPACT_PARTIAL) {
+			count_vm_event(COMPACTSUCCESS);
+			break;
+		}
+	}
+
+	/* Account for it if we stalled due to compaction */
+	if (status != COMPACT_INCOMPLETE)
+		count_vm_event(COMPACTSTALL);
+
+	return status;
+}
+
 /* Compact all zones within a node */
 int compact_node(int nodeid)
 {
@@ -322,6 +424,7 @@ int compact_node(int nodeid)
 
 		cc.nr_freepages = 0;
 		cc.nr_migratepages = 0;
+		cc.required_order = -1;
 		INIT_LIST_HEAD(&cc.freepages);
 		INIT_LIST_HEAD(&cc.migratepages);
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/page_alloc.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/page_alloc.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/page_alloc.c	2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/page_alloc.c	2007-06-15 16:29:20.000000000 +0100
@@ -41,6 +41,7 @@
 #include <linux/pfn.h>
 #include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
+#include <linux/compaction.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1670,6 +1671,26 @@ nofail_alloc:
 
 	cond_resched();
 
+	/* Try memory compaction for high-order allocations before reclaim */
+	if (order != 0) {
+		drain_all_local_pages();
+		did_some_progress = try_to_compact_pages(zonelist->zones,
+							order, gfp_mask);
+		if (did_some_progress == COMPACT_PARTIAL) {
+			page = get_page_from_freelist(gfp_mask, order,
+						zonelist, alloc_flags);
+
+			if (page)
+				goto got_pg;
+
+			/*
+			 * It's a race if compaction frees a suitable page but
+			 * someone else allocates it
+			 */
+			count_vm_event(COMPACTRACE);
+		}
+	}
+
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
 	p->flags |= PF_MEMALLOC;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/vmstat.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/vmstat.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/vmstat.c	2007-06-15 16:25:55.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/vmstat.c	2007-06-15 16:29:20.000000000 +0100
@@ -882,6 +882,10 @@ static const char * const vmstat_text[] 
 	"allocstall",
 
 	"pgrotated",
+
+	"compact_stall",
+	"compact_success",
+	"compact_race",
 #endif
 };
 


* Re: [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches
  2007-06-18  9:28 ` [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches Mel Gorman
@ 2007-06-18 16:56   ` Christoph Lameter
  2007-06-19 15:52     ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 16:56 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Mon, 18 Jun 2007, Mel Gorman wrote:

> @@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
>  			goto unlock;
>  		wait_on_page_writeback(page);
>  	}
> -
> +	/* anon_vma should not be freed while migration. */
> +	if (PageAnon(page)) {
> +		rcu_read_lock();
> +		rcu_locked = 1;
> +	}

We agreed on doing rcu_read_lock(), removing the status variable and
checking for PageAnon(). Doing so deuglifies the function.


* Re: [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA
  2007-06-18  9:29 ` [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA Mel Gorman
@ 2007-06-18 17:04   ` Christoph Lameter
  2007-06-19 15:59     ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:04 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Mon, 18 Jun 2007, Mel Gorman wrote:

> 
> CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
> user of migration today and as this system call is only meaningful on NUMA,
> it makes sense. However, memory compaction will operate within a zone and is

There are more users of migration. move_pages() is one of them; then there
is cpuset process migration, MPOL_BIND page migration and sys_migrate_pages
for explicit process migration.

> useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
> to be used in all memory models. To preserve existing behaviour, move_pages()
> is only available when CONFIG_NUMA is set.

What does this have to do with memory models? A bit unclear.

Otherwise

Acked-by: Christoph Lameter <clameter@sgi.com>



* Re: [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page()
  2007-06-18  9:29 ` [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page() Mel Gorman
@ 2007-06-18 17:05   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:05 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

Acked-by: Christoph Lameter <clameter@sgi.com>




* Re: [PATCH 4/7] Provide metrics on the extent of fragmentation in zones
  2007-06-18  9:29 ` [PATCH 4/7] Provide metrics on the extent of fragmentation in zones Mel Gorman
@ 2007-06-18 17:07   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:07 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

Good idea. 

Signed-off-by: Christoph Lameter <clameter@sgi.com>




* Re: [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-18  9:30 ` [PATCH 5/7] Introduce a means of compacting memory within a zone Mel Gorman
@ 2007-06-18 17:18   ` Christoph Lameter
  2007-06-19 16:36     ` Mel Gorman
  2007-06-19 12:54   ` Yasunori Goto
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Mon, 18 Jun 2007, Mel Gorman wrote:

> +	/* Isolate free pages. This assumes the block is valid */
> +	for (; blockpfn < end_pfn; blockpfn++) {
> +		struct page *page;
> +		int isolated, i;
> +
> +		if (!pfn_valid_within(blockpfn))
> +			continue;
> +
> +		page = pfn_to_page(blockpfn);
> +		if (!PageBuddy(page))
> +			continue;

The name PageBuddy is getting to be misleading. Maybe rename this to
PageFree or so?

> +
> +		/* Found a free page, break it into order-0 pages */
> +		isolated = split_free_page(page);
> +		total_isolated += isolated;
> +		for (i = 0; i < isolated; i++) {
> +			list_add(&page->lru, freelist);
> +			page++;
> +		}

Why do you need to break them all up? Easier to coalesce later?

> +/* Returns 1 if the page is within a block suitable for migration to */
> +static int pageblock_migratable(struct page *page)
> +{
> +	/* If the page is a large free page, then allow migration */
> +	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> +		return 1;

if (PageSlab(page) && page->slab->ops->kick) {
	migratable slab
}

if (page table page) {
	migratable page table page?
}

etc?

> +		/* Try isolate the page */
> +		if (locked_isolate_lru_page(zone, page, migratelist) == 0)
> +			isolated++;

Support for other ways of migrating a page?

> +static int compact_zone(struct zone *zone, struct compact_control *cc)
> +{
> +	int ret = COMPACT_INCOMPLETE;
> +
> +	/* Setup to move all movable pages to the end of the zone */
> +	cc->migrate_pfn = zone->zone_start_pfn;
> +	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> +	cc->free_pfn &= ~(pageblock_nr_pages-1);
> +
> +	for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
> +		isolate_migratepages(zone, cc);
> +
> +		if (!cc->nr_migratepages)
> +			continue;
> +
> +		/* Isolate free pages if necessary */
> +		if (cc->nr_freepages < cc->nr_migratepages)
> +			isolate_freepages(zone, cc);
> +
> +		/* Stop compacting if we cannot get enough free pages */
> +		if (cc->nr_freepages < cc->nr_migratepages)
> +			break;
> +
> +		migrate_pages(&cc->migratepages, compaction_alloc,
> +							(unsigned long)cc);

You do not need to check the result of migration? Page migration is a best 
effort that may fail.

Looks good otherwise.

Acked-by: Christoph Lameter <clameter@sgi.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails
  2007-06-18  9:30 ` [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails Mel Gorman
@ 2007-06-18 17:22   ` Christoph Lameter
  2007-06-19 16:50     ` Mel Gorman
  2007-06-21 12:28   ` Andrew Morton
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

You are amazing.

Acked-by: Christoph Lameter <clameter@sgi.com>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Memory Compaction v2
  2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
                   ` (6 preceding siblings ...)
  2007-06-18  9:30 ` [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails Mel Gorman
@ 2007-06-18 17:24 ` Christoph Lameter
  2007-06-19 16:58   ` Mel Gorman
  7 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-06-18 17:24 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Mon, 18 Jun 2007, Mel Gorman wrote:

> The patchset implements memory compaction for the page allocator reducing
> external fragmentation so that free memory exists as fewer, but larger
> contiguous blocks. Instead of being a full defragmentation solution,
> this focuses exclusively on pages that are movable via the page migration
> mechanism.

We need an additional facility at some point that allows the moving of 
pages that are not on the LRU. Such support seems to be possible
for page table pages and slab pages.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-18  9:30 ` [PATCH 5/7] Introduce a means of compacting memory within a zone Mel Gorman
  2007-06-18 17:18   ` Christoph Lameter
@ 2007-06-19 12:54   ` Yasunori Goto
  2007-06-19 16:49     ` Mel Gorman
  1 sibling, 1 reply; 26+ messages in thread
From: Yasunori Goto @ 2007-06-19 12:54 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, clameter

Hi Mel-san.
This is a very interesting feature.

Now, I'm testing your patches.

> +static int isolate_migratepages(struct zone *zone,
> +					struct compact_control *cc)
> +{
> +	unsigned long high_pfn, low_pfn, end_pfn, start_pfn;

(snip)

> +	/* Time to isolate some pages for migration */
> +	spin_lock_irq(&zone->lru_lock);
> +	for (; low_pfn < end_pfn; low_pfn++) {
> +		if (!pfn_valid_within(low_pfn))
> +			continue;
> +
> +		/* Get the page and skip if free */
> +		page = pfn_to_page(low_pfn);

I hit a panic here on my tiger4.

I compiled with CONFIG_SPARSEMEM, so CONFIG_HOLES_IN_ZONE is not set.
pfn_valid_within() always returns 1 in this configuration.
(That option is only for virtual memmap.)
However, my tiger4 box has memory holes in the Normal zone.

When it is changed to a normal pfn_valid(), the panic no longer occurs.

Hmmm.

Bye.
-- 
Yasunori Goto 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches
  2007-06-18 16:56   ` Christoph Lameter
@ 2007-06-19 15:52     ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 15:52 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On (18/06/07 09:56), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
> 
> > @@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
> >  			goto unlock;
> >  		wait_on_page_writeback(page);
> >  	}
> > -
> > +	/* anon_vma should not be freed while migration. */
> > +	if (PageAnon(page)) {
> > +		rcu_read_lock();
> > +		rcu_locked = 1;
> > +	}
> 
> We agreed on doing rcu_read_lock removing the status variable 
> and checking for PageAnon(). Doing so deuglifies the 
> function.

It does make it less ugly, but while improving the retry logic for migration
I was also routinely locking up my test box hard. I intend to run this inside
a simulator so I can use gdb to figure out what is going wrong, but for the
moment I've gone back to using a slightly modified anon_vma patch.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA
  2007-06-18 17:04   ` Christoph Lameter
@ 2007-06-19 15:59     ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 15:59 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On (18/06/07 10:04), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
> 
> > 
> > CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
> > user of migration today and as this system call is only meaningful on NUMA,
> > it makes sense. However, memory compaction will operate within a zone and is
> 
> There are more users of migration. move_pages() is one of them; then there is
> cpuset process migration, MPOL_BIND page migration and sys_migrate_pages()
> for explicit process migration.

Ok, this was poor phrasing. Each of those features is NUMA-related even
though the core migration mechanism does not depend on NUMA.

> 
> > useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
> > to be used in all memory models. To preserve existing behaviour, move_pages()
> > is only available when CONFIG_NUMA is set.
> 
> What does this have to do with memory models? A bit unclear.
> 

More poor phrasing. It would have been clearer to simply say that the
patch allows CONFIG_MIGRATION to be used without NUMA.

> Otherwise
> 
> Acked-by: Christoph Lameter <clameter@sgi.com>

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-18 17:18   ` Christoph Lameter
@ 2007-06-19 16:36     ` Mel Gorman
  2007-06-19 19:20       ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 16:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On (18/06/07 10:18), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
> 
> > +	/* Isolate free pages. This assumes the block is valid */
> > +	for (; blockpfn < end_pfn; blockpfn++) {
> > +		struct page *page;
> > +		int isolated, i;
> > +
> > +		if (!pfn_valid_within(blockpfn))
> > +			continue;
> > +
> > +		page = pfn_to_page(blockpfn);
> > +		if (!PageBuddy(page))
> > +			continue;
> 
> The name PageBuddy is getting to be misleading. Maybe rename this to
> PageFree or so?
> 

That would be surprisingly ambiguous. Per-cpu pages are free pages but are not
PageBuddy pages. In this case, I really mean a PageBuddy page, not any free page.

> > +
> > +		/* Found a free page, break it into order-0 pages */
> > +		isolated = split_free_page(page);
> > +		total_isolated += isolated;
> > +		for (i = 0; i < isolated; i++) {
> > +			list_add(&page->lru, freelist);
> > +			page++;
> > +		}
> 
> Why do you need to break them all up? Easier to coalesce later?
> 

They are broken up because migration currently works on order-0 pages.
It is easier to break them up now, so that compaction_alloc() can give them
out one at a time, than to figure out how to split them up later.
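
Roughly, compaction_alloc() ends up looking like this (an untested sketch
from memory, not the exact patch; the new_page_t signature and the
cc->freepages field name are assumptions):

	/* Sketch only: hand out one isolated order-0 page per migration */
	static struct page *compaction_alloc(struct page *migratepage,
						unsigned long data,
						int **result)
	{
		struct compact_control *cc = (struct compact_control *)data;
		struct page *freepage;

		if (list_empty(&cc->freepages))
			return NULL;

		freepage = list_entry(cc->freepages.next, struct page, lru);
		list_del(&freepage->lru);
		cc->nr_freepages--;

		return freepage;
	}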

> > +/* Returns 1 if the page is within a block suitable for migration to */
> > +static int pageblock_migratable(struct page *page)
> > +{
> > +	/* If the page is a large free page, then allow migration */
> > +	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> > +		return 1;
> 
> if (PageSlab(page) && page->slab->ops->kick) {
> 	migratable slab
> }
> 
> if (page table page) {
> 	migratable page table page?
> }
> 
> etc?
> 

Not quite. pageblock_migratable() tells whether this block is suitable for
taking free pages from, so that movable pages can be migrated there. Right
now that means checking whether there are enough free pages that the whole
block becomes MOVABLE, or whether the block is already being used for
movable pages.

The block could become movable if the decision was made to kick out slab
pages located towards the end of the zone. If page table pages become
movable, they would need to be identified here, but that is not currently
the case.

The pageblock_migratable() function is named so that this decision can
easily be revisited in one place.
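
In rough terms the two checks are (a sketch from memory, not verbatim from
the patch; get_pageblock_migratetype() stands in for the "already used for
movable pages" test):

	/* Sketch of the two cases described above */
	static int pageblock_migratable(struct page *page)
	{
		/* A large free page lets the whole block become MOVABLE */
		if (PageBuddy(page) && page_order(page) >= pageblock_order)
			return 1;

		/* The block is already in use by movable pages */
		if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
			return 1;

		return 0;
	}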

> > +		/* Try isolate the page */
> > +		if (locked_isolate_lru_page(zone, page, migratelist) == 0)
> > +			isolated++;
> 
> Support for other ways of migrating a page?
> 

When other mechanisms exist, they would be added here. Right now,
isolate_lru_page() is the only one I am aware of.

> > +static int compact_zone(struct zone *zone, struct compact_control *cc)
> > +{
> > +	int ret = COMPACT_INCOMPLETE;
> > +
> > +	/* Setup to move all movable pages to the end of the zone */
> > +	cc->migrate_pfn = zone->zone_start_pfn;
> > +	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> > +	cc->free_pfn &= ~(pageblock_nr_pages-1);
> > +
> > +	for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
> > +		isolate_migratepages(zone, cc);
> > +
> > +		if (!cc->nr_migratepages)
> > +			continue;
> > +
> > +		/* Isolate free pages if necessary */
> > +		if (cc->nr_freepages < cc->nr_migratepages)
> > +			isolate_freepages(zone, cc);
> > +
> > +		/* Stop compacting if we cannot get enough free pages */
> > +		if (cc->nr_freepages < cc->nr_migratepages)
> > +			break;
> > +
> > +		migrate_pages(&cc->migratepages, compaction_alloc,
> > +							(unsigned long)cc);
> 
> You do not need to check the result of migration? Page migration is a best 
> effort that may fail.
> 

You're right. I used to check it for debugging purposes to make sure migration
was actually occurring. It is still not unusual for a fair number of pages
to fail to migrate. Migration already uses retry logic and I shouldn't
be replicating it.

More importantly, by leaving the pages on the migratelist, I potentially
retry the same migrations over and over again, wasting time and effort, not
to mention that I keep pages isolated for much longer than necessary, which
could cause stalling problems. I should be calling putback_lru_pages()
when migrate_pages() tells me it failed to migrate pages.

I'll revisit this one. Thanks

> Looks good otherwise.
> 
> Acked-by: Christoph Lameter <clameter@sgi.com>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-19 12:54   ` Yasunori Goto
@ 2007-06-19 16:49     ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 16:49 UTC (permalink / raw)
  To: Yasunori Goto; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, clameter

On (19/06/07 21:54), Yasunori Goto didst pronounce:
> Hi Mel-san.
> This is a very interesting feature.
> 
> Now, I'm testing your patches.
> 
> > +static int isolate_migratepages(struct zone *zone,
> > +					struct compact_control *cc)
> > +{
> > +	unsigned long high_pfn, low_pfn, end_pfn, start_pfn;
> 
> (snip)
> 
> > +	/* Time to isolate some pages for migration */
> > +	spin_lock_irq(&zone->lru_lock);
> > +	for (; low_pfn < end_pfn; low_pfn++) {
> > +		if (!pfn_valid_within(low_pfn))
> > +			continue;
> > +
> > +		/* Get the page and skip if free */
> > +		page = pfn_to_page(low_pfn);
> 
> I hit a panic here on my tiger4.
> 

How annoying.

> I compiled with CONFIG_SPARSEMEM, so CONFIG_HOLES_IN_ZONE is not set.
> pfn_valid_within() always returns 1 in this configuration.

As it should.

> (That option is only for virtual memmap.)
> However, my tiger4 box has memory holes in the Normal zone.
> 
> When it is changed to a normal pfn_valid(), the panic no longer occurs.
> 

It's because I never check whether the MAX_ORDER block is valid before
isolating. This needs to be implemented just like what isolate_freepages()
and isolate_freepages_block() do. Change it to pfn_valid() for the moment
and I'll have this one fixed up properly in the next version.
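
The proper fix will probably validate the start of each pageblock once
before walking it, something like this untested sketch:

	/* Untested sketch: validate each new pageblock once, as
	 * isolate_freepages() does, then fall back to pfn_valid_within()
	 * for the pfns inside it */
	if ((low_pfn & (pageblock_nr_pages - 1)) == 0 &&
			!pfn_valid(low_pfn)) {
		low_pfn += pageblock_nr_pages - 1;
		continue;
	}

	if (!pfn_valid_within(low_pfn))
		continue;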

> Hmmm.
> 
> Bye.

Thanks for testing.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails
  2007-06-18 17:22   ` Christoph Lameter
@ 2007-06-19 16:50     ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 16:50 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On (18/06/07 10:22), Christoph Lameter didst pronounce:
> You are amazing.
> 

Thanks! 

There are still knots that need ironing out but I believe the core idea
is solid and can be built into something useful.

Thanks for reviewing.

> Acked-by: Christoph Lameter <clameter@sgi.com>
> 

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Memory Compaction v2
  2007-06-18 17:24 ` [PATCH 0/7] Memory Compaction v2 Christoph Lameter
@ 2007-06-19 16:58   ` Mel Gorman
  2007-06-19 19:22     ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2007-06-19 16:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On (18/06/07 10:24), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
> 
> > The patchset implements memory compaction for the page allocator reducing
> > external fragmentation so that free memory exists as fewer, but larger
> > contiguous blocks. Instead of being a full defragmentation solution,
> > this focuses exclusively on pages that are movable via the page migration
> > mechanism.
> 
> We need an additional facility at some point that allows the moving of 
> pages that are not on the LRU. Such support seems to be possible
> for page table pages and slab pages.

Agreed. When I first put this together, I felt I would be able to isolate
pages of different types on the migratelist, but that is not the case as
migration would not be able to tell the difference between an LRU page and
a pagetable page. I'll rename cc->migratelist to cc->migratelist_lru with
a view to potentially adding cc->migratelist_pagetable or
cc->migratelist_slab later.
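
Something along these lines (a sketch; only the fields already visible in
this patchset are real, the list names beyond migratelist_lru are
speculative):

	struct compact_control {
		struct list_head freepages;		/* isolated free pages */
		struct list_head migratelist_lru;	/* isolated LRU pages */
		/* later, perhaps: migratelist_slab, migratelist_pagetable */
		unsigned long nr_freepages;
		unsigned long nr_migratepages;
		unsigned long free_pfn;
		unsigned long migrate_pfn;
	};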

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/7] Introduce a means of compacting memory within a zone
  2007-06-19 16:36     ` Mel Gorman
@ 2007-06-19 19:20       ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-06-19 19:20 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Tue, 19 Jun 2007, Mel Gorman wrote:

> When other mechanisms exist, they would be added here. Right now,
> isolate_lru_page() is the only one I am aware of.

Did you have a look at kmem_cache_vacate in the slab defrag patchset?

> > You do not need to check the result of migration? Page migration is a best 
> > effort that may fail.

> You're right. I used to check it for debugging purposes to make sure migration
> was actually occurring. It is still not unusual for a fair number of pages
> to fail to migrate. Migration already uses retry logic and I shouldn't
> be replicating it.
> 
> More importantly, by leaving the pages on the migratelist, I potentially
> retry the same migrations over and over again, wasting time and effort, not
> to mention that I keep pages isolated for much longer than necessary, which
> could cause stalling problems. I should be calling putback_lru_pages()
> when migrate_pages() tells me it failed to migrate pages.

No, the putback_lru_pages() call is done for you.
 
> I'll revisit this one. Thanks

You could simply ignore the return value if you do not care whether the
pages were migrated or not.
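
i.e. something like:

	/*
	 * Sketch: migrate_pages() puts back the pages it could not move,
	 * so the return value (pages not migrated, or an error code) can
	 * simply be discarded here.
	 */
	(void)migrate_pages(&cc->migratepages, compaction_alloc,
						(unsigned long)cc);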


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Memory Compaction v2
  2007-06-19 16:58   ` Mel Gorman
@ 2007-06-19 19:22     ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-06-19 19:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu

On Tue, 19 Jun 2007, Mel Gorman wrote:

> Agreed. When I first put this together, I felt I would be able to isolate
> pages of different types on the migratelist, but that is not the case as
> migration would not be able to tell the difference between an LRU page and
> a pagetable page. I'll rename cc->migratelist to cc->migratelist_lru with
> a view to potentially adding cc->migratelist_pagetable or
> cc->migratelist_slab later.

Right. The particular issue with moving page table pages or slab pages is
that you do not have an LRU. The page state needs to be established in a
different way and there needs to be a mechanism to ensure that the page is
not currently being set up or torn down. For the slab pages I have relied
on page->inuse > 0 to signify a page in use. I am not sure how one would
realize that for page table pages.
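
For slab, that test is roughly (a sketch; page->inuse here is the slab
defrag patchset's count of allocated objects, not a mainline guarantee):

	/* Sketch: a slab page with any allocated objects is in use */
	static inline int slab_page_live(struct page *page)
	{
		return PageSlab(page) && page->inuse > 0;
	}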


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails
  2007-06-18  9:30 ` [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails Mel Gorman
  2007-06-18 17:22   ` Christoph Lameter
@ 2007-06-21 12:28   ` Andrew Morton
  2007-06-21 13:26     ` Mel Gorman
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2007-06-21 12:28 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel, mel, kamezawa.hiroyu, clameter

> On Mon, 18 Jun 2007 10:30:42 +0100 (IST) Mel Gorman <mel@csn.ul.ie> wrote:
> +
> +			/*
> +			 * It's a race if compaction frees a suitable page but
> +			 * someone else allocates it
> +			 */
> +			count_vm_event(COMPACTRACE);
> +		}

Could perhaps cause arbitrarily long starvation.  A fix would be to free
the synchronously-compacted higher-order page into somewhere which is
private to this task (a new field in task_struct would be one such place).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails
  2007-06-21 12:28   ` Andrew Morton
@ 2007-06-21 13:26     ` Mel Gorman
  0 siblings, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2007-06-21 13:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, kamezawa.hiroyu, clameter

Andrew Morton wrote:
>> On Mon, 18 Jun 2007 10:30:42 +0100 (IST) Mel Gorman <mel@csn.ul.ie> wrote:
>> +
>> +			/*
>> +			 * It's a race if compaction frees a suitable page but
>> +			 * someone else allocates it
>> +			 */
>> +			count_vm_event(COMPACTRACE);
>> +		}
> 
> Could perhaps cause arbitrarily long starvation. 

More likely it will just fail allocations where it could have succeeded.
I knew the situation would occur, so I thought I would count how often it
happens before doing anything about it.

> A fix would be to free
> the synchronously-compacted higher-order page into somewhere which is
> private to this task (a new field in task_struct would be one such place).

There used to be such fields and a process flag PF_FREE_PAGES for a
similar purpose. I'll look into reintroducing it. Thanks
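
The idea would be something like this in the free path (a sketch from
memory; the capture_page and capture_order fields are invented names, not
necessarily what the old code used):

	/*
	 * Sketch: let a task compacting for a high-order allocation capture
	 * a suitable freed page instead of releasing it to the buddy lists.
	 */
	if ((current->flags & PF_FREE_PAGES) &&
			order >= current->capture_order &&
			current->capture_page == NULL) {
		current->capture_page = page;
		return;
	}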

-- 
Mel Gorman

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-06-21 13:26 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-18  9:28 [PATCH 0/7] Memory Compaction v2 Mel Gorman
2007-06-18  9:28 ` [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches Mel Gorman
2007-06-18 16:56   ` Christoph Lameter
2007-06-19 15:52     ` Mel Gorman
2007-06-18  9:29 ` [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA Mel Gorman
2007-06-18 17:04   ` Christoph Lameter
2007-06-19 15:59     ` Mel Gorman
2007-06-18  9:29 ` [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page() Mel Gorman
2007-06-18 17:05   ` Christoph Lameter
2007-06-18  9:29 ` [PATCH 4/7] Provide metrics on the extent of fragmentation in zones Mel Gorman
2007-06-18 17:07   ` Christoph Lameter
2007-06-18  9:30 ` [PATCH 5/7] Introduce a means of compacting memory within a zone Mel Gorman
2007-06-18 17:18   ` Christoph Lameter
2007-06-19 16:36     ` Mel Gorman
2007-06-19 19:20       ` Christoph Lameter
2007-06-19 12:54   ` Yasunori Goto
2007-06-19 16:49     ` Mel Gorman
2007-06-18  9:30 ` [PATCH 6/7] Add /proc/sys/vm/compact_node for the explicit compaction of a node Mel Gorman
2007-06-18  9:30 ` [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails Mel Gorman
2007-06-18 17:22   ` Christoph Lameter
2007-06-19 16:50     ` Mel Gorman
2007-06-21 12:28   ` Andrew Morton
2007-06-21 13:26     ` Mel Gorman
2007-06-18 17:24 ` [PATCH 0/7] Memory Compaction v2 Christoph Lameter
2007-06-19 16:58   ` Mel Gorman
2007-06-19 19:22     ` Christoph Lameter
