* [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
@ 2010-11-17 16:22 Mel Gorman
  2010-11-17 16:22 ` [PATCH 1/8] mm: compaction: Add trace events for memory compaction activity Mel Gorman
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Huge page allocations are not expected to be cheap, but lumpy reclaim is
still very disruptive. While it is far better than reclaiming random
order-0 pages, it ignores the reference bits of the pages surrounding the
reference page selected from the LRU. Memory compaction was merged in 2.6.35
to reduce the reliance on lumpy reclaim by moving pages around instead of
reclaiming them when enough pages were already free. It has been tested fairly
heavily at this point. This is a prototype series that uses compaction more
aggressively.

When CONFIG_COMPACTION is set, lumpy reclaim is no longer used. Instead,
a number of order-0 pages are reclaimed and the zone is then compacted to
try to satisfy the allocation. This keeps a larger number of active
pages in memory at the cost of increased use of migration and compaction
scanning. With the full series applied, latencies when allocating huge pages
are significantly reduced. By the end of the series, hints are taken from
the LRU on where the best place to start migrating from might be.
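For readers who want the control flow at a glance, here is a rough C-style
sketch of the idea. It is illustrative only and uses made-up helper names
(reclaim_order0_pages, run_compaction, zone_watermark_met); the real wiring
is done in patch 3, where shrink_inactive_list() calls
reclaimcompact_zone_order().

	/*
	 * Sketch only, not kernel code. The helpers below are hypothetical
	 * stand-ins for the reclaim and compaction entry points.
	 */
	static void satisfy_high_order_allocation(struct zone *zone, int order)
	{
		do {
			/* Free a batch of order-0 pages, respecting page age */
			reclaim_order0_pages(zone, 1UL << order);

			/* Defragment by migrating pages instead of freeing more */
			run_compaction(zone, order);
		} while (!zone_watermark_met(zone, order));
	}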

Six kernels were tested:

lumpyreclaim-traceonly	This kernel is not using compaction but has the
			first patch related to tracepoints applied. It acts
			as a comparison point.

traceonly		This kernel is using compaction and has the
			tracepoints applied.

blindcompact		First three patches. A number of order-0 pages
			are reclaimed and then the zone is compacted. This
			replaces lumpy reclaim but lumpy reclaim is still
			available if CONFIG_COMPACTION is not set.

obeysync		First four patches. Migration will happen
			asynchronously if requested by the caller.
			This reduces the latency of compaction at times
			when the caller is not willing to call
			wait_on_page_writeback().

fastscan		First six patches applied. try_to_compact_pages()
			uses shortcuts in the faster compaction path to
			reduce latency.

compacthint		First seven patches applied. The migration scanner
			takes a hint from the LRU on where to start instead
			of always starting from the beginning of the zone.
			If the hint does not work, the full zone is still
			scanned.

The final patch is just a rename so it is not reported.  The target test was
a high-order allocation stress test. Testing was based on kernel 2.6.37-rc1
with commit d88c0922 applied which fixes an important bug related to page
reference counting. The test machine was x86-64 with 3G of RAM.

STRESS-HIGHALLOC
               lumpyreclaim
               traceonly-v2r21   traceonly	    blindcompact      obeysync          fastscan          compacthint
Pass 1          76.00 ( 0.00%)    91.00 (15.00%)    90.00 (14.00%)    86.00 (10.00%)    89.00 (13.00%)    88.00 (12.00%)
Pass 2          92.00 ( 0.00%)    92.00 ( 0.00%)    91.00 (-1.00%)    89.00 (-3.00%)    89.00 (-3.00%)    90.00 (-2.00%)
At Rest         95.00 ( 0.00%)    95.00 ( 0.00%)    96.00 ( 1.00%)    94.00 (-1.00%)    94.00 (-1.00%)    95.00 ( 0.00%)

As you'd expect, using compaction in any form improves the allocation
success rates. This is no surprise, although I know the results for ppc64
are a lot more dramatic. Otherwise, the series does not significantly
affect success rates - this is expected.

MMTests Statistics: duration
User/Sys Time Running Test (seconds)       3339.94   3356.03   3301.15   3297.02   3277.88   3278.23
Total Elapsed Time (seconds)               2226.20   1962.12   2066.27   1573.86   1416.15   1474.68

Using compaction completes the test faster - no surprise there. Beyond that,
the series reduces the total time it takes to complete the test. The saving
from the vanilla kernel using compaction to the full series is over 8 minutes
(1962 seconds down to 1475 seconds), which is fairly significant. Typically
I'd expect the duration of the test to vary by up to 2 minutes, so 8 minutes
is well outside the noise.

FTrace Reclaim Statistics: vmscan
                                       lumpyreclaim
                                          traceonly traceonly blindcompact obeysync   fastscan compacthint
Direct reclaims                               1388        537        376        488        430        480 
Direct reclaim pages scanned                205098      74810     287899     364595     313537     419062 
Direct reclaim pages reclaimed              110395      47344     129716     153689     139506     164719 
Direct reclaim write file async I/O           5703       1463       3313       4425       5257       6658 
Direct reclaim write anon async I/O          42539       8631      17326      25676      12942      25786 
Direct reclaim write file sync I/O               0          0          0          0          0          0 
Direct reclaim write anon sync I/O             339         45          4          3          1          4 
Wake kswapd requests                           855        755        764        814        822        876 
Kswapd wakeups                                 523        573        381        308        328        280 
Kswapd pages scanned                       4231634    4268032    3804355    2907194    2593046    2430099 
Kswapd pages reclaimed                     2200266    2221518    2161870    1826345    1722521    1705105 
Kswapd reclaim write file async I/O          51070      52174      35718      32378      25862      25292 
Kswapd reclaim write anon async I/O         770924     667264     147534      73974      29785      25709 
Kswapd reclaim write file sync I/O               0          0          0          0          0          0 
Kswapd reclaim write anon sync I/O               0          0          0          0          0          0 
Time stalled direct reclaim (seconds)      1035.70     113.12     190.79     292.82     111.68     165.71 
Time kswapd awake (seconds)                 885.31     772.61     786.08     484.38     339.97     405.29 

Total pages scanned                        4436732   4342842   4092254   3271789   2906583   2849161
Total pages reclaimed                      2310661   2268862   2291586   1980034   1862027   1869824
%age total pages scanned/reclaimed          52.08%    52.24%    56.00%    60.52%    64.06%    65.63%
%age total pages scanned/written            19.62%    16.80%     4.98%     4.17%     2.54%     2.93%
%age  file pages scanned/written             1.28%     1.24%     0.95%     1.12%     1.07%     1.12%
Percentage Time Spent Direct Reclaim        23.67%     3.26%     5.46%     8.16%     3.29%     4.81%
Percentage Time kswapd Awake                39.77%    39.38%    38.04%    30.78%    24.01%    27.48%

These are the reclaim statistics. Compaction reduces the time spent in
direct reclaim and the time kswapd is awake - no surprise there again. The
time spent in direct reclaim appears to increase once blindcompact and the
later patches are applied. This is because compaction now takes place within
reclaim, so there is more going on.

Overall though, the series reduces the time kswapd spends awake and, once
compaction is used within reclaim, the later patches in the series reduce
the time spent further. The series also significantly reduces the number of
pages scanned and reclaimed, reducing the level of disruption to the system.

FTrace Reclaim Statistics: compaction
                                      lumpyreclaim
                                         traceonly  traceonly blindcompact obeysync   fastscan compacthint
Migrate Pages Scanned                            0   71353874  238633502  264640773  261021041  206180024 
Migrate Pages Isolated                           0     269123     573527     675472     728335    1070987 
Free    Pages Scanned                            0   28821923   86306036  100851634  104049634  148208575 
Free    Pages Isolated                           0     344335     693444     908822     942124    1299588 
Migrated Pages                                   0     265478     565774     652310     707870    1048643 
Migration Failures                               0       3645       7753      23162      20465      22344 

These are some statistics on compaction activity. Obviously, with compaction
disabled, nothing happens. Using compaction from within reclaim drastically
increases the amount of compaction activity, which is expected - it is offset
by the reduced number of pages that get reclaimed, but there is room for
improvement in how compaction is implemented. The most interesting part of
this result is that "compacthint", by initialising the compaction migration
scanner from the LRU, drastically reduces the number of pages scanned for
migration even though the impact on latencies is not obvious.

Judging from the raw figures here, it's tricky to tell if things are really
better or not as they are aggregate figures for the duration of the test. This
brings me to the average latencies.

X86-64
http://www.csn.ul.ie/~mel/postings/memorycompact-20101117/highalloc-interlatency-hydra-mean.ps
http://www.csn.ul.ie/~mel/postings/memorycompact-20101117/highalloc-interlatency-hydra-stddev.ps

The mean latencies are pushed *way* down, implying that the amount of work
required to allocate each huge page is drastically reduced. As one would
expect, lumpy reclaim has terrible latencies but using compaction pushes
them down. Always using compaction (blindcompact) pushes them further down
and "obeysync" drops them close to the absolute minimum latency that can be
achieved. "fastscan" and "compacthint" slightly improve the allocation
success rates while reducing the amount of work performed by the kernel.

For completeness, here are the graphs for a similar test on PPC64. I won't
go into the raw figures because the conclusions are more or less the same.

PPC64
http://www.csn.ul.ie/~mel/postings/memorycompact-20101117/highalloc-interlatency-powyah-mean.ps
http://www.csn.ul.ie/~mel/postings/memorycompact-20101117/highalloc-interlatency-powyah-stddev.ps

PPC64 has to work a lot harder (16M huge pages instead of 2M). The
success rates without compaction are pretty dire due to the large delay
incurred by lumpy reclaim, but with compaction the success rates are all
comparable. As on X86-64, the latencies are pushed way down. They are
above the ideal performance but are still drastically improved.

I haven't pushed hard on the concept of lumpy compaction yet and right
now I don't intend to during this cycle. The initial prototypes did not
behave as well as expected and this series improves the current situation
a lot without introducing new algorithms. Hence, I'd like this series to
be considered for merging. I'm hoping that this series also removes the
necessity for the "delete lumpy reclaim" patch from the THP tree.

 include/linux/compaction.h        |    9 ++-
 include/linux/kernel.h            |    7 ++
 include/linux/migrate.h           |   12 ++-
 include/linux/mmzone.h            |    2 +
 include/trace/events/compaction.h |   74 ++++++++++++++++
 include/trace/events/vmscan.h     |    6 +-
 mm/compaction.c                   |  171 ++++++++++++++++++++++++++++---------
 mm/memory-failure.c               |    3 +-
 mm/memory_hotplug.c               |    3 +-
 mm/mempolicy.c                    |    6 +-
 mm/migrate.c                      |   24 +++--
 mm/vmscan.c                       |   90 ++++++++++++-------
 12 files changed, 313 insertions(+), 94 deletions(-)
 create mode 100644 include/trace/events/compaction.h



* [PATCH 1/8] mm: compaction: Add trace events for memory compaction activity
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-17 16:22 ` [PATCH 2/8] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

In preparation for patches promoting the use of memory compaction over lumpy
reclaim, this patch adds trace points for memory compaction activity. Using
them, we can monitor the scanning activity of the migration and free page
scanners as well as the number and success rates of pages passed to page
migration.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/compaction.h |   74 +++++++++++++++++++++++++++++++++++++
 mm/compaction.c                   |   14 ++++++-
 2 files changed, 87 insertions(+), 1 deletions(-)
 create mode 100644 include/trace/events/compaction.h

diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
new file mode 100644
index 0000000..388bcdd
--- /dev/null
+++ b/include/trace/events/compaction.h
@@ -0,0 +1,74 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM compaction
+
+#if !defined(_TRACE_COMPACTION_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_COMPACTION_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+#include "gfpflags.h"
+
+DECLARE_EVENT_CLASS(mm_compaction_isolate_template,
+
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr_scanned)
+		__field(unsigned long, nr_taken)
+	),
+
+	TP_fast_assign(
+		__entry->nr_scanned = nr_scanned;
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_scanned=%lu nr_taken=%lu",
+		__entry->nr_scanned,
+		__entry->nr_taken)
+);
+
+DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages,
+
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken)
+);
+
+DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages,
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken)
+);
+
+TRACE_EVENT(mm_compaction_migratepages,
+
+	TP_PROTO(unsigned long nr_migrated,
+		unsigned long nr_failed),
+
+	TP_ARGS(nr_migrated, nr_failed),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr_migrated)
+		__field(unsigned long, nr_failed)
+	),
+
+	TP_fast_assign(
+		__entry->nr_migrated = nr_migrated;
+		__entry->nr_failed = nr_failed;
+	),
+
+	TP_printk("nr_migrated=%lu nr_failed=%lu",
+		__entry->nr_migrated,
+		__entry->nr_failed)
+);
+
+
+#endif /* _TRACE_COMPACTION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..bc8eb8a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,6 +16,9 @@
 #include <linux/sysfs.h>
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/compaction.h>
+
 /*
  * compact_control is used to track pages being migrated and the free pages
  * they are being migrated to during memory compaction. The free_pfn starts
@@ -60,7 +63,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 				struct list_head *freelist)
 {
 	unsigned long zone_end_pfn, end_pfn;
-	int total_isolated = 0;
+	int nr_scanned = 0, total_isolated = 0;
 	struct page *cursor;
 
 	/* Get the last PFN we should scan for free pages at */
@@ -81,6 +84,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 
 		if (!pfn_valid_within(blockpfn))
 			continue;
+		nr_scanned++;
 
 		if (!PageBuddy(page))
 			continue;
@@ -100,6 +104,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 		}
 	}
 
+	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
 	return total_isolated;
 }
 
@@ -234,6 +239,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
+	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 
 	/* Do not scan outside zone boundaries */
@@ -266,6 +272,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		struct page *page;
 		if (!pfn_valid_within(low_pfn))
 			continue;
+		nr_scanned++;
 
 		/* Get the page and skip if free */
 		page = pfn_to_page(low_pfn);
@@ -281,6 +288,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		list_add(&page->lru, migratelist);
 		mem_cgroup_del_lru(page);
 		cc->nr_migratepages++;
+		nr_isolated++;
 
 		/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
@@ -292,6 +300,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
 	spin_unlock_irq(&zone->lru_lock);
 	cc->migrate_pfn = low_pfn;
 
+	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
+
 	return cc->nr_migratepages;
 }
 
@@ -402,6 +412,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		count_vm_events(COMPACTPAGES, nr_migrate - nr_remaining);
 		if (nr_remaining)
 			count_vm_events(COMPACTPAGEFAILED, nr_remaining);
+		trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
+						nr_remaining);
 
 		/* Release LRU pages not migrated */
 		if (!list_empty(&cc->migratepages)) {
-- 
1.7.1



* [PATCH 2/8] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
  2010-11-17 16:22 ` [PATCH 1/8] mm: compaction: Add trace events for memory compaction activity Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-17 16:22 ` [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
synchronous or asynchronous. In preparation for using compaction instead of
lumpy reclaim, this patch converts the flags into a bitmask.
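As an aside, the point of the conversion is that the reclaim style and the
sync/async decision become independent bits that can be combined and tested
separately. The stand-alone snippet below is illustrative only - it uses a
plain unsigned int instead of the kernel's __bitwise type and is not part of
the patch:

	#include <stdio.h>

	typedef unsigned int lumpy_mode;
	#define LUMPY_MODE_SINGLE		0x01u
	#define LUMPY_MODE_ASYNC		0x02u
	#define LUMPY_MODE_SYNC			0x04u
	#define LUMPY_MODE_CONTIGRECLAIM	0x08u

	int main(void)
	{
		/* High-order allocation under pressure: lumpy + may block */
		lumpy_mode mode = LUMPY_MODE_CONTIGRECLAIM | LUMPY_MODE_SYNC;

		if (mode & LUMPY_MODE_CONTIGRECLAIM)
			printf("reclaim a naturally aligned range of pages\n");
		if (mode & LUMPY_MODE_SYNC)
			printf("may call wait_on_page_writeback()\n");
		return 0;
	}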

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   46 +++++++++++++++++++++++++----------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c255fcc..be76429 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b8a6fdc..37d4f0e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,11 +51,20 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-enum lumpy_mode {
-	LUMPY_MODE_NONE,
-	LUMPY_MODE_ASYNC,
-	LUMPY_MODE_SYNC,
-};
+/*
+ * lumpy_mode determines how the inactive list is shrunk
+ * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
+ * LUMPY_MODE_ASYNC:  Do not block
+ * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ *			page from the LRU and reclaim all pages within a
+ *			naturally aligned range
+ */
+typedef unsigned __bitwise__ lumpy_mode;
+#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
+#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
+#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
+#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -88,7 +97,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	enum lumpy_mode lumpy_reclaim_mode;
+	lumpy_mode lumpy_reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -274,13 +283,13 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
 	 * Some reclaim have alredy been failed. No worth to try synchronous
 	 * lumpy reclaim.
 	 */
-	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return;
 
 	/*
@@ -288,17 +297,18 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * trouble getting a small set of contiguous pages, we
 	 * will reclaim both active and inactive pages.
 	 */
+	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= mode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= mode;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static void disable_lumpy_reclaim_mode(struct scan_control *sc)
 {
-	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -429,7 +439,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * first attempt to free a range of pages fails.
 		 */
 		if (PageWriteback(page) &&
-		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
+		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -615,7 +625,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -732,7 +742,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
+			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
 			    may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
@@ -1317,7 +1327,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1368,7 +1378,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
@@ -1381,7 +1391,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, sc->mem_cgroup,
 			0, file);
-- 
1.7.1



* [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
  2010-11-17 16:22 ` [PATCH 1/8] mm: compaction: Add trace events for memory compaction activity Mel Gorman
  2010-11-17 16:22 ` [PATCH 2/8] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-18 18:09   ` Andrea Arcangeli
  2010-11-17 16:22 ` [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
the age of the pages it reclaims. This can incur significant stalls and
potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch uses memory compaction where available
and lumpy reclaim otherwise. The basic operation is very simple - instead
of selecting a contiguous range of pages to reclaim, a number of order-0
pages are reclaimed and then compaction is called for the zone.  If the
watermarks are not met, another reclaim+compaction cycle occurs.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |    9 ++++-
 include/linux/kernel.h     |    7 +++
 mm/compaction.c            |   96 +++++++++++++++++++++++++++++---------------
 mm/vmscan.c                |   40 +++++++++++++-----
 4 files changed, 106 insertions(+), 46 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 5ac5155..9ebbc12 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask);
-
+extern unsigned long reclaimcompact_zone_order(struct zone *zone,
+			int order, gfp_t gfp_mask);
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -59,6 +60,12 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	return COMPACT_CONTINUE;
 }
 
+static inline unsigned long reclaimcompact_zone_order(struct zone *zone,
+			int order, gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline void defer_compaction(struct zone *zone)
 {
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 450092c..c00c5d1 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -826,6 +826,13 @@ struct sysinfo {
 #define NUMA_BUILD 0
 #endif
 
+/* This helps us avoid #ifdef CONFIG_COMPACTION */
+#ifdef CONFIG_COMPACTION
+#define COMPACTION_BUILD 1
+#else
+#define COMPACTION_BUILD 0
+#endif
+
 /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
diff --git a/mm/compaction.c b/mm/compaction.c
index bc8eb8a..3c37c52 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -385,10 +385,55 @@ static int compact_finished(struct zone *zone,
 	return COMPACT_CONTINUE;
 }
 
+static unsigned long compaction_suitable(struct zone *zone, int order)
+{
+	int fragindex;
+	unsigned long watermark;
+
+	/*
+	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
+	 * This is because during migration, copies of pages need to be
+	 * allocated and for a short time, the footprint is higher
+	 */
+	watermark = low_wmark_pages(zone) + (2UL << order);
+	if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+		return COMPACT_SKIPPED;
+
+	/*
+	 * fragmentation index determines if allocation failures are due to
+	 * low memory or external fragmentation
+	 *
+	 * index of -1 implies allocations might succeed depending on watermarks
+	 * index towards 0 implies failure is due to lack of memory
+	 * index towards 1000 implies failure is due to fragmentation
+	 *
+	 * Only compact if a failure would be due to fragmentation.
+	 */
+	fragindex = fragmentation_index(zone, order);
+	if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
+		return COMPACT_SKIPPED;
+
+	if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0))
+		return COMPACT_PARTIAL;
+
+	return COMPACT_CONTINUE;
+}
+
 static int compact_zone(struct zone *zone, struct compact_control *cc)
 {
 	int ret;
 
+	ret = compaction_suitable(zone, cc->order);
+	switch (ret) {
+	case COMPACT_PARTIAL:
+	case COMPACT_SKIPPED:
+		/* Compaction is likely to fail */
+		return ret;
+	case COMPACT_CONTINUE:
+		/* Fall through to compaction */
+		;
+	}
+
 	/* Setup to move all movable pages to the end of the zone */
 	cc->migrate_pfn = zone->zone_start_pfn;
 	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
@@ -446,6 +491,22 @@ static unsigned long compact_zone_order(struct zone *zone,
 	return compact_zone(zone, &cc);
 }
 
+unsigned long reclaimcompact_zone_order(struct zone *zone,
+						int order, gfp_t gfp_mask)
+{
+	struct compact_control cc = {
+		.nr_freepages = 0,
+		.nr_migratepages = 0,
+		.order = order,
+		.migratetype = allocflags_to_migratetype(gfp_mask),
+		.zone = zone,
+	};
+	INIT_LIST_HEAD(&cc.freepages);
+	INIT_LIST_HEAD(&cc.migratepages);
+
+	return compact_zone(zone, &cc);
+}
+
 int sysctl_extfrag_threshold = 500;
 
 /**
@@ -463,7 +524,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
-	unsigned long watermark;
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
@@ -481,43 +541,13 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
-		int fragindex;
 		int status;
 
-		/*
-		 * Watermarks for order-0 must be met for compaction. Note
-		 * the 2UL. This is because during migration, copies of
-		 * pages need to be allocated and for a short time, the
-		 * footprint is higher
-		 */
-		watermark = low_wmark_pages(zone) + (2UL << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
-			continue;
-
-		/*
-		 * fragmentation index determines if allocation failures are
-		 * due to low memory or external fragmentation
-		 *
-		 * index of -1 implies allocations might succeed depending
-		 * 	on watermarks
-		 * index towards 0 implies failure is due to lack of memory
-		 * index towards 1000 implies failure is due to fragmentation
-		 *
-		 * Only compact if a failure would be due to fragmentation.
-		 */
-		fragindex = fragmentation_index(zone, order);
-		if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
-			continue;
-
-		if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0)) {
-			rc = COMPACT_PARTIAL;
-			break;
-		}
-
 		status = compact_zone_order(zone, order, gfp_mask);
 		rc = max(status, rc);
 
-		if (zone_watermark_ok(zone, order, watermark, 0, 0))
+		/* If a normal allocation would succeed, stop compacting */
+		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
 	}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 37d4f0e..ca108ce 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -32,6 +32,7 @@
 #include <linux/topology.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/compaction.h>
 #include <linux/notifier.h>
 #include <linux/rwsem.h>
 #include <linux/delay.h>
@@ -59,12 +60,15 @@
  * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
  *			page from the LRU and reclaim all pages within a
  *			naturally aligned range
+ * LUMPY_MODE_COMPACTION: For high-order allocations, reclaim a number of
+ *			order-0 pages and then compact the zone
  */
 typedef unsigned __bitwise__ lumpy_mode;
 #define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
 #define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
 #define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
 #define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
+#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -283,25 +287,27 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
-	 * Some reclaim have alredy been failed. No worth to try synchronous
-	 * lumpy reclaim.
+	 * Initially assume we are entering either lumpy reclaim or lumpy
+	 * compaction. Depending on the order, we will either set the sync
+	 * mode or just reclaim order-0 pages later.
 	 */
-	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
-		return;
+	if (COMPACTION_BUILD)
+		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
+	else
+		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
 	 * will reclaim both active and inactive pages.
 	 */
-	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode |= mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode |= mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else
 		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
@@ -1375,11 +1381,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 
+	/*
+	 * If we are lumpy compacting, we bump nr_to_scan to at least
+	 * the size of the page we are trying to allocate
+	 */
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
+
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1391,8 +1404,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, sc->mem_cgroup,
 			0, file);
 		/*
@@ -1425,6 +1438,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
 
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+		reclaimcompact_zone_order(zone, sc->order, sc->gfp_mask);
+
 	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
-- 
1.7.1



* [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (2 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-18 18:21   ` Andrea Arcangeli
  2010-11-17 16:22 ` [PATCH 5/8] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Migration synchronously waits for writeback if the initial passes fail.
try_to_compact_pages() does not want this behaviour. It's in a faster
allocation path where no pages have been freed yet. If compaction does not
succeed quickly, synchronous migration is not going to help and unnecessarily
delays a process.

This patch adds a sync parameter to migrate_pages() allowing the caller
to indicate whether wait_on_page_writeback() is allowed within migration
or not. Only try_to_compact_pages() uses asynchronous migration; direct
compaction within the direct reclaim path uses the synchronous version.
All other callers use synchronous migration to preserve existing
behaviour.

In tests, this reduces latency when allocating huge pages as the faster
path avoids stalls and postpones synchronous migration until pages have
been freed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/migrate.h |   12 ++++++++----
 mm/compaction.c         |    6 +++++-
 mm/memory-failure.c     |    3 ++-
 mm/memory_hotplug.c     |    3 ++-
 mm/mempolicy.c          |    4 ++--
 mm/migrate.c            |   24 ++++++++++++++----------
 6 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 085527f..fa31902 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,9 +13,11 @@ extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining);
+			unsigned long private, int offlining,
+			bool sync);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining);
+			unsigned long private, int offlining,
+			bool sync);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -33,9 +35,11 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining) { return -ENOSYS; }
+		unsigned long private, int offlining,
+		bool sync) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining) { return -ENOSYS; }
+		unsigned long private, int offlining,
+		bool sync) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 3c37c52..b8e27cc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -33,6 +33,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	bool sync;			/* Synchronous migration */
 
 	/* Account for isolated anon and file pages */
 	unsigned long nr_anon;
@@ -449,7 +450,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		migrate_pages(&cc->migratepages, compaction_alloc,
-						(unsigned long)cc, 0);
+				(unsigned long)cc, 0,
+				cc->sync);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -484,6 +486,7 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
+		.sync = false,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -500,6 +503,7 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
+		.sync = true,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1243241..ebc2a1b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1413,7 +1413,8 @@ int soft_offline_page(struct page *page, int flags)
 		LIST_HEAD(pagelist);
 
 		list_add(&page->lru, &pagelist);
-		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0);
+		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+									0, true);
 		if (ret) {
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 				pfn, ret, page->flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9260314..221178b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -716,7 +716,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								1, true);
 		if (ret)
 			putback_lru_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4a57f13..8b1a490 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -935,7 +935,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 		return PTR_ERR(vma);
 
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest, 0);
+		err = migrate_pages(&pagelist, new_node_page, dest, 0, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
@@ -1155,7 +1155,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 		if (!list_empty(&pagelist)) {
 			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma, 0);
+						(unsigned long)vma, 0, true);
 			if (nr_failed)
 				putback_lru_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..ea684ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -612,7 +612,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, int offlining)
+			struct page *page, int force, int offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -635,7 +635,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	rc = -EAGAIN;
 
 	if (!trylock_page(page)) {
-		if (!force)
+		if (!force || !sync)
 			goto move_newpage;
 		lock_page(page);
 	}
@@ -663,7 +663,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	BUG_ON(charge);
 
 	if (PageWriteback(page)) {
-		if (!force)
+		if (!force || !sync)
 			goto uncharge;
 		wait_on_page_writeback(page);
 	}
@@ -808,7 +808,7 @@ move_newpage:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, int offlining)
+				int force, int offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -822,7 +822,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	rc = -EAGAIN;
 
 	if (!trylock_page(hpage)) {
-		if (!force)
+		if (!force || !sync)
 			goto out;
 		lock_page(hpage);
 	}
@@ -890,7 +890,8 @@ out:
  * Return: Number of pages not migrated or error code.
  */
 int migrate_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining)
+		new_page_t get_new_page, unsigned long private, int offlining,
+		bool sync)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -910,7 +911,8 @@ int migrate_pages(struct list_head *from,
 			cond_resched();
 
 			rc = unmap_and_move(get_new_page, private,
-						page, pass > 2, offlining);
+						page, pass > 2, offlining,
+						sync);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -939,7 +941,8 @@ out:
 }
 
 int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining)
+		new_page_t get_new_page, unsigned long private, int offlining,
+		bool sync)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -955,7 +958,8 @@ int migrate_huge_pages(struct list_head *from,
 			cond_resched();
 
 			rc = unmap_and_move_huge_page(get_new_page,
-					private, page, pass > 2, offlining);
+					private, page, pass > 2, offlining,
+					sync);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1088,7 +1092,7 @@ set_status:
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0);
+				(unsigned long)pm, 0, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
-- 
1.7.1



* [PATCH 5/8] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (3 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-17 16:22 ` [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages() Mel Gorman
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/migrate.h |    8 ++++----
 mm/compaction.c         |    2 +-
 mm/mempolicy.c          |    6 ++++--
 mm/migrate.c            |    8 ++++----
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index fa31902..e39aeec 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,10 +13,10 @@ extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 
 extern int fail_migrate_page(struct address_space *,
@@ -35,10 +35,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index b8e27cc..75d46d8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -450,7 +450,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, 0,
+				(unsigned long)cc, false,
 				cc->sync);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8b1a490..9beb008 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -935,7 +935,8 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 		return PTR_ERR(vma);
 
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest, 0, true);
+		err = migrate_pages(&pagelist, new_node_page, dest,
+								false, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
@@ -1155,7 +1156,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 		if (!list_empty(&pagelist)) {
 			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma, 0, true);
+						(unsigned long)vma,
+						false, true);
 			if (nr_failed)
 				putback_lru_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index ea684ab..c30c847 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -612,7 +612,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, int offlining, bool sync)
+			struct page *page, int force, bool offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -808,7 +808,7 @@ move_newpage:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, int offlining, bool sync)
+				int force, bool offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -890,7 +890,7 @@ out:
  * Return: Number of pages not migrated or error code.
  */
 int migrate_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining,
+		new_page_t get_new_page, unsigned long private, bool offlining,
 		bool sync)
 {
 	int retry = 1;
@@ -941,7 +941,7 @@ out:
 }
 
 int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining,
+		new_page_t get_new_page, unsigned long private, bool offlining,
 		bool sync)
 {
 	int retry = 1;
-- 
1.7.1



* [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages()
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (4 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 5/8] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-18 18:34   ` Andrea Arcangeli
  2010-11-17 16:22 ` [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start Mel Gorman
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

try_to_compact_pages() is the faster compaction option available to the
allocator. It is optimistically called before direct reclaim is entered.
As there is a higher chance try_to_compact_pages() will fail than direct
reclaim, it's important to complete the work as quickly as possible to
minimise stalls.

This patch introduces a migrate_fast_scan flag to memory compaction. When set
by try_to_compact_pages(), only MIGRATE_MOVABLE pageblocks are considered
as migration candidates and migration is asynchronous. This reduces stalls
when allocating huge pages while not impairing allocation success rates as
the direct reclaim path will perform the full compaction.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/compaction.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 75d46d8..686db84 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -33,7 +33,10 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	bool migrate_fast_scan;		/* If true, only MIGRATE_MOVABLE blocks
+					 * are scanned for pages to migrate and
+					 * migration is asynchronous
+					 */
 
 	/* Account for isolated anon and file pages */
 	unsigned long nr_anon;
@@ -240,6 +243,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
+	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 
@@ -280,6 +284,17 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		if (PageBuddy(page))
 			continue;
 
+		/* When fast scanning, only scan in MOVABLE blocks */
+		pageblock_nr = low_pfn >> pageblock_order;
+		if (cc->migrate_fast_scan &&
+				last_pageblock_nr != pageblock_nr &&
+				get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+			low_pfn += pageblock_nr_pages;
+			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+			last_pageblock_nr = pageblock_nr;
+			continue;
+		}
+
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
 			continue;
@@ -451,7 +466,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		nr_migrate = cc->nr_migratepages;
 		migrate_pages(&cc->migratepages, compaction_alloc,
 				(unsigned long)cc, false,
-				cc->sync);
+				cc->migrate_fast_scan ? false : true);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -485,8 +500,8 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.nr_migratepages = 0,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
+		.migrate_fast_scan = true,
 		.zone = zone,
-		.sync = false,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -502,8 +517,8 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
 		.nr_migratepages = 0,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
+		.migrate_fast_scan = false,
 		.zone = zone,
-		.sync = true,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
-- 
1.7.1



* [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (5 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages() Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-18  9:10   ` KAMEZAWA Hiroyuki
  2010-11-18 18:46   ` Andrea Arcangeli
  2010-11-17 16:22 ` [PATCH 8/8] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
  2010-11-17 23:46 ` [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Andrew Morton
  8 siblings, 2 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

The end of the LRU stores the oldest known page. Compaction, on the other
hand, always starts scanning from the beginning of the zone. This patch uses
the LRU to give compaction a hint on where it should start scanning. This
means that compaction will at least start with some old pages, reducing
the impact on running processes and reducing the amount of scanning. The
check it makes is racy as the LRU lock is not taken, but it should be
harmless as the lists are not manipulated without the lock.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/mmzone.h |    2 ++
 mm/compaction.c        |   42 +++++++++++++++++++++++++++++++++++++-----
 mm/vmscan.c            |    2 --
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 39c24eb..2b7e237 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -142,6 +142,8 @@ enum lru_list {
 
 #define for_each_evictable_lru(l) for (l = 0; l <= LRU_ACTIVE_FILE; l++)
 
+#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
+
 static inline int is_file_lru(enum lru_list l)
 {
 	return (l == LRU_INACTIVE_FILE || l == LRU_ACTIVE_FILE);
diff --git a/mm/compaction.c b/mm/compaction.c
index 686db84..03bd878 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -37,6 +37,9 @@ struct compact_control {
 					 * are scanned for pages to migrate and
 					 * migration is asynchronous
 					 */
+	unsigned long abort_migrate_pfn;/* Finish compaction when the migration
+					 * scanner reaches this PFN
+					 */
 
 	/* Account for isolated anon and file pages */
 	unsigned long nr_anon;
@@ -380,6 +383,10 @@ static int compact_finished(struct zone *zone,
 	if (cc->free_pfn <= cc->migrate_pfn)
 		return COMPACT_COMPLETE;
 
+	/* Compaction run completes if migration reaches abort_migrate_pfn */
+	if (cc->abort_migrate_pfn && cc->migrate_pfn >= cc->abort_migrate_pfn)
+		return COMPACT_COMPLETE;
+
 	/* Compaction run is not finished if the watermark is not met */
 	if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0))
 		return COMPACT_CONTINUE;
@@ -450,10 +457,17 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		;
 	}
 
-	/* Setup to move all movable pages to the end of the zone */
-	cc->migrate_pfn = zone->zone_start_pfn;
-	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
-	cc->free_pfn &= ~(pageblock_nr_pages-1);
+	/*
+	 * Setup to move all movable pages to the end of the zone. If the
+	 * caller does not specify starting points for the scanners,
+	 * initialise them
+	 */
+	if (!cc->migrate_pfn)
+		cc->migrate_pfn = zone->zone_start_pfn;
+	if (!cc->free_pfn) {
+		cc->free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+		cc->free_pfn &= ~(pageblock_nr_pages-1);
+	}
 
 	migrate_prep_local();
 
@@ -512,6 +526,8 @@ static unsigned long compact_zone_order(struct zone *zone,
 unsigned long reclaimcompact_zone_order(struct zone *zone,
 						int order, gfp_t gfp_mask)
 {
+	unsigned long start_migrate_pfn, ret;
+	struct page *anon_page, *file_page;
 	struct compact_control cc = {
 		.nr_freepages = 0,
 		.nr_migratepages = 0,
@@ -523,7 +539,23 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	/* Get a hint on where to start compacting from the LRU */
+	anon_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_ANON].list);
+	file_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_FILE].list);
+	cc.migrate_pfn = min(page_to_pfn(anon_page), page_to_pfn(file_page));
+	cc.migrate_pfn = ALIGN(cc.migrate_pfn, pageblock_nr_pages);
+	start_migrate_pfn = cc.migrate_pfn;
+
+	ret = compact_zone(zone, &cc);
+
+	/* Restart migration from the start of zone if the hint did not work */
+	if (!zone_watermark_ok(zone, cc.order, low_wmark_pages(zone), 0, 0)) {
+		cc.migrate_pfn = 0;
+		cc.abort_migrate_pfn = start_migrate_pfn;
+		ret = compact_zone(zone, &cc);
+	}
+
+	return ret;
 }
 
 int sysctl_extfrag_threshold = 500;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ca108ce..9a0fa57 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -113,8 +113,6 @@ struct scan_control {
 	nodemask_t	*nodemask;
 };
 
-#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
-
 #ifdef ARCH_HAS_PREFETCH
 #define prefetch_prev_lru_page(_page, _base, _field)			\
 	do {								\
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 8/8] mm: vmscan: Rename lumpy_mode to reclaim_mode
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (6 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start Mel Gorman
@ 2010-11-17 16:22 ` Mel Gorman
  2010-11-17 23:46 ` [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Andrew Morton
  8 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-17 16:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

With compaction being used instead of lumpy reclaim, the name lumpy_mode
and its associated variables are a bit misleading. Rename lumpy_mode to
reclaim_mode, which is a better fit. There is no functional change.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   72 ++++++++++++++++++++--------------------
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index be76429..ea422aa 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9a0fa57..52a0f0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,22 +53,22 @@
 #include <trace/events/vmscan.h>
 
 /*
- * lumpy_mode determines how the inactive list is shrunk
- * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
- * LUMPY_MODE_ASYNC:  Do not block
- * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
- * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ * reclaim_mode determines how the inactive list is shrunk
+ * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
+ * RECLAIM_MODE_ASYNC:  Do not block
+ * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference
  *			page from the LRU and reclaim all pages within a
  *			naturally aligned range
- * LUMPY_MODE_COMPACTION: For high-order allocations, reclaim a number of
+ * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
-typedef unsigned __bitwise__ lumpy_mode;
-#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
-#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
-#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
-#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
-#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
+typedef unsigned __bitwise__ reclaim_mode;
+#define RECLAIM_MODE_SINGLE		((__force reclaim_mode)0x01u)
+#define RECLAIM_MODE_ASYNC		((__force reclaim_mode)0x02u)
+#define RECLAIM_MODE_SYNC		((__force reclaim_mode)0x04u)
+#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode)0x08u)
+#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -101,7 +101,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	lumpy_mode lumpy_reclaim_mode;
+	reclaim_mode reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -282,10 +282,10 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 	return ret;
 }
 
-static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
+static void set_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	reclaim_mode syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
 
 	/*
 	 * Initially assume we are entering either lumpy reclaim or lumpy
@@ -293,9 +293,9 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * mode or just reclaim order-0 pages later.
 	 */
 	if (COMPACTION_BUILD)
-		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
+		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
+		sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;
 
 	/*
 	 * If we need a large contiguous chunk of memory, or have
@@ -303,16 +303,16 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * will reclaim both active and inactive pages.
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode |= syncmode;
+		sc->reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode |= syncmode;
+		sc->reclaim_mode |= syncmode;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
+		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
 }
 
-static void disable_lumpy_reclaim_mode(struct scan_control *sc)
+static void reset_reclaim_mode(struct scan_control *sc)
 {
-	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
+	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -443,7 +443,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * first attempt to free a range of pages fails.
 		 */
 		if (PageWriteback(page) &&
-		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
+		    (sc->reclaim_mode & RECLAIM_MODE_SYNC))
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -451,7 +451,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			ClearPageReclaim(page);
 		}
 		trace_mm_vmscan_writepage(page,
-			trace_reclaim_flags(page, sc->lumpy_reclaim_mode));
+			trace_reclaim_flags(page, sc->reclaim_mode));
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
 	}
@@ -629,7 +629,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
+	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -746,7 +746,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
+			if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
 			    may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
@@ -902,7 +902,7 @@ cull_mlocked:
 			try_to_free_swap(page);
 		unlock_page(page);
 		putback_lru_page(page);
-		disable_lumpy_reclaim_mode(sc);
+		reset_reclaim_mode(sc);
 		continue;
 
 activate_locked:
@@ -915,7 +915,7 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
-		disable_lumpy_reclaim_mode(sc);
+		reset_reclaim_mode(sc);
 keep_lumpy:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
@@ -1331,7 +1331,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
+	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1375,7 +1375,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_lumpy_reclaim_mode(priority, sc, false);
+	set_reclaim_mode(priority, sc, false);
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 
@@ -1383,13 +1383,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	 * If we are lumpy compacting, we bump nr_to_scan to at least
 	 * the size of the page we are trying to allocate
 	 */
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+	if (sc->reclaim_mode & RECLAIM_MODE_COMPACTION)
 		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
 
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+			sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
 					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
@@ -1402,7 +1402,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+			sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
 					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, sc->mem_cgroup,
 			0, file);
@@ -1425,7 +1425,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		set_lumpy_reclaim_mode(priority, sc, true);
+		set_reclaim_mode(priority, sc, true);
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
@@ -1436,14 +1436,14 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
 
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+	if (sc->reclaim_mode & RECLAIM_MODE_COMPACTION)
 		reclaimcompact_zone_order(zone, sc->order, sc->gfp_mask);
 
 	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
 		priority,
-		trace_shrink_flags(file, sc->lumpy_reclaim_mode));
+		trace_shrink_flags(file, sc->reclaim_mode));
 	return nr_reclaimed;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
                   ` (7 preceding siblings ...)
  2010-11-17 16:22 ` [PATCH 8/8] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
@ 2010-11-17 23:46 ` Andrew Morton
  2010-11-18  2:03   ` Rik van Riel
  2010-11-18  8:12   ` Mel Gorman
  8 siblings, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2010-11-17 23:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, 17 Nov 2010 16:22:41 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> Huge page allocations are not expected to be cheap but lumpy reclaim
> is still very disruptive.

Huge pages are boring.  Can we expect any benefit for the
stupid-nic-driver-which-does-order-4-GFP_ATOMIC-allocations problem?

>
> ...
>
> I haven't pushed hard on the concept of lumpy compaction yet and right
> now I don't intend to during this cycle. The initial prototypes did not
> behave as well as expected and this series improves the current situation
> a lot without introducing new algorithms. Hence, I'd like this series to
> be considered for merging.

Translation: "Andrew, wait for the next version"? :)

> I'm hoping that this series also removes the
> necessity for the "delete lumpy reclaim" patch from the THP tree.

Now I'm sad.  I read all that and was thinking "oh goody, we get to
delete something for once".  But no :(

If you can get this stuff to work nicely, why can't we remove lumpy
reclaim?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-17 23:46 ` [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Andrew Morton
@ 2010-11-18  2:03   ` Rik van Riel
  2010-11-18  8:12   ` Mel Gorman
  1 sibling, 0 replies; 34+ messages in thread
From: Rik van Riel @ 2010-11-18  2:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Andrea Arcangeli, KOSAKI Motohiro, Johannes Weiner,
	linux-mm, linux-kernel

On 11/17/2010 06:46 PM, Andrew Morton wrote:
> On Wed, 17 Nov 2010 16:22:41 +0000
> Mel Gorman<mel@csn.ul.ie>  wrote:

>> I'm hoping that this series also removes the
>> necessity for the "delete lumpy reclaim" patch from the THP tree.
>
> Now I'm sad.  I read all that and was thinking "oh goody, we get to
> delete something for once".  But no :(
>
> If you can get this stuff to work nicely, why can't we remove lumpy
> reclaim?

I seem to remember there being some resistance against
removing lumpy reclaim, but I do not remember from
where or why.

IMHO some code deletion would be nice :)

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-17 23:46 ` [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Andrew Morton
  2010-11-18  2:03   ` Rik van Riel
@ 2010-11-18  8:12   ` Mel Gorman
  2010-11-18  8:26     ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-18  8:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, Nov 17, 2010 at 03:46:41PM -0800, Andrew Morton wrote:
> On Wed, 17 Nov 2010 16:22:41 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > Huge page allocations are not expected to be cheap but lumpy reclaim
> > is still very disruptive.
> 
> Huge pages are boring.  Can we expect any benefit for the
> stupid-nic-driver-which-does-order-4-GFP_ATOMIC-allocations problem?
> 

Yes. Specifically, while GFP_ATOMIC allocations still cannot enter compaction
(although with asynchronous migration, it's closer), kswapd will react
faster. As a result, it should be harder to trigger allocation failures.

Huge pages are simply the worst case in terms of allocation latency which
is why I tend to focus testing on them. That, and I don't have a suitable
pair of machines with one of these order-4-atomic-stupid-nics.

> > I haven't pushed hard on the concept of lumpy compaction yet and right
> > now I don't intend to during this cycle. The initial prototypes did not
> > behave as well as expected and this series improves the current situation
> > a lot without introducing new algorithms. Hence, I'd like this series to
> > be considered for merging.
> 
> Translation: "Andrew, wait for the next version"? :)
> 

Preferably do not wait unless review reveals a major flaw. Lumpy compaction
in its initial prototype versions simply did not work out as a good policy
modification and the idea requires much deeper thought. This series was
effective at getting latencies down to the level I expected lumpy compaction
to reach. If I do make lumpy compaction work properly, its effect will be to
reduce scanning rates but the latencies are likely to be similar.

> > I'm hoping that this series also removes the
> > necessity for the "delete lumpy reclaim" patch from the THP tree.
> 
> Now I'm sad.  I read all that and was thinking "oh goody, we get to
> delete something for once".  But no :(
> 
> If you can get this stuff to work nicely, why can't we remove lumpy
> reclaim?

Ultimately we should be able to. Lumpy reclaim is still there for the
!CONFIG_COMPACTION case and to have an option if we find that compaction
behaves badly for some reason.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18  8:12   ` Mel Gorman
@ 2010-11-18  8:26     ` KAMEZAWA Hiroyuki
  2010-11-18  8:38       ` Johannes Weiner
  2010-11-18  8:44       ` Mel Gorman
  0 siblings, 2 replies; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-18  8:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

On Thu, 18 Nov 2010 08:12:54 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> > > I'm hoping that this series also removes the
> > > necessity for the "delete lumpy reclaim" patch from the THP tree.
> > 
> > Now I'm sad.  I read all that and was thinking "oh goody, we get to
> > delete something for once".  But no :(
> > 
> > If you can get this stuff to work nicely, why can't we remove lumpy
> > reclaim?
> 
> Ultimately we should be able to. Lumpy reclaim is still there for the
> !CONFIG_COMPACTION case and to have an option if we find that compaction
> behaves badly for some reason.
> 

Hmm. CONFIG_COMPACTION depends on CONFIG_MMU. lumpy reclaim will be for NOMMU,
finally ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18  8:26     ` KAMEZAWA Hiroyuki
@ 2010-11-18  8:38       ` Johannes Weiner
  2010-11-18  9:20         ` Mel Gorman
  2010-11-18  8:44       ` Mel Gorman
  1 sibling, 1 reply; 34+ messages in thread
From: Johannes Weiner @ 2010-11-18  8:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Mel Gorman, Andrew Morton, Andrea Arcangeli, KOSAKI Motohiro,
	Rik van Riel, linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 05:26:27PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 18 Nov 2010 08:12:54 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > > > I'm hoping that this series also removes the
> > > > necessity for the "delete lumpy reclaim" patch from the THP tree.
> > > 
> > > Now I'm sad.  I read all that and was thinking "oh goody, we get to
> > > delete something for once".  But no :(
> > > 
> > > If you can get this stuff to work nicely, why can't we remove lumpy
> > > reclaim?
> > 
> > Ultimately we should be able to. Lumpy reclaim is still there for the
> > !CONFIG_COMPACTION case and to have an option if we find that compaction
> > behaves badly for some reason.
> > 
> 
> Hmm. CONFIG_COMPACTION depends on CONFIG_MMU. lumpy reclaim will be for NOMMU,
> finally ?

It's because migration depends on MMU.  But we should be able to make
a NOMMU version of migration that just does page cache, which is all
that is reclaimable on NOMMU anyway.

At this point, the MMU dependency can go away, and so can lumpy
reclaim.

	Hannes

PS: I'm recovering from a cold, will catch up with the backlog later

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18  8:26     ` KAMEZAWA Hiroyuki
  2010-11-18  8:38       ` Johannes Weiner
@ 2010-11-18  8:44       ` Mel Gorman
  1 sibling, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-18  8:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 05:26:27PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 18 Nov 2010 08:12:54 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > > > I'm hoping that this series also removes the
> > > > necessity for the "delete lumpy reclaim" patch from the THP tree.
> > > 
> > > Now I'm sad.  I read all that and was thinking "oh goody, we get to
> > > delete something for once".  But no :(
> > > 
> > > If you can get this stuff to work nicely, why can't we remove lumpy
> > > reclaim?
> > 
> > Ultimately we should be able to. Lumpy reclaim is still there for the
> > !CONFIG_COMPACTION case and to have an option if we find that compaction
> > behaves badly for some reason.
> > 
> 
> Hmm. CONFIG_COMPACTION depends on CONFIG_MMU. lumpy reclaim will be for NOMMU,
> finally ?
> 

Also true. As it is, lumpy reclaim is still there but it's never called
if CONFIG_COMPACTION is set so it's already side-lined.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start
  2010-11-17 16:22 ` [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start Mel Gorman
@ 2010-11-18  9:10   ` KAMEZAWA Hiroyuki
  2010-11-18  9:28     ` Mel Gorman
  2010-11-18 18:46   ` Andrea Arcangeli
  1 sibling, 1 reply; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-11-18  9:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

On Wed, 17 Nov 2010 16:22:48 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> The end of the LRU stores the oldest known page. Compaction on the other
> hand always starts scanning from the start of the zone. This patch uses
> the LRU to hint to compaction where it should start scanning from. This
> means that compaction will at least start with some old pages reducing
> the impact on running processes and reducing the amount of scanning. The
> check it makes is racy as the LRU lock is not taken but it should be
> harmless as we are not manipulating the lists without the lock.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Hmm, does this patch make a noticeable difference ?
Isn't it better to start scanning from the biggest free chunk in a zone ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18  8:38       ` Johannes Weiner
@ 2010-11-18  9:20         ` Mel Gorman
  2010-11-18 19:49           ` Andrew Morton
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-18  9:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Andrea Arcangeli,
	KOSAKI Motohiro, Rik van Riel, linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 09:38:28AM +0100, Johannes Weiner wrote:
> On Thu, Nov 18, 2010 at 05:26:27PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 18 Nov 2010 08:12:54 +0000
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > > > I'm hoping that this series also removes the
> > > > > necessity for the "delete lumpy reclaim" patch from the THP tree.
> > > > 
> > > > Now I'm sad.  I read all that and was thinking "oh goody, we get to
> > > > delete something for once".  But no :(
> > > > 
> > > > If you can get this stuff to work nicely, why can't we remove lumpy
> > > > reclaim?
> > > 
> > > Ultimately we should be able to. Lumpy reclaim is still there for the
> > > !CONFIG_COMPACTION case and to have an option if we find that compaction
> > > behaves badly for some reason.
> > > 
> > 
> > Hmm. CONFIG_COMPACTION depends on CONFIG_MMU. lumpy reclaim will be for NOMMU,
> > finally ?
> 
> It's because migration depends on MMU.  But we should be able to make
> a NOMMU version of migration that just does page cache, which is all
> that is reclaimable on NOMMU anyway.
> 

Conceivably, but I see little problem leaving them with lumpy reclaim. As
page cache and anon pages are mixed together in MIGRATE_MOVABLE but only one
set of pages can be discarded, the success rates of either lumpy reclaim or
compaction are doubtful. It'd require a specific investigation.

> At this point, the MMU dependency can go away, and so can lumpy
> reclaim.
> 

The series never calls lumpy reclaim once CONFIG_COMPACTION is set. The code
could be shrunk with the below patch but the saving to vmlinux is minimal
(288 bytes for me on x86-64). My preference is still to have lumpy reclaim
available as a comparison point with compaction for a development cycle or two.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 52a0f0c..7488983 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1048,7 +1048,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			BUG();
 		}
 
-		if (!order)
+		if (!order || COMPACTION_BUILD)
 			continue;
 
 		/*

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start
  2010-11-18  9:10   ` KAMEZAWA Hiroyuki
@ 2010-11-18  9:28     ` Mel Gorman
  0 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-18  9:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 06:10:48PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 17 Nov 2010 16:22:48 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > The end of the LRU stores the oldest known page. Compaction on the other
> > hand always starts scanning from the start of the zone. This patch uses
> > the LRU to hint to compaction where it should start scanning from. This
> > means that compaction will at least start with some old pages reducing
> > the impact on running processes and reducing the amount of scanning. The
> > check it makes is racy as the LRU lock is not taken but it should be
> > harmless as we are not manipulating the lists without the lock.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> 
> Hmm, does this patch make a noticeable difference ?

To scanning rates - yes.

> Isn't it better to start scanning from the biggest free chunk in a zone ?
> 

Not necessarily. The biggest free chunk does not necessarily contain old
pages so one could stall a process by migrating a very active page. The same
applies for selecting the pageblock with the oldest LRU page of course but
it is less likely.

I prototyped a patch that constantly used the buddy lists to select the
next pageblock to migrate from. The problem was that it was possible for it
to loop forever because it could migrate from the same block more than once
in a migration cycle. To resolve that, I'd have to keep track of visited
pageblocks but I didn't want to require additional memory unless it was
absolutely necessary. I think the concept can be perfected and its impact
would be a reduction in scanning rates but it's not something that is
anywhere near merging yet.
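
To make the bookkeeping concrete, here is a toy userspace model of the
visited tracking that would be needed to guarantee termination (not kernel
code; suggest_pageblock() is a made-up stand-in for the buddy-list walk):

#include <stdbool.h>
#include <stdio.h>

#define NR_PAGEBLOCKS 64

/* Made-up stand-in for "ask the buddy lists which pageblock looks best";
 * it may legitimately suggest the same block repeatedly, which is the
 * problem being modelled. */
static int suggest_pageblock(unsigned int i)
{
	return (i * 13 + 7) % NR_PAGEBLOCKS;
}

int main(void)
{
	bool visited[NR_PAGEBLOCKS] = { false };	/* the extra memory cost */
	int distinct = 0;
	unsigned int i;

	/* One compaction cycle: never migrate from the same block twice */
	for (i = 0; distinct < NR_PAGEBLOCKS && i < 16 * NR_PAGEBLOCKS; i++) {
		int pb = suggest_pageblock(i);

		if (visited[pb])
			continue;	/* this skip is what prevents the infinite loop */
		visited[pb] = true;
		distinct++;
		printf("migrate pages out of pageblock %d\n", pb);
	}
	return 0;
}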

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-11-17 16:22 ` [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
@ 2010-11-18 18:09   ` Andrea Arcangeli
  2010-11-18 18:30     ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 18:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, Nov 17, 2010 at 04:22:44PM +0000, Mel Gorman wrote:
> +	 */
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> +		nr_to_scan = max(nr_to_scan, (1UL << sc->order));

Just one nitpick: I'm not sure this is a good idea. We can scan quite
some pages and we may do nothing on them. First I guess for symmetry
this should be 2UL << sc->order to match the 2UL << order in the
watermark checks in compaction.c (maybe it should be 3UL if something
considering the probability at least one page is mapped and won't be
freed is quite high). But SWAP_CLUSTER_MAX is only 32 pages.. not even
close to 1UL << 9 (hugepage order 9). So I think this can safely be
removed... it only makes a difference for the stack with order 2. And
for order 2 when we take the spinlocks we can take all 32 pages
without screwing the "free" levels in any significant way, considering
maybe only 4 pages are really freed in the end, and if all 32 pages
are really freed (i.e. all plain clean cache), all that matters to
avoid freeing more cache is to stick to compaction next time around
(i.e. at the next allocation). And if compaction fails again next time
around, then it's all right to shrink 32 more pages even for order
2...

In short I'd delete the above chunk and run the shrinker unmodified
as this is too low-level an idea, and the only real-life effect is to
decrease VM scalability for kernel stack allocation a tiny bit, with
no benefit whatsoever.

It's subtle because the difference it'd make is so infinitesimal and
I can only imagine it's a worsening overall difference.
> @@ -1425,6 +1438,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  
>  	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
>  
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> +		reclaimcompact_zone_order(zone, sc->order, sc->gfp_mask);
> +
>  	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
>  		zone_idx(zone),
>  		nr_scanned, nr_reclaimed,

I'm worried about this one as the objective here is to increase the
amount of free pages, and the loop won't stop until we reach
nr_reclaimed >= nr_to_reclaim. I'm afraid it'd sometimes lead to doing
extra compaction work here for no good reason. In short there is no
feedback check in the loop to verify whether this newly introduced
compaction work in the shrinker got us the hugepage so the loop can
stop. It sounds like some pretty random compaction invocation here just
to run it more frequently.

nr_to_reclaim is only 32 anyway. So my suggestion is to remove it and
let the shrinker do its thing without interleaving compaction inside
the shrinker, without feedback check if the compaction actually
succeeded (maybe 100% of free ram is contiguous already), and then try
compaction again outside of the shrinker interleaving it with the
shrinker as usual if the watermarks aren't satisfied yet after
shrinker freed nr_to_reclaim pages.

I prefer we keep separated the job of freeing more pages from the job
of compacting the single free pages into higher order pages. It's only
32 pages being freed we're talking about here so there is no need to call
compaction more frequently (if something we should increase
nr_to_reclaim to 512 and to call compaction less frequently). If the
interleaving of the caller isn't ok then fix it in the caller and also
update the nr_to_reclaim, but I think keeping those separated is way
cleaner and the mixture is unnecessary.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-17 16:22 ` [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
@ 2010-11-18 18:21   ` Andrea Arcangeli
  2010-11-18 18:34     ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 18:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, Nov 17, 2010 at 04:22:45PM +0000, Mel Gorman wrote:
> @@ -484,6 +486,7 @@ static unsigned long compact_zone_order(struct zone *zone,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
> +		.sync = false,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);

I like this because I'm very anxious to avoid wait-I/O latencies
introduced into hugepage allocations that I prefer to fail quickly and
be handled later by khugepaged ;).

But I could have khugepaged call this with sync=true... so I'd need a
__GFP_ flag that only khugepaged would use to notify compaction should
be synchronous for khugepaged (not for the regular allocations in page
faults). Can we do this through gfp_mask only?

> @@ -500,6 +503,7 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
> +		.sync = true,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);

Is this intentional? That inner compaction invocation is
equivalent to the one interleaved with the shrinker tried before
invoking the shrinker. So I don't see why they should differ (one sync
and one async).

Anyway I'd prefer the inner invocation to be removed as a whole and to
keep only going with the interleaving and to keep the two jobs of
compaction and shrinking memory fully separated and to stick to the
interleaving. If this reclaimcompact_zone_order helps maybe it means
compact_zone_order isn't doing the right thing and we're hiding it by
randomly calling it more frequently...

I can see a point however in doing:

compaction async
shrink (may wait) (scan 500 pages, freed 32 pages)
compaction sync (may wait)

to:

compaction async
shrink (scan 32 pages, freed 0 pages)
compaction sync (hugepage generated nobody noticed)
shrink (scan 32 pages, freed 0 pages)
compaction sync
shrink (scan 32 pages, freed 0 pages)
[..]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-11-18 18:09   ` Andrea Arcangeli
@ 2010-11-18 18:30     ` Mel Gorman
  0 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-18 18:30 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 07:09:56PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 17, 2010 at 04:22:44PM +0000, Mel Gorman wrote:
> > +	 */
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> > +		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
> 
> Just one nitpick: I'm not sure this is a good idea. We can scan quite
> some pages and we may do nothing on them.

True, I could loop based on nr_taken, taking care not to loop forever in
there.

> First I guess for symmetry
> this should be 2UL << sc->order to match the 2UL << order in the
> watermark checks in compaction.c (maybe it should be 3UL if something
> considering the probability at least one page is mapped and won't be
> freed is quite high). But SWAP_CLUSTER_MAX is only 32 pages.. not even
> close to 1UL << 9 (hugepage order 9).

True again, the scan rate gets bumped up for compaction recognising that
more pages are required.

> So I think this can safely be
> removed... it only makes a difference for the stack with order 2. And
> for order 2 when we take the spinlocks we can take all 32 pages
> without screwing the "free" levels in any significant way, considering
> maybe only 4 pages are really freed in the end, and if all 32 pages
> are really freed (i.e. all plain clean cache), all that matters to
> avoid freeing more cache is to stick to compaction next time around
> (i.e. at the next allocation). And if compaction fails again next time
> around, then it's all right to shrink 32 more pages even for order
> 2...
> 

Well, I'm expecting the exit of direct reclaim and another full
allocation loop so this is taken into account.

> In short I'd delete the above chunk and run the shrinker unmodified
> as this is too low-level an idea, and the only real-life effect is to
> decrease VM scalability for kernel stack allocation a tiny bit, with
> no benefit whatsoever.
> 

I'm not sure I get this. If it reclaims too few pages then compaction
will just fail the next time so we'll take the larger loop more
frequently. This in itself is not too bad although it interferes with
the patches later in the series that have try_to_compact_pages() do a
faster scan than this inner compaction loop.

> It's subtle because the difference it'd make is so infinitesimal and
> I can only imagine it's a worsening overall difference.

I can try it out to be sure but right now I'm not convinced. Then again,
I'm burned out after reviewing THP so I'm not at my best either :)

> > @@ -1425,6 +1438,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  
> >  	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
> >  
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> > +		reclaimcompact_zone_order(zone, sc->order, sc->gfp_mask);
> > +
> >  	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
> >  		zone_idx(zone),
> >  		nr_scanned, nr_reclaimed,
> 
> I'm worried about this one as the objective here is to increase the
> amount of free pages, and the loop won't stop until we reach
> nr_reclaimed >= nr_to_reclaim.

Which remains at SWAP_CLUSTER_MAX. shrink_inactive_list is doing more
work than requested to satisfy the compaction requirements. It could be
fed directly into nr_to_reclaim though if necessary.

> I'm afraid it'd sometimes lead to doing
> extra compaction work here for no good reason. In short there is no
> feedback check in the loop to verify whether this newly introduced
> compaction work in the shrinker got us the hugepage so the loop can
> stop. It sounds like some pretty random compaction invocation here just
> to run it more frequently.
> 

While we enter compaction, we also use compaction_suitable() to predict
if compaction would be a waste of time. If it would be, we don't compact
and instead go all the way out to the allocator again. From this
perspective, it makes more sense to have altered nr_to_reclaim than
nr_to_scan.
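
To illustrate the sort of check compaction_suitable() stands for (the names,
numbers and exact threshold below are illustrative only, not the actual
implementation): compaction needs roughly 2^order free base pages for the
allocation plus about the same again as working space for migration targets.

#include <stdbool.h>
#include <stdio.h>

/* Sketch only: is there enough order-0 free memory for compaction to
 * have a realistic chance of forming the requested high-order page? */
static bool compaction_worth_trying(unsigned long free_pages,
				    unsigned long low_wmark,
				    unsigned int order)
{
	return free_pages >= low_wmark + (2UL << order);
}

int main(void)
{
	/* order-9 (huge page): needs the watermark plus 1024 free base pages */
	printf("%d\n", compaction_worth_trying(4096, 2048, 9));	/* 1 */
	printf("%d\n", compaction_worth_trying(2500, 2048, 9));	/* 0 */
	return 0;
}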

> nr_to_reclaim is only 32 anyway. So my suggestion is to remove it and
> let the shrinker do its thing without interleaving compaction inside
> the shrinker, without feedback check if the compaction actually
> succeeded (maybe 100% of free ram is contiguous already), and then try
> compaction again outside of the shrinker interleaving it with the
> shrinker as usual if the watermarks aren't satisfied yet after
> shrinker freed nr_to_reclaim pages.
> 

I'll try it but again it busts the idea of try_to_compact_pages() doing an
optimistic compaction of a subset of pages. It also puts a bit of a hole in
the idea of developing lumpy compaction later because the outer allocation
loop is unsuitable and I'm not keen on the idea of putting reclaim logic
in mm/compaction.c

> I prefer we keep separated the job of freeing more pages from the job
> of compacting the single free pages into higher order pages. It's only
> 32 pages being freed we're talking about here so there is no need to call
> compaction more frequently

Compaction doesn't happen unless enough pages are free. Yes, I call into
compaction but it shouldn't do any heavy work. It made more sense to do
it that way than embed more compaction awareness into vmscan.c

> (if something we should increase
> nr_to_reclaim to 512 and to call compaction less frequently). If the
> interleaving of the caller isn't ok then fix it in the caller and also
> update the nr_to_reclaim, but I think keeping those separated is way
> cleaner and the mixture is unnecessary.
> 

I'll look closer at altering nr_to_reclaim instead of nr_to_scan. Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-18 18:21   ` Andrea Arcangeli
@ 2010-11-18 18:34     ` Mel Gorman
  2010-11-18 19:00       ` Andrea Arcangeli
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-18 18:34 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 07:21:06PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 17, 2010 at 04:22:45PM +0000, Mel Gorman wrote:
> > @@ -484,6 +486,7 @@ static unsigned long compact_zone_order(struct zone *zone,
> >  		.order = order,
> >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> >  		.zone = zone,
> > +		.sync = false,
> >  	};
> >  	INIT_LIST_HEAD(&cc.freepages);
> >  	INIT_LIST_HEAD(&cc.migratepages);
> 
> I like this because I'm very anxious to avoid wait-I/O latencies
> introduced into hugepage allocations that I prefer to fail quickly and
> be handled later by khugepaged ;).
> 

As you can see from the graphs in the leader, it makes a big difference to
latency as well to avoid sync migration where possible.

> But I could have khugepaged call this with sync=true... so I'd need a
> __GFP_ flag that only khugepaged would use to notify compaction should
> be synchronous for khugepaged (not for the regular allocations in page
> faults). Can we do this through gfp_mask only?
> 

We could pass gfp flags in I guess and abuse __GFP_NO_KSWAPD (from the THP
series obviously)?
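
Something like this, purely as a sketch of the idea (it assumes the THP
series' __GFP_NO_KSWAPD flag is available; the helper name is invented):

/* THP page faults pass __GFP_NO_KSWAPD, so treat those as latency
 * sensitive and keep migration asynchronous; everyone else may use
 * synchronous migration. */
static bool compaction_may_sync(gfp_t gfp_mask)
{
	return !(gfp_mask & __GFP_NO_KSWAPD);
}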

> > @@ -500,6 +503,7 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
> >  		.order = order,
> >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> >  		.zone = zone,
> > +		.sync = true,
> >  	};
> >  	INIT_LIST_HEAD(&cc.freepages);
> >  	INIT_LIST_HEAD(&cc.migratepages);
> 
> Is this intentional?

Yes, it's the "slower" path where we've already reclaim pages and more
willing to wait for the compaction to occur as the alternative is failing
the allocation.

> That inner compaction invocation is
> equivalent to the one interleaved with the shrinker tried before
> invoking the shrinker. So I don't see why they should differ (one sync
> and one async).
> 

The async one later in the series becomes very light with the heavier
work being done within reclaim if necessary.

> Anyway I'd prefer the inner invocation to be removed as a whole and to
> keep only going with the interleaving and to keep the two jobs of
> compaction and shrinking memory fully separated and to stick to the
> interleaving. If this reclaimcompact_zone_order helps maybe it means
> compact_zone_order isn't doing the right thing and we're hiding it by
> randomly calling it more frequently...
> 

I'll think about it more. I could just leave it at try_to_compact_pages
doing the zonelist scan although it's not immediately occurring to me how I
should decide between sync and async other than "async the first time and
sync after that". The allocator does not have the same "reclaim priority"
awareness that reclaim does.

> I can see a point however in doing:
> 
> compaction async
> shrink (may wait) (scan 500 pages, freed 32 pages)
> compaction sync (may wait)
> 
> to:
> 
> compaction async
> shrink (scan 32 pages, freed 0 pages)
> compaction sync (hugepage generated nobody noticed)
> shrink (scan 32 pages, freed 0 pages)
> compaction sync
> shrink (scan 32 pages, freed 0 pages)
> [..]
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages()
  2010-11-17 16:22 ` [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages() Mel Gorman
@ 2010-11-18 18:34   ` Andrea Arcangeli
  2010-11-18 18:50     ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 18:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, Nov 17, 2010 at 04:22:47PM +0000, Mel Gorman wrote:
> @@ -485,8 +500,8 @@ static unsigned long compact_zone_order(struct zone *zone,
>  		.nr_migratepages = 0,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
> +		.migrate_fast_scan = true,
>  		.zone = zone,
> -		.sync = false,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> @@ -502,8 +517,8 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
>  		.nr_migratepages = 0,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
> +		.migrate_fast_scan = false,
>  		.zone = zone,
> -		.sync = true,
>  	};

Same as for the previous feature (sync/async migrate) I'd prefer if
this was a __GFP_ flag (khugepaged will do the no-fast-scan version,
page fault will only run compaction in fast scan mode) and if we
removed the reclaimcompact_zone_order and we stick with the
interleaving of shrinker and try_to_compact_pages from the alloc_pages
caller, with no nesting of compaction inside the shrinker.

Another possibility would be to not have those as __GFP flags, and to
always do the first try_to_compact_pages with async+fast_scan, then
call the shrinker and then all next try_to_compact_pages would be
called with sync+no_fast_scan mode.
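
Roughly this shape, i.e. (every identifier below is a placeholder invented
for illustration, not the real allocator API):

/* one optimistic pass, then reclaim, then one thorough pass */
static struct page *alloc_high_order_sketch(struct zonelist *zl, int order,
					    gfp_t gfp_mask)
{
	struct page *page;

	/* optimistic: async migration + fast MOVABLE-only scan */
	page = compact_then_alloc(zl, order, gfp_mask, false, true);
	if (page)
		return page;

	/* free a few order-0 pages to give compaction room to work */
	reclaim_some_order0(zl, order, gfp_mask);

	/* thorough: sync migration + full zone scan before giving up */
	return compact_then_alloc(zl, order, gfp_mask, true, false);
}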

But I'd love it if we can further decrease the risk of a too-long
hugepage page fault before the normal page fallback, and to have a
__GFP_ flag for these two. Even the same __GFP flag could work for
both...

So my preference would be to nuke reclaimcompact_zone_order, only
stick to compact_zone_order and the current interleaving, and add a
__GFP_COMPACT_FAST used by the hugepmd page fault (that will enable
both async migrate and fast-scan). khugepaged and hugetlbfs won't use
__GFP_COMPACT_FAST.

I'm undecided if a __GFP_ flag is needed to differentiate the callers,
or if we should just run the first try_to_compact_pages in
"optimistic" mode by default.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start
  2010-11-17 16:22 ` [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start Mel Gorman
  2010-11-18  9:10   ` KAMEZAWA Hiroyuki
@ 2010-11-18 18:46   ` Andrea Arcangeli
  2010-11-19 11:08     ` Mel Gorman
  1 sibling, 1 reply; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 18:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Wed, Nov 17, 2010 at 04:22:48PM +0000, Mel Gorman wrote:
> +	if (!cc->migrate_pfn)
> +		cc->migrate_pfn = zone->zone_start_pfn;

wouldn't it remove a branch if the caller always set migrate_pfn?

> +	if (!cc->free_pfn) {
> +		cc->free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +		cc->free_pfn &= ~(pageblock_nr_pages-1);
> +	}

Who sets free_pfn to zero? Previously this was always initialized.

> @@ -523,7 +539,23 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> -	return compact_zone(zone, &cc);
> +	/* Get a hint on where to start compacting from the LRU */
> +	anon_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_ANON].list);
> +	file_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_FILE].list);
> +	cc.migrate_pfn = min(page_to_pfn(anon_page), page_to_pfn(file_page));
> +	cc.migrate_pfn = ALIGN(cc.migrate_pfn, pageblock_nr_pages);
> +	start_migrate_pfn = cc.migrate_pfn;
> +
> +	ret = compact_zone(zone, &cc);
> +
> +	/* Restart migration from the start of zone if the hint did not work */
> +	if (!zone_watermark_ok(zone, cc.order, low_wmark_pages(zone), 0, 0)) {
> +		cc.migrate_pfn = 0;
> +		cc.abort_migrate_pfn = start_migrate_pfn;
> +		ret = compact_zone(zone, &cc);
> +	}
> +

I doubt it works ok if the list is empty... Maybe it's safer to
validate the pfn taken from the LRU against the zone pfn start/end
before setting it as the migrate_pfn.
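
Something like the below guard before trusting the hint, perhaps (sketch
only, the helper is invented; the fields are the ones the patch already
uses):

static unsigned long sanitise_migrate_hint(struct zone *zone, unsigned long pfn)
{
	unsigned long start = zone->zone_start_pfn;
	unsigned long end = start + zone->spanned_pages;

	/* A pfn outside the zone means the LRU hint was junk */
	if (pfn < start || pfn >= end)
		return start;

	/* Round to a pageblock boundary as the patch does with ALIGN() */
	return ALIGN(pfn, pageblock_nr_pages);
}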

Interestingly this heuristic slowed down the benchmark when it should lead
to the exact opposite thanks to saving some cpu. So I guess maybe it's
not worth it. I see it increases the ratio of compaction by a tiny
bit, but if a tiny bit of better compaction comes at the expense of
an increased runtime I don't like it and I'd drop it... It's not
making enough of a difference; further, we could extend it to check the
"second" page in the list and so on... so we can just go blind. All that
matters is that we use a clock algorithm and I guess this screws it up,
and this is why it leads to increased time.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages()
  2010-11-18 18:34   ` Andrea Arcangeli
@ 2010-11-18 18:50     ` Mel Gorman
  2010-11-18 19:08       ` Andrea Arcangeli
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-18 18:50 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 07:34:48PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 17, 2010 at 04:22:47PM +0000, Mel Gorman wrote:
> > @@ -485,8 +500,8 @@ static unsigned long compact_zone_order(struct zone *zone,
> >  		.nr_migratepages = 0,
> >  		.order = order,
> >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> > +		.migrate_fast_scan = true,
> >  		.zone = zone,
> > -		.sync = false,
> >  	};
> >  	INIT_LIST_HEAD(&cc.freepages);
> >  	INIT_LIST_HEAD(&cc.migratepages);
> > @@ -502,8 +517,8 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
> >  		.nr_migratepages = 0,
> >  		.order = order,
> >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> > +		.migrate_fast_scan = false,
> >  		.zone = zone,
> > -		.sync = true,
> >  	};
> 
> Same as for the previous feature (sync/async migrate) I'd prefer if
> this was a __GFP_ flag (khugepaged will do the no-fast-scan version,
> page fault will only run compaction in fast scan mode) and if we

For THP in general, I think we can abuse __GFP_NO_KSWAPD. For other callers,
I'm not sure it's fair to push the responsibility of async/sync to them. We
don't do it for reclaim for example and I'd worry the wrong decisions would
be made or that they'd always select async for "performance" and then bitch
about an allocation failure.

> removed the reclaimcompact_zone_order and we stick with the
> interleaving of shrinker and try_to_compact_pages from the alloc_pages
> caller, with no nesting of compaction inside the shrinker.
> 
> Another possibility would be to not have those as __GFP flags, and to
> always do the first try_to_compact_pages with async+fast_scan, then
> call the shrinker and then all next try_to_compact_pages would be
> called with sync+no_fast_scan mode.
> 

I'll investigate this.

> But I'd love it if we can further decrease the risk of a too-long
> hugepage page fault before the normal page fallback, and to have a
> __GFP_ flag for these two. Even the same __GFP flag could work for
> both...
> 
> So my preference would be to nuke reclaimcompact_zone_order, only
> stick to compact_zone_order and the current interleaving, and add a
> __GFP_COMPACT_FAST used by the hugepmd page fault (that will enable
> both async migrate and fast-scan). khugepaged and hugetlbfs won't use
> __GFP_COMPACT_FAST.

My only whinge about the lack of reclaimcompact_zone_order is that it
makes it harder to even contemplate lumpy compaction in the future but
it could always be reintroduced if absolutely necessary.

> 
> I'm undecided if a __GFP_ flag is needed to differentiate the callers,
> or if we should just run the first try_to_compact_pages in
> "optimistic" mode by default.
> 

GFP flags would be my last preference. 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-18 18:34     ` Mel Gorman
@ 2010-11-18 19:00       ` Andrea Arcangeli
  0 siblings, 0 replies; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 19:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 06:34:38PM +0000, Mel Gorman wrote:
> On Thu, Nov 18, 2010 at 07:21:06PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 17, 2010 at 04:22:45PM +0000, Mel Gorman wrote:
> > > @@ -484,6 +486,7 @@ static unsigned long compact_zone_order(struct zone *zone,
> > >  		.order = order,
> > >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> > >  		.zone = zone,
> > > +		.sync = false,
> > >  	};
> > >  	INIT_LIST_HEAD(&cc.freepages);
> > >  	INIT_LIST_HEAD(&cc.migratepages);
> > 
> > I like this because I'm very anxious to avoid wait-I/O latencies
> > introduced into hugepage allocations that I prefer to fail quickly and
> > be handled later by khugepaged ;).
> > 
> 
> As you can see from the graphs in the leader, it makes a big difference to
> latency as well to avoid sync migration where possible.

Yep, amazing benchmarking work you did! Great job indeed.

I had already been thinking of this synchronous wait in migrate as
troublesome a few days ago, while reviewing the btrfs migration bug I
helped track down this week. It only triggered with THP because THP
exercises compaction, and in turn migration, much more often than
upstream; it's rare for upstream to see any order > 4 allocation that
would exercise compaction and trip over the btrfs fs corruption. In the
end it really had nothing to do with THP, as I expected.

> We could pass gfp flags in I guess and abuse __GFP_NO_KSWAPD (from the THP
> series obviously)?

That would work for me... :)

> Yes, it's the "slower" path where we've already reclaimed pages and are
> more willing to wait for the compaction to occur, as the alternative is
> failing the allocation.

I've noticed, which is why I think it's equivalent to invoking the
second try_to_compact_pages with (fast_scan=false, sync=true) (and the
first of course with (fast_scan=true, sync=false)).

> I'll think about it more. I could just leave it at try_to_compact_pages
> doing the zonelist scan although it's not immediately occurring to me how I
> should decide between sync and async other than "async the first time and
> sync after that". The allocator does not have the same "reclaim priority"
> awareness that reclaim does.

I think the "migrate async & fast scan first, migrate sync and full
scan later" is a simpler heuristic we can do and I expect it to work
fine and equivalent (if not better).
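
In other words, something like this; both helper names are hypothetical
stand-ins for the allocator slow path and the reclaim/compaction entry
points, so treat it as an illustration of the ordering only:

/* Illustration of the ordering only; both helpers are hypothetical. */
static bool compact_and_check_watermarks(bool sync, bool fast_scan);
static void reclaim_order0_batch(unsigned int order);

static bool try_high_order_allocation(unsigned int order)
{
	/* Pass 1: optimistic - asynchronous migration and the fast scan */
	if (compact_and_check_watermarks(false, true))
		return true;

	/* Reclaim a batch of order-0 pages before trying harder */
	reclaim_order0_batch(order);

	/* Pass 2: expensive - synchronous migration, full zone scan */
	return compact_and_check_watermarks(true, false);
}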

I'm undecided whether it's worth running the hugepage page fault with
"async & fast scan always" by abusing __GFP_NO_KSWAPD or by adding a
__GFP_COMPACT_FAST. Of course it would mostly only make a difference
if the hugepage allocation has to fail often (like 95% of RAM in
hugepages with slab spread over 10% of RAM), so that is a corner case
that not everybody experiences... Probably not worth it.

Increasing nr_to_reclaim to 1<<order only when the compaction_suitable
checks are not satisfied and compaction becomes a no-op may also be
worth investigating (as long as there are enough cond_resched() calls
inside those loops ;). But hey, I'm not sure it's really needed...
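
Something like the sketch below, perhaps; it assumes compaction_suitable()
returns COMPACT_SKIPPED when a zone does not have enough free pages for
compaction to run, as in the reclaim/compaction patches, and
SWAP_CLUSTER_MAX is the usual reclaim batch:

static unsigned long compaction_nr_to_reclaim(struct zone *zone, int order)
{
	/*
	 * If compaction would be skipped for lack of free memory, reclaim
	 * a full aligned block so that compaction becomes worthwhile;
	 * otherwise the usual small batch is enough for it to make
	 * forward progress.
	 */
	if (compaction_suitable(zone, order) == COMPACT_SKIPPED)
		return 1UL << order;

	return SWAP_CLUSTER_MAX;
}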

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages()
  2010-11-18 18:50     ` Mel Gorman
@ 2010-11-18 19:08       ` Andrea Arcangeli
  2010-11-19 11:16         ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Andrea Arcangeli @ 2010-11-18 19:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 06:50:46PM +0000, Mel Gorman wrote:
> For THP in general, I think we can abuse __GFP_NO_KSWAPD. For other callers,
> I'm not sure it's fair to push the responsibility of async/sync to them. We
> don't do it for reclaim for example and I'd worry the wrong decisions would
> be made or that they'd always select async for "performance" and then bitch
> about an allocation failure.

Ok, let's leave out the __GFP flag and stick with the simplest approach
for now, without alloc_pages-caller knowledge.

> My only whinge about the lack of reclaimcompact_zone_order is that it
> makes it harder to even contemplate lumpy compaction in the future but
> it could always be reintroduced if absolutely necessary.

Ok. I don't know the plan for lumpy compaction, and that's probably why
I didn't appreciate it...

So my preference as usual would be to remove lumpy. BTW, everything up
to and including patch 3 should work fine with THP and solve my problem
with lumpy, thanks!

> GFP flags would be my last preference. 

Yep. I'm probably just too paranoid about keeping the hugepage
allocation low-latency, because I know it's the only spot where THP may
actually introduce a regression for short-lived tasks if we do too much
work to create the hugepage. OTOH, even for a short-lived allocation,
on my Westmere a bzero(1g) runs 250% faster (not 50% faster like on the
older hardware I was using) just thanks to the page being huge, and I'm
talking about a super-short-lived allocation here (the troublesome case
if we spend too much time in compaction and reclaim before failing).
Plus it only makes a difference when hugepages are spread across the
whole system and the workload is still doing purely short-lived
allocations. So again, let's worry about the GFP flag later if
needed... this is already a huge latency improvement (very appreciated)
compared to current upstream even without a GFP flag ;) as your .ps
files clearly show.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18  9:20         ` Mel Gorman
@ 2010-11-18 19:49           ` Andrew Morton
  2010-11-19 10:48             ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2010-11-18 19:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Johannes Weiner, KAMEZAWA Hiroyuki, Andrea Arcangeli,
	KOSAKI Motohiro, Rik van Riel, linux-mm, linux-kernel

On Thu, 18 Nov 2010 09:20:44 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> > It's because migration depends on MMU.  But we should be able to make
> > a NOMMU version of migration that just does page cache, which is all
> > that is reclaimable on NOMMU anyway.
> > 
> 
> Conceivably, but I see little problem leaving them with lumpy reclaim.

I see a really big problem: we'll need to maintain lumpy reclaim for
ever!

We keep on piling in more and more stuff, we're getting less sure that
the old stuff is still effective.  It's becoming more and more
important to move some of our attention over to simplification, and
to rejustification of earlier decisions.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-18 19:49           ` Andrew Morton
@ 2010-11-19 10:48             ` Mel Gorman
  2010-11-19 12:43               ` Theodore Tso
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-19 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, KAMEZAWA Hiroyuki, Andrea Arcangeli,
	KOSAKI Motohiro, Rik van Riel, linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 11:49:28AM -0800, Andrew Morton wrote:
> On Thu, 18 Nov 2010 09:20:44 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > > It's because migration depends on MMU.  But we should be able to make
> > > a NOMMU version of migration that just does page cache, which is all
> > > that is reclaimable on NOMMU anyway.
> > > 
> > 
> > Conceivably, but I see little problem leaving them with lumpy reclaim.
> 
> I see a really big problem: we'll need to maintain lumpy reclaim for
> ever!
> 

At least as long as !CONFIG_COMPACTION exists. That will be a while; bear
in mind that CONFIG_COMPACTION is disabled by default (although I believe
at least some distros are enabling it). Maybe we should choose to deprecate
it in 2.6.40 and delete it at the infamous time of 2.6.42? That would give
ample time to iron out any issues that crop up with reclaim/compaction
(what this series has turned into).

Bear in mind that lumpy reclaim is heavily isolated these days. The logic
is almost entirely contained in isolate_lru_pages() in the block starting
with the comment "Attempt to take all pages in the order aligned region
surrounding the tag page". As disruptive as lumpy reclaim is, it's basically
just a linear scanner at the end of the day and there are a few examples of
that in the kernel. If we break it, it'll be obvious.
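
For reference, the window that linear scan covers is just the
order-aligned block around the tag page, roughly as below (an
illustrative helper, not the exact isolate_lru_pages() code):

/* Illustrative only: the order-aligned region lumpy reclaim scans
 * around the tag page taken from the LRU. */
static void lumpy_scan_window(struct page *tag_page, int order,
			      unsigned long *start_pfn,
			      unsigned long *end_pfn)
{
	unsigned long pfn = page_to_pfn(tag_page);
	unsigned long mask = (1UL << order) - 1;

	*start_pfn = pfn & ~mask;		/* round down to the order boundary */
	*end_pfn = *start_pfn + (1UL << order);	/* scan one aligned block */
}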

> We keep on piling in more and more stuff, we're getting less sure that
> the old stuff is still effective. It's becoming more and more
> important to move some of our attention over to simplification, and
> to rejustification of earlier decisions.
> 

I'm open to its ultimate deletion but think it's rash to do on day 1 of
reclaim/compaction. I do recognise that I might be entirely on my own with
this opinion though :)

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start
  2010-11-18 18:46   ` Andrea Arcangeli
@ 2010-11-19 11:08     ` Mel Gorman
  0 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-19 11:08 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 07:46:59PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 17, 2010 at 04:22:48PM +0000, Mel Gorman wrote:
> > +	if (!cc->migrate_pfn)
> > +		cc->migrate_pfn = zone->zone_start_pfn;
> 
> wouldn't it remove a branch if the caller always set migrate_pfn?
> 

If try_to_compact_pages() used it, it would migrate old pages and then
later enter direct reclaim where it reclaimed the pages it just
migrated. It was double work. That said, I neglected to check if
migration affects the age of pages. Offhand, I think it does, so the
problem wouldn't apply and the reasoning was flawed.

> > +	if (!cc->free_pfn) {
> > +		cc->free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> > +		cc->free_pfn &= ~(pageblock_nr_pages-1);
> > +	}
> 
> Who sets free_pfn to zero? Previously this was always initialized.
> 

It's initialised on the stack; the designated initialiser zeroes any
field that is not explicitly set, so free_pfn starts at zero unless a
caller fills it in.

> > @@ -523,7 +539,23 @@ unsigned long reclaimcompact_zone_order(struct zone *zone,
> >  	INIT_LIST_HEAD(&cc.freepages);
> >  	INIT_LIST_HEAD(&cc.migratepages);
> >  
> > -	return compact_zone(zone, &cc);
> > +	/* Get a hint on where to start compacting from the LRU */
> > +	anon_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_ANON].list);
> > +	file_page = lru_to_page(&zone->lru[LRU_BASE + LRU_INACTIVE_FILE].list);
> > +	cc.migrate_pfn = min(page_to_pfn(anon_page), page_to_pfn(file_page));
> > +	cc.migrate_pfn = ALIGN(cc.migrate_pfn, pageblock_nr_pages);
> > +	start_migrate_pfn = cc.migrate_pfn;
> > +
> > +	ret = compact_zone(zone, &cc);
> > +
> > +	/* Restart migration from the start of zone if the hint did not work */
> > +	if (!zone_watermark_ok(zone, cc.order, low_wmark_pages(zone), 0, 0)) {
> > +		cc.migrate_pfn = 0;
> > +		cc.abort_migrate_pfn = start_migrate_pfn;
> > +		ret = compact_zone(zone, &cc);
> > +	}
> > +
> 
> I doubt it works ok if the list is empty... Maybe it's safer to
> validate the migrate_pfn against the zone pfn start/end before
> setting it in the migrate_pfn.
> 

You're right, this could be unsafe.
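
If the hint survives at all, it would have to look more like the sketch
below; the helper name is made up and zone->lru_lock is ignored here for
brevity:

/* Sketch only: take the LRU hint just when both inactive lists are
 * non-empty and the resulting pfn lies inside the zone. */
static unsigned long lru_migrate_hint(struct zone *zone)
{
	struct list_head *anon = &zone->lru[LRU_BASE + LRU_INACTIVE_ANON].list;
	struct list_head *file = &zone->lru[LRU_BASE + LRU_INACTIVE_FILE].list;
	unsigned long pfn;

	if (list_empty(anon) || list_empty(file))
		return 0;	/* no hint, scan from the start of the zone */

	pfn = min(page_to_pfn(lru_to_page(anon)),
		  page_to_pfn(lru_to_page(file)));
	pfn = ALIGN(pfn, pageblock_nr_pages);

	/* Only trust the hint if it falls within the zone */
	if (pfn < zone->zone_start_pfn ||
	    pfn >= zone->zone_start_pfn + zone->spanned_pages)
		return 0;

	return pfn;
}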

> Interesting that this heuristic slowed down the benchmark; it should
> lead to the exact opposite thanks to saving some CPU. So I guess maybe
> it's not worth it. I see it increases the compaction ratio a tiny bit,
> but if slightly better compaction comes at the expense of increased
> runtime I don't like it and I'd drop it... It's not making enough of a
> difference; further, we could extend it to check the "second" page in
> the list and so on... so we may as well just go blind. All that matters
> is that we use a clock algorithm, and I guess this heuristic screws
> that up, which is why it leads to the increased time.
> 

The variation was within the noise but yes, maybe this is not such a great
idea and the figures are not very compelling. I'm going to drop it from the
series.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages()
  2010-11-18 19:08       ` Andrea Arcangeli
@ 2010-11-19 11:16         ` Mel Gorman
  0 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2010-11-19 11:16 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 18, 2010 at 08:08:39PM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 18, 2010 at 06:50:46PM +0000, Mel Gorman wrote:
> > For THP in general, I think we can abuse __GFP_NO_KSWAPD. For other callers,
> > I'm not sure it's fair to push the responsibility of async/sync to them. We
> > don't do it for reclaim for example and I'd worry the wrong decisions would
> > be made or that they'd always select async for "performance" and then bitch
> > about an allocation failure.
> 
> Ok, let's leave out the __GFP flag and stick with the simplest approach
> for now, without alloc_pages-caller knowledge.
> 

Ok.

> > My only whinge about the lack of reclaimcompact_zone_order is that it
> > makes it harder to even contemplate lumpy compaction in the future but
> > it could always be reintroduced if absolutely necessary.
> 
> Ok. I don't know the plan for lumpy compaction, and that's probably why
> I didn't appreciate it...
> 

You're not a mind-reader :). What it would buy is a reduction in
scanning rates, but there are other means of getting that which should
be considered too.

> So my preference as usual would be to remove lumpy. BTW, everything up
> to and including patch 3 should work fine with THP and solve my problem
> with lumpy, thanks!
> 

Great. I'd still like to push the rest of the series if it can be shown the
latencies decrease each time. It'll reduce the motivation for introducing
GFP flags to avoid compaction overhead.

> > GFP flags would be my last preference. 
> 
> Yep. I'm probably just too paranoid about keeping the hugepage
> allocation low-latency, because I know it's the only spot where THP may
> actually introduce a regression for short-lived tasks if we do too much
> work to create the hugepage. OTOH, even for a short-lived allocation,
> on my Westmere a bzero(1g) runs 250% faster (not 50% faster like on the
> older hardware I was using) just thanks to the page being huge, and I'm
> talking about a super-short-lived allocation here (the troublesome case
> if we spend too much time in compaction and reclaim before failing).
> Plus it only makes a difference when hugepages are spread across the
> whole system and the workload is still doing purely short-lived
> allocations. So again, let's worry about the GFP flag later if
> needed...

Sounds like a plan.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-19 10:48             ` Mel Gorman
@ 2010-11-19 12:43               ` Theodore Tso
  2010-11-19 14:05                 ` Mel Gorman
  0 siblings, 1 reply; 34+ messages in thread
From: Theodore Tso @ 2010-11-19 12:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel, linux-mm,
	linux-kernel


On Nov 19, 2010, at 5:48 AM, Mel Gorman wrote:

> At least as long as !CONFIG_COMPACTION exists. That will be a while; bear
> in mind that CONFIG_COMPACTION is disabled by default (although I believe
> at least some distros are enabling it). Maybe we should choose to deprecate
> it in 2.6.40 and delete it at the infamous time of 2.6.42? That would give
> ample time to iron out any issues that crop up with reclaim/compaction
> (what this series has turned into).

How about making it the default before 2.6.40, as an initial step?

-Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-19 12:43               ` Theodore Tso
@ 2010-11-19 14:05                 ` Mel Gorman
  2010-11-19 15:45                   ` Ted Ts'o
  0 siblings, 1 reply; 34+ messages in thread
From: Mel Gorman @ 2010-11-19 14:05 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel, linux-mm,
	linux-kernel

On Fri, Nov 19, 2010 at 07:43:02AM -0500, Theodore Tso wrote:
> 
> On Nov 19, 2010, at 5:48 AM, Mel Gorman wrote:
> 
> > At least as long as !CONFIG_COMPACTION exists. That will be a while; bear
> > in mind that CONFIG_COMPACTION is disabled by default (although I believe
> > at least some distros are enabling it). Maybe we should choose to deprecate
> > it in 2.6.40 and delete it at the infamous time of 2.6.42? That would give
> > ample time to iron out any issues that crop up with reclaim/compaction
> > (what this series has turned into).
> 
> How about making it the default before 2.6.40, as an initial step?
> 

It'd be a reasonable way of ensuring it's being tested everywhere, not
just by those who are interested or who use distro kernel configs.
I guess we'd set it to "default y" in the same patch that adds the note
to feature-removal-schedule.txt.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations
  2010-11-19 14:05                 ` Mel Gorman
@ 2010-11-19 15:45                   ` Ted Ts'o
  0 siblings, 0 replies; 34+ messages in thread
From: Ted Ts'o @ 2010-11-19 15:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, KAMEZAWA Hiroyuki,
	Andrea Arcangeli, KOSAKI Motohiro, Rik van Riel, linux-mm,
	linux-kernel

On Fri, Nov 19, 2010 at 02:05:32PM +0000, Mel Gorman wrote:
> > 
> > How about making it the default before 2.6.40, as an initial step?
> > 
> 
> It'd be a reasonable way of ensuring it's being tested everywhere, not
> just by those who are interested or who use distro kernel configs.
> I guess we'd set it to "default y" in the same patch that adds the note
> to feature-removal-schedule.txt.

I'd suggest doing it now (or soon, before 2.6.40), just to make sure
there aren't massive complaints about performance regressions, etc.,
and then deprecating it at, say, 2.6.42, and waiting 6-9 months
before removing it.  But I'm a bit more conservative about making
such changes.

(Said the person who has reluctantly agreed to keep the minixdf mount
option after we found users when we tried deprecating it.  :-)

       	     	      	    	    	  - Ted

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2010-11-19 15:46 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-17 16:22 [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Mel Gorman
2010-11-17 16:22 ` [PATCH 1/8] mm: compaction: Add trace events for memory compaction activity Mel Gorman
2010-11-17 16:22 ` [PATCH 2/8] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
2010-11-17 16:22 ` [PATCH 3/8] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
2010-11-18 18:09   ` Andrea Arcangeli
2010-11-18 18:30     ` Mel Gorman
2010-11-17 16:22 ` [PATCH 4/8] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
2010-11-18 18:21   ` Andrea Arcangeli
2010-11-18 18:34     ` Mel Gorman
2010-11-18 19:00       ` Andrea Arcangeli
2010-11-17 16:22 ` [PATCH 5/8] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
2010-11-17 16:22 ` [PATCH 6/8] mm: compaction: Perform a faster scan in try_to_compact_pages() Mel Gorman
2010-11-18 18:34   ` Andrea Arcangeli
2010-11-18 18:50     ` Mel Gorman
2010-11-18 19:08       ` Andrea Arcangeli
2010-11-19 11:16         ` Mel Gorman
2010-11-17 16:22 ` [PATCH 7/8] mm: compaction: Use the LRU to get a hint on where compaction should start Mel Gorman
2010-11-18  9:10   ` KAMEZAWA Hiroyuki
2010-11-18  9:28     ` Mel Gorman
2010-11-18 18:46   ` Andrea Arcangeli
2010-11-19 11:08     ` Mel Gorman
2010-11-17 16:22 ` [PATCH 8/8] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
2010-11-17 23:46 ` [PATCH 0/8] Use memory compaction instead of lumpy reclaim during high-order allocations Andrew Morton
2010-11-18  2:03   ` Rik van Riel
2010-11-18  8:12   ` Mel Gorman
2010-11-18  8:26     ` KAMEZAWA Hiroyuki
2010-11-18  8:38       ` Johannes Weiner
2010-11-18  9:20         ` Mel Gorman
2010-11-18 19:49           ` Andrew Morton
2010-11-19 10:48             ` Mel Gorman
2010-11-19 12:43               ` Theodore Tso
2010-11-19 14:05                 ` Mel Gorman
2010-11-19 15:45                   ` Ted Ts'o
2010-11-18  8:44       ` Mel Gorman
