* [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2
@ 2010-11-22 15:43 Mel Gorman
  2010-11-22 15:43 ` [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity Mel Gorman
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Changelog since V1
  o Drop patch that takes a scanning hint from LRU
  o Loop in reclaim until it is known that enough pages are reclaimed for
    compaction to make forward progress or that progress is no longer
    possible
  o Do not call compaction from within reclaim. Instead have the allocator
    or kswapd call it as necessary
  o Obeying sync in migration now means just avoiding wait_on_page_writeback

Huge page allocations are not expected to be cheap, but lumpy reclaim is
still very disruptive. While it is far better than reclaiming random order-0
pages, it ignores the reference bits of the pages near the reference page
selected from the LRU. Memory compaction was merged in 2.6.35 to reduce the
dependence on lumpy reclaim by moving pages around instead of reclaiming them
when enough pages were already free. It has been tested fairly heavily at
this point. This is a prototype series to use compaction more aggressively.

When CONFIG_COMPACTION is set, lumpy reclaim is no longer used. Instead,
a mechanism called reclaim/compaction is used where a number of order-0
pages are reclaimed and the caller later uses compaction to satisfy the
allocation. This keeps a larger number of active pages in memory at the cost
of increased use of migration and compaction scanning. With the full series
applied, latencies when allocating huge pages are significantly reduced. By
the end of the series, the faster compaction path takes shortcuts, such as
only considering MIGRATE_MOVABLE pageblocks, to keep those latencies down.
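
To illustrate the intended flow, here is a rough sketch (not code from the
patches; reclaim_some_order0_pages() is a made-up stand-in for the order-0
reclaim done in shrink_zone(), and bookkeeping such as nr_reclaimed and
nr_scanned is omitted):

	/* Reclaim order-0 pages until compaction has enough to work with */
	do {
		reclaim_some_order0_pages(zone, sc);	/* hypothetical helper */
	} while (should_continue_reclaim(zone, nr_reclaimed, nr_scanned, sc));

	/* The caller then compacts instead of lumpy reclaiming a range */
	if (compaction_suitable(zone, sc->order) == COMPACT_CONTINUE)
		compact_zone_order(zone, sc->order, sc->gfp_mask);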

Andrea, this version calls compaction from the callers instead of within
reclaim. Your main concern before was that compaction was being called after
a blind reclaim without checking if enough reclaim work had occurred. This
version is better at checking if enough work has been done but the callers
of compaction are a little awkward. I'm wondering if it really does make
more sense to call compact_zone_order() if should_continue_reclaim() returns
false and indications are that compaction would have a successful outcome.

Four kernels are tested

traceonly		This kernel is using compaction and has the
			tracepoints applied.

reclaimcompact		First three patches. A number of order-0 pages
			are reclaimed and then the zone is compacted. This
			replaces lumpy reclaim, though lumpy reclaim is still
			available if CONFIG_COMPACTION is not set.

obeysync		First five patches. Migration will avoid the use
			of wait_on_page_writeback() if requested by the
			caller.

fastscan		First six patches applied. try_to_compact_pages()
			uses shortcuts in the faster compaction path to
			reduce latency.

The final patch is just a rename so it is not reported.  The target test was
a high-order allocation stress test. Testing was based on kernel 2.6.37-rc2.
The test machine was x86-64 with 3G of RAM.

STRESS-HIGHALLOC
                traceonly         reclaimcompact     obeysync         fastscan
Pass 1          90.00 ( 0.00%)    80.00 (-10.00%)    84.00 (-6.00%)   82.00 (-8.00%)
Pass 2          92.00 ( 0.00%)    82.00 (-10.00%)    86.00 (-6.00%)   86.00 (-6.00%)
At Rest         94.00 ( 0.00%)    93.00 (-1.00%)     95.00 ( 1.00%)   93.00 (-1.00%)

MMTests Statistics: duration
User/Sys Time Running Test (seconds)       3359.07   3284.68    3299.3   3292.66
Total Elapsed Time (seconds)               2120.23   1329.19   1314.64   1312.75


Success rates are slightly down in exchange for faster completion times. This
is related to the patches reducing both the latency and the amount of work
performed by reclaim. The success figures could be matched, but only by
hammering the system harder, and as the success rates are still very high it
is not worth the overhead. All in all, the test completes roughly 13 minutes
faster, which is a pretty decent improvement.

FTrace Reclaim Statistics: vmscan
                                         traceonly reclaimcompact obeysync fastscan
Direct reclaims                                403        704        757        648 
Direct reclaim pages scanned                 62655     734125     718325     621864 
Direct reclaim pages reclaimed               36445     186805     214376     187671 
Direct reclaim write file async I/O           2090        748        517        561 
Direct reclaim write anon async I/O           9850       8089       5704       4307 
Direct reclaim write file sync I/O               1          0          0          0 
Direct reclaim write anon sync I/O              70          1          1          0 
Wake kswapd requests                           768       1061        890        979 
Kswapd wakeups                                 581        439        451        423 
Kswapd pages scanned                       4566808    2421272    2284775    2349758 
Kswapd pages reclaimed                     2338283    1580849    1558239    1559380 
Kswapd reclaim write file async I/O          48287        858        673        649 
Kswapd reclaim write anon async I/O         755369       3327       3964       4037 
Kswapd reclaim write file sync I/O               0          0          0          0 
Kswapd reclaim write anon sync I/O               0          0          0          0 
Time stalled direct reclaim (seconds)       104.13      41.53      71.18      53.77 
Time kswapd awake (seconds)                 891.88     233.58     199.42     212.52 

Total pages scanned                        4629463   3155397   3003100   2971622
Total pages reclaimed                      2374728   1767654   1772615   1747051
%age total pages scanned/reclaimed          51.30%    56.02%    59.03%    58.79%
%age total pages scanned/written            17.62%     0.41%     0.36%     0.32%
%age  file pages scanned/written             1.09%     0.05%     0.04%     0.04%
Percentage Time Spent Direct Reclaim         3.01%     1.25%     2.11%     1.61%
Percentage Time kswapd Awake                42.07%    17.57%    15.17%    16.19%

These are the reclaim statistics. The time spent in direct reclaim and with
kswapd awake is reduced, as is the overall reclaim activity (about 2.4GB less
worth of pages reclaimed). It looks like obeysync increases the stall time
for direct reclaimers. This could be reduced by having kswapd use sync
compaction, but the perceived ideal was that it is better for kswapd to
continually make forward progress.

FTrace Reclaim Statistics: compaction
                                        traceonly reclaimcompact obeysync  fastscan
Migrate Pages Scanned                     83190294 1277116960  955517979  927209597 
Migrate Pages Isolated                      245208    4068555    3173644    3920101 
Free    Pages Scanned                     25488658  597156637  668273710  927901903 
Free    Pages Isolated                      335004    4575669    3597552    4408042 
Migrated Pages                              241260    4018215    3123549    3865212 
Migration Failures                            3948      50340      50095      54863 

The patch series increases the amount of compaction activity but this is not
surprising as there are more callers. Once reclaim/compaction is introduced,
the remainder of the series reduces the work slightly. This work doesn't
show up in the latency figures as such but it is thrashing the cache. Future
work may look at reducing the amount of scanning that is performed by
compaction.

The raw figures are convincing enough in terms of the test completing faster,
but what we really care about is latency, so here are the average latencies
when allocating huge pages.

X86-64
http://www.csn.ul.ie/~mel/postings/memorycompact-20101122/highalloc-interlatency-hydra-mean.ps
http://www.csn.ul.ie/~mel/postings/memorycompact-20101122/highalloc-interlatency-hydra-stddev.ps

The mean latencies are pushed *way* down, implying that the amount of work
needed to allocate each huge page is drastically reduced.

 include/linux/compaction.h        |   20 ++++-
 include/linux/kernel.h            |    7 ++
 include/linux/migrate.h           |   12 ++-
 include/trace/events/compaction.h |   74 +++++++++++++++++
 include/trace/events/vmscan.h     |    6 +-
 mm/compaction.c                   |  132 ++++++++++++++++++++++---------
 mm/memory-failure.c               |    3 +-
 mm/memory_hotplug.c               |    3 +-
 mm/mempolicy.c                    |    6 +-
 mm/migrate.c                      |   22 +++--
 mm/page_alloc.c                   |   32 +++++++-
 mm/vmscan.c                       |  157 ++++++++++++++++++++++++++++---------
 12 files changed, 371 insertions(+), 103 deletions(-)
 create mode 100644 include/trace/events/compaction.h



* [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-11-22 15:43 ` [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

In preparation for patches promoting the use of memory compaction over lumpy
reclaim, this patch adds trace points for memory compaction activity. Using
them, we can monitor the scanning activity of the migration and free page
scanners as well as the number and success rates of pages passed to page
migration.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/compaction.h |   74 +++++++++++++++++++++++++++++++++++++
 mm/compaction.c                   |   14 ++++++-
 2 files changed, 87 insertions(+), 1 deletions(-)
 create mode 100644 include/trace/events/compaction.h

diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
new file mode 100644
index 0000000..388bcdd
--- /dev/null
+++ b/include/trace/events/compaction.h
@@ -0,0 +1,74 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM compaction
+
+#if !defined(_TRACE_COMPACTION_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_COMPACTION_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+#include "gfpflags.h"
+
+DECLARE_EVENT_CLASS(mm_compaction_isolate_template,
+
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr_scanned)
+		__field(unsigned long, nr_taken)
+	),
+
+	TP_fast_assign(
+		__entry->nr_scanned = nr_scanned;
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_scanned=%lu nr_taken=%lu",
+		__entry->nr_scanned,
+		__entry->nr_taken)
+);
+
+DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages,
+
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken)
+);
+
+DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages,
+	TP_PROTO(unsigned long nr_scanned,
+		unsigned long nr_taken),
+
+	TP_ARGS(nr_scanned, nr_taken)
+);
+
+TRACE_EVENT(mm_compaction_migratepages,
+
+	TP_PROTO(unsigned long nr_migrated,
+		unsigned long nr_failed),
+
+	TP_ARGS(nr_migrated, nr_failed),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, nr_migrated)
+		__field(unsigned long, nr_failed)
+	),
+
+	TP_fast_assign(
+		__entry->nr_migrated = nr_migrated;
+		__entry->nr_failed = nr_failed;
+	),
+
+	TP_printk("nr_migrated=%lu nr_failed=%lu",
+		__entry->nr_migrated,
+		__entry->nr_failed)
+);
+
+
+#endif /* _TRACE_COMPACTION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..bc8eb8a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,6 +16,9 @@
 #include <linux/sysfs.h>
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/compaction.h>
+
 /*
  * compact_control is used to track pages being migrated and the free pages
  * they are being migrated to during memory compaction. The free_pfn starts
@@ -60,7 +63,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 				struct list_head *freelist)
 {
 	unsigned long zone_end_pfn, end_pfn;
-	int total_isolated = 0;
+	int nr_scanned = 0, total_isolated = 0;
 	struct page *cursor;
 
 	/* Get the last PFN we should scan for free pages at */
@@ -81,6 +84,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 
 		if (!pfn_valid_within(blockpfn))
 			continue;
+		nr_scanned++;
 
 		if (!PageBuddy(page))
 			continue;
@@ -100,6 +104,7 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 		}
 	}
 
+	trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
 	return total_isolated;
 }
 
@@ -234,6 +239,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
+	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 
 	/* Do not scan outside zone boundaries */
@@ -266,6 +272,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		struct page *page;
 		if (!pfn_valid_within(low_pfn))
 			continue;
+		nr_scanned++;
 
 		/* Get the page and skip if free */
 		page = pfn_to_page(low_pfn);
@@ -281,6 +288,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		list_add(&page->lru, migratelist);
 		mem_cgroup_del_lru(page);
 		cc->nr_migratepages++;
+		nr_isolated++;
 
 		/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
@@ -292,6 +300,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
 	spin_unlock_irq(&zone->lru_lock);
 	cc->migrate_pfn = low_pfn;
 
+	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
+
 	return cc->nr_migratepages;
 }
 
@@ -402,6 +412,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		count_vm_events(COMPACTPAGES, nr_migrate - nr_remaining);
 		if (nr_remaining)
 			count_vm_events(COMPACTPAGEFAILED, nr_remaining);
+		trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
+						nr_remaining);
 
 		/* Release LRU pages not migrated */
 		if (!list_empty(&cc->migratepages)) {
-- 
1.7.1



* [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
  2010-11-22 15:43 ` [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:27   ` Johannes Weiner
  2010-11-22 15:43 ` [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Currently lumpy_mode is an enum that determines whether lumpy reclaim is off,
synchronous or asynchronous. In preparation for using compaction instead of
lumpy reclaim, this patch converts the flags into a bitmask.
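
As a minimal illustration of why the checks change from '==' to '&' (not
lifted verbatim from the patch):

	/* Several properties can now be set at the same time */
	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM | LUMPY_MODE_SYNC;

	/* Test the individual bit rather than comparing the whole value */
	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC)
		wait_on_page_writeback(page);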

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   46 +++++++++++++++++++++++++----------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c255fcc..be76429 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d31d7ce..e5eda92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,11 +51,20 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-enum lumpy_mode {
-	LUMPY_MODE_NONE,
-	LUMPY_MODE_ASYNC,
-	LUMPY_MODE_SYNC,
-};
+/*
+ * lumpy_mode determines how the inactive list is shrunk
+ * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
+ * LUMPY_MODE_ASYNC:  Do not block
+ * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ *			page from the LRU and reclaim all pages within a
+ *			naturally aligned range
+ */
+typedef unsigned __bitwise__ lumpy_mode;
+#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
+#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
+#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
+#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -88,7 +97,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	enum lumpy_mode lumpy_reclaim_mode;
+	lumpy_mode lumpy_reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -274,13 +283,13 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
 	 * Some reclaim have alredy been failed. No worth to try synchronous
 	 * lumpy reclaim.
 	 */
-	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return;
 
 	/*
@@ -288,17 +297,18 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * trouble getting a small set of contiguous pages, we
 	 * will reclaim both active and inactive pages.
 	 */
+	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static void disable_lumpy_reclaim_mode(struct scan_control *sc)
 {
-	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -429,7 +439,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * first attempt to free a range of pages fails.
 		 */
 		if (PageWriteback(page) &&
-		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
+		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -615,7 +625,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -732,7 +742,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
+			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
 			    may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
@@ -1317,7 +1327,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1368,7 +1378,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
@@ -1381,7 +1391,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, sc->mem_cgroup,
 			0, file);
-- 
1.7.1



* [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
  2010-11-22 15:43 ` [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity Mel Gorman
  2010-11-22 15:43 ` [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:27   ` Johannes Weiner
  2010-11-22 15:43 ` [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
the age of the pages it reclaims. This can incur significant stalls and
potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch introduces an alternative to lumpy
reclaim, used when compaction is available, called reclaim/compaction. The
basic operation is very simple - instead of selecting a contiguous range of
pages to reclaim, a number of order-0 pages are reclaimed and compaction is
run later by either kswapd (compact_zone_order()) or direct compaction
(__alloc_pages_direct_compact()).
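
As a simplified sketch of the kswapd side (condensed from the balance_pgdat()
hunk below; the surrounding zone loop and watermark checks are omitted):

	/* Reclaim order-0 pages from the zone as normal */
	shrink_zone(priority, zone, &sc);

	/* For costly orders, compact the zone instead of lumpy reclaiming */
	if (sc.order > PAGE_ALLOC_COSTLY_ORDER)
		compact_zone_order(zone, sc.order, sc.gfp_mask);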

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |   14 ++++++
 include/linux/kernel.h     |    7 +++
 mm/compaction.c            |   89 +++++++++++++++++++++++--------------
 mm/page_alloc.c            |   13 ++++++
 mm/vmscan.c                |  103 +++++++++++++++++++++++++++++++++++++------
 5 files changed, 177 insertions(+), 49 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 5ac5155..e082cf9 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,6 +22,9 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask);
+extern unsigned long compaction_suitable(struct zone *zone, int order);
+extern unsigned long compact_zone_order(struct zone *zone, int order,
+						gfp_t gfp_mask);
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
@@ -59,6 +62,17 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	return COMPACT_CONTINUE;
 }
 
+static inline unsigned long compaction_suitable(struct zone *zone, int order)
+{
+	return COMPACT_SKIPPED;
+}
+
+static inline unsigned long compact_zone_order(struct zone *zone, int order,
+						gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline void defer_compaction(struct zone *zone)
 {
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b6de9a6..7ee8a21 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -584,6 +584,13 @@ struct sysinfo {
 #define NUMA_BUILD 0
 #endif
 
+/* This helps us avoid #ifdef CONFIG_COMPACTION */
+#ifdef CONFIG_COMPACTION
+#define COMPACTION_BUILD 1
+#else
+#define COMPACTION_BUILD 0
+#endif
+
 /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
diff --git a/mm/compaction.c b/mm/compaction.c
index bc8eb8a..384fa71 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -385,10 +385,62 @@ static int compact_finished(struct zone *zone,
 	return COMPACT_CONTINUE;
 }
 
+/*
+ * compaction_suitable: Is this suitable to run compaction on this zone now?
+ * Returns
+ *   COMPACT_SKIPPED  - If there are too few free pages for compaction
+ *   COMPACT_PARTIAL  - If the allocation would succeed without compaction
+ *   COMPACT_CONTINUE - If compaction should run now
+ */
+unsigned long compaction_suitable(struct zone *zone, int order)
+{
+	int fragindex;
+	unsigned long watermark;
+
+	/*
+	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
+	 * This is because during migration, copies of pages need to be
+	 * allocated and for a short time, the footprint is higher
+	 */
+	watermark = low_wmark_pages(zone) + (2UL << order);
+	if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+		return COMPACT_SKIPPED;
+
+	/*
+	 * fragmentation index determines if allocation failures are due to
+	 * low memory or external fragmentation
+	 *
+	 * index of -1 implies allocations might succeed depending on watermarks
+	 * index towards 0 implies failure is due to lack of memory
+	 * index towards 1000 implies failure is due to fragmentation
+	 *
+	 * Only compact if a failure would be due to fragmentation.
+	 */
+	fragindex = fragmentation_index(zone, order);
+	if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
+		return COMPACT_SKIPPED;
+
+	if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0))
+		return COMPACT_PARTIAL;
+
+	return COMPACT_CONTINUE;
+}
+
 static int compact_zone(struct zone *zone, struct compact_control *cc)
 {
 	int ret;
 
+	ret = compaction_suitable(zone, cc->order);
+	switch (ret) {
+	case COMPACT_PARTIAL:
+	case COMPACT_SKIPPED:
+		/* Compaction is likely to fail */
+		return ret;
+	case COMPACT_CONTINUE:
+		/* Fall through to compaction */
+		;
+	}
+
 	/* Setup to move all movable pages to the end of the zone */
 	cc->migrate_pfn = zone->zone_start_pfn;
 	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
@@ -430,7 +482,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	return ret;
 }
 
-static unsigned long compact_zone_order(struct zone *zone,
+unsigned long compact_zone_order(struct zone *zone,
 						int order, gfp_t gfp_mask)
 {
 	struct compact_control cc = {
@@ -463,7 +515,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
-	unsigned long watermark;
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
@@ -481,43 +532,13 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
-		int fragindex;
 		int status;
 
-		/*
-		 * Watermarks for order-0 must be met for compaction. Note
-		 * the 2UL. This is because during migration, copies of
-		 * pages need to be allocated and for a short time, the
-		 * footprint is higher
-		 */
-		watermark = low_wmark_pages(zone) + (2UL << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
-			continue;
-
-		/*
-		 * fragmentation index determines if allocation failures are
-		 * due to low memory or external fragmentation
-		 *
-		 * index of -1 implies allocations might succeed depending
-		 * 	on watermarks
-		 * index towards 0 implies failure is due to lack of memory
-		 * index towards 1000 implies failure is due to fragmentation
-		 *
-		 * Only compact if a failure would be due to fragmentation.
-		 */
-		fragindex = fragmentation_index(zone, order);
-		if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
-			continue;
-
-		if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0)) {
-			rc = COMPACT_PARTIAL;
-			break;
-		}
-
 		status = compact_zone_order(zone, order, gfp_mask);
 		rc = max(status, rc);
 
-		if (zone_watermark_ok(zone, order, watermark, 0, 0))
+		/* If a normal allocation would succeed, stop compacting */
+		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
 	}
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 07a6544..2c88655 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2097,6 +2097,19 @@ rebalance:
 		/* Wait for some write requests to complete then retry */
 		wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
 		goto rebalance;
+	} else {
+		/*
+		 * High-order allocations do not necessarily loop after
+		 * direct reclaim and reclaim/compaction depends on compaction
+		 * being called after reclaim so call directly if necessary
+		 */
+		page = __alloc_pages_direct_compact(gfp_mask, order,
+					zonelist, high_zoneidx,
+					nodemask,
+					alloc_flags, preferred_zone,
+					migratetype, &did_some_progress);
+		if (page)
+			goto got_pg;
 	}
 
 nopage:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e5eda92..3fb7a76 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -32,6 +32,7 @@
 #include <linux/topology.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/compaction.h>
 #include <linux/notifier.h>
 #include <linux/rwsem.h>
 #include <linux/delay.h>
@@ -59,12 +60,15 @@
  * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
  *			page from the LRU and reclaim all pages within a
  *			naturally aligned range
+ * LUMPY_MODE_COMPACTION: For high-order allocations, reclaim a number of
+ *			order-0 pages and then compact the zone
  */
 typedef unsigned __bitwise__ lumpy_mode;
 #define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
 #define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
 #define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
 #define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
+#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -286,18 +290,20 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
-	 * Some reclaim have alredy been failed. No worth to try synchronous
-	 * lumpy reclaim.
+	 * Initially assume we are entering either lumpy reclaim or
+	 * reclaim/compaction. Depending on the order, we will either set the
+	 * sync mode or just reclaim order-0 pages later.
 	 */
-	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
-		return;
+	if (COMPACTION_BUILD)
+		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
+	else
+		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 
 	/*
-	 * If we need a large contiguous chunk of memory, or have
-	 * trouble getting a small set of contiguous pages, we
-	 * will reclaim both active and inactive pages.
+	 * Avoid using lumpy reclaim or reclaim/compaction if possible by
+	 * restricting when it is set to either costly allocations or when
+	 * under memory pressure
 	 */
-	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
 		sc->lumpy_reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
@@ -1378,8 +1384,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1391,8 +1397,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, sc->mem_cgroup,
 			0, file);
 		/*
@@ -1808,6 +1814,57 @@ out:
 }
 
 /*
+ * Reclaim/compaction depends on a number of pages being freed. To avoid
+ * disruption to the system, a small number of order-0 pages continue to be
+ * rotated and reclaimed in the normal fashion. However, by the time we get
+ * back to the allocator and call try_to_compact_pages(), we ensure that
+ * there are enough free pages for it to be likely successful
+ */
+static inline bool should_continue_reclaim(struct zone *zone,
+					unsigned long nr_reclaimed,
+					unsigned long nr_scanned,
+					struct scan_control *sc)
+{
+	unsigned long pages_for_compaction;
+	unsigned long inactive_lru_pages;
+
+	/* If not in reclaim/compaction mode, stop */
+	if (!(sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION))
+		return false;
+
+	/*
+	 * If we failed to reclaim and have scanned the full list, stop.
+	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
+	 *       faster but obviously would make the allocation less likely
+	 *       to succeed. If this is desirable, use __GFP_REPEAT to decide
+	 *       if both reclaimed and scanned should be checked or just
+	 *       reclaimed
+	 */
+	if (!nr_reclaimed && !nr_scanned)
+		return false;
+
+	/*
+	 * If we have not reclaimed enough pages for compaction and the
+	 * inactive lists are large enough, continue reclaiming
+	 */
+	pages_for_compaction = (2UL << sc->order);
+	inactive_lru_pages = zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON) +
+				zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+	if (sc->nr_reclaimed < pages_for_compaction &&
+			inactive_lru_pages > pages_for_compaction)
+		return true;
+
+	/* If compaction would go ahead or the allocation would succeed, stop */
+	switch (compaction_suitable(zone, sc->order)) {
+	case COMPACT_PARTIAL:
+	case COMPACT_CONTINUE:
+		return false;
+	default:
+		return true;
+	}
+}
+
+/*
  * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
  */
 static void shrink_zone(int priority, struct zone *zone,
@@ -1816,9 +1873,12 @@ static void shrink_zone(int priority, struct zone *zone,
 	unsigned long nr[NR_LRU_LISTS];
 	unsigned long nr_to_scan;
 	enum lru_list l;
-	unsigned long nr_reclaimed = sc->nr_reclaimed;
+	unsigned long nr_reclaimed;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
+	unsigned long nr_scanned = sc->nr_scanned;
 
+restart:
+	nr_reclaimed = 0;
 	get_scan_count(zone, sc, nr, priority);
 
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
@@ -1844,8 +1904,7 @@ static void shrink_zone(int priority, struct zone *zone,
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
-
-	sc->nr_reclaimed = nr_reclaimed;
+	sc->nr_reclaimed += nr_reclaimed;
 
 	/*
 	 * Even if we did not try to evict anon pages at all, we want to
@@ -1854,6 +1913,11 @@ static void shrink_zone(int priority, struct zone *zone,
 	if (inactive_anon_is_low(zone, sc))
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
+	/* reclaim/compaction might need reclaim to continue */
+	if (should_continue_reclaim(zone, nr_reclaimed,
+					sc->nr_scanned - nr_scanned, sc))
+		goto restart;
+
 	throttle_vm_writeout(sc->gfp_mask);
 }
 
@@ -2300,6 +2364,15 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
+			/*
+			 * Compact the zone for higher orders to reduce
+			 * latencies for higher-order allocations that
+			 * would ordinarily call try_to_compact_pages()
+			 */
+			if (sc.order > PAGE_ALLOC_COSTLY_ORDER)
+				compact_zone_order(zone, sc.order,
+						sc.gfp_mask);
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
 				all_zones_ok = 0;
-- 
1.7.1



* [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
                   ` (2 preceding siblings ...)
  2010-11-22 15:43 ` [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:28   ` Johannes Weiner
  2010-11-22 15:43 ` [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

Migration synchronously waits for writeback if the initial passes fail.
Callers of memory compaction do not necessarily want this behaviour if the
caller is latency sensitive or expects that synchronous migration is not
going to have a significantly better success rate.

This patch adds a sync parameter to migrate_pages(), allowing the caller to
indicate whether wait_on_page_writeback() is allowed within migration or not.
For reclaim/compaction, try_to_compact_pages() is first called asynchronously,
direct reclaim runs, and then try_to_compact_pages() is called synchronously
as there is a greater expectation that it will succeed.
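
The resulting ordering in the allocator slow path is roughly the following
(a sketch with return-value handling omitted; see the page_alloc.c hunk
below):

	/* First pass: optimistic, asynchronous compaction */
	try_to_compact_pages(zonelist, order, gfp_mask, nodemask, false);

	/* ... direct reclaim frees a number of order-0 pages ... */

	/* Later pass: synchronous, may block in wait_on_page_writeback() */
	try_to_compact_pages(zonelist, order, gfp_mask, nodemask, true);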

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |   10 ++++++----
 include/linux/migrate.h    |   12 ++++++++----
 mm/compaction.c            |   14 ++++++++++----
 mm/memory-failure.c        |    3 ++-
 mm/memory_hotplug.c        |    3 ++-
 mm/mempolicy.c             |    4 ++--
 mm/migrate.c               |   22 +++++++++++++---------
 mm/page_alloc.c            |   21 +++++++++++++++------
 mm/vmscan.c                |    2 +-
 9 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e082cf9..d0aeffd 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -21,10 +21,11 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *mask);
+			int order, gfp_t gfp_mask, nodemask_t *mask,
+			bool sync);
 extern unsigned long compaction_suitable(struct zone *zone, int order);
 extern unsigned long compact_zone_order(struct zone *zone, int order,
-						gfp_t gfp_mask);
+						gfp_t gfp_mask, bool sync);
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
@@ -57,7 +58,8 @@ static inline bool compaction_deferred(struct zone *zone)
 
 #else
 static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *nodemask)
+			int order, gfp_t gfp_mask, nodemask_t *nodemask,
+			bool sync)
 {
 	return COMPACT_CONTINUE;
 }
@@ -68,7 +70,7 @@ static inline unsigned long compaction_suitable(struct zone *zone, int order)
 }
 
 static inline unsigned long compact_zone_order(struct zone *zone, int order,
-						gfp_t gfp_mask)
+						gfp_t gfp_mask, bool sync)
 {
 	return 0;
 }
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 085527f..fa31902 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,9 +13,11 @@ extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining);
+			unsigned long private, int offlining,
+			bool sync);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining);
+			unsigned long private, int offlining,
+			bool sync);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -33,9 +35,11 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining) { return -ENOSYS; }
+		unsigned long private, int offlining,
+		bool sync) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining) { return -ENOSYS; }
+		unsigned long private, int offlining,
+		bool sync) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 384fa71..03bd8f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -33,6 +33,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	bool sync;			/* Synchronous migration */
 
 	/* Account for isolated anon and file pages */
 	unsigned long nr_anon;
@@ -456,7 +457,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		migrate_pages(&cc->migratepages, compaction_alloc,
-						(unsigned long)cc, 0);
+				(unsigned long)cc, 0,
+				cc->sync);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -483,7 +485,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 }
 
 unsigned long compact_zone_order(struct zone *zone,
-						int order, gfp_t gfp_mask)
+						int order, gfp_t gfp_mask,
+						bool sync)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -491,6 +494,7 @@ unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
+		.sync = sync,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -506,11 +510,13 @@ int sysctl_extfrag_threshold = 500;
  * @order: The order of the current allocation
  * @gfp_mask: The GFP mask of the current allocation
  * @nodemask: The allowed nodes to allocate from
+ * @sync: Whether migration is synchronous or not
  *
  * This is the main entry point for direct page compaction.
  */
 unsigned long try_to_compact_pages(struct zonelist *zonelist,
-			int order, gfp_t gfp_mask, nodemask_t *nodemask)
+			int order, gfp_t gfp_mask, nodemask_t *nodemask,
+			bool sync)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
@@ -534,7 +540,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask);
+		status = compact_zone_order(zone, order, gfp_mask, sync);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1243241..188294e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1413,7 +1413,8 @@ int soft_offline_page(struct page *page, int flags)
 		LIST_HEAD(pagelist);
 
 		list_add(&page->lru, &pagelist);
-		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0);
+		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+								0, true);
 		if (ret) {
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 				pfn, ret, page->flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9260314..221178b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -716,7 +716,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 		/* this function returns # of failed pages */
-		ret = migrate_pages(&source, hotremove_migrate_alloc, 0, 1);
+		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
+								1, true);
 		if (ret)
 			putback_lru_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4a57f13..8b1a490 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -935,7 +935,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 		return PTR_ERR(vma);
 
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest, 0);
+		err = migrate_pages(&pagelist, new_node_page, dest, 0, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
@@ -1155,7 +1155,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 		if (!list_empty(&pagelist)) {
 			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma, 0);
+						(unsigned long)vma, 0, true);
 			if (nr_failed)
 				putback_lru_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..678a84a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -612,7 +612,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, int offlining)
+			struct page *page, int force, int offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -663,7 +663,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 	BUG_ON(charge);
 
 	if (PageWriteback(page)) {
-		if (!force)
+		if (!force || !sync)
 			goto uncharge;
 		wait_on_page_writeback(page);
 	}
@@ -808,7 +808,7 @@ move_newpage:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, int offlining)
+				int force, int offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -822,7 +822,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	rc = -EAGAIN;
 
 	if (!trylock_page(hpage)) {
-		if (!force)
+		if (!force || !sync)
 			goto out;
 		lock_page(hpage);
 	}
@@ -890,7 +890,8 @@ out:
  * Return: Number of pages not migrated or error code.
  */
 int migrate_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining)
+		new_page_t get_new_page, unsigned long private, int offlining,
+		bool sync)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -910,7 +911,8 @@ int migrate_pages(struct list_head *from,
 			cond_resched();
 
 			rc = unmap_and_move(get_new_page, private,
-						page, pass > 2, offlining);
+						page, pass > 2, offlining,
+						sync);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -939,7 +941,8 @@ out:
 }
 
 int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining)
+		new_page_t get_new_page, unsigned long private, int offlining,
+		bool sync)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -955,7 +958,8 @@ int migrate_huge_pages(struct list_head *from,
 			cond_resched();
 
 			rc = unmap_and_move_huge_page(get_new_page,
-					private, page, pass > 2, offlining);
+					private, page, pass > 2, offlining,
+					sync);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1088,7 +1092,7 @@ set_status:
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0);
+				(unsigned long)pm, 0, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c88655..c9e0fbe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1788,7 +1788,8 @@ static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress)
+	int migratetype, unsigned long *did_some_progress,
+	bool sync_migration)
 {
 	struct page *page;
 
@@ -1796,7 +1797,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		return NULL;
 
 	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
-								nodemask);
+						nodemask, sync_migration);
 	if (*did_some_progress != COMPACT_SKIPPED) {
 
 		/* Page migration frees to the PCP lists but we want merging */
@@ -1832,7 +1833,8 @@ static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress)
+	int migratetype, unsigned long *did_some_progress,
+	bool sync_migration)
 {
 	return NULL;
 }
@@ -1974,6 +1976,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
 	struct task_struct *p = current;
+	bool sync_migration = false;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -2036,14 +2039,19 @@ rebalance:
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
-	/* Try direct compaction */
+	/*
+	 * Try direct compaction. The first pass is asynchronous. Subsequent
+	 * attempts after direct reclaim are synchronous
+	 */
 	page = __alloc_pages_direct_compact(gfp_mask, order,
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress);
+					migratetype, &did_some_progress,
+					sync_migration);
 	if (page)
 		goto got_pg;
+	sync_migration = true;
 
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
@@ -2107,7 +2115,8 @@ rebalance:
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress);
+					migratetype, &did_some_progress,
+					sync_migration);
 		if (page)
 			goto got_pg;
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3fb7a76..6a6aa7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2371,7 +2371,7 @@ loop_again:
 			 */
 			if (sc.order > PAGE_ALLOC_COSTLY_ORDER)
 				compact_zone_order(zone, sc.order,
-						sc.gfp_mask);
+						sc.gfp_mask, false);
 
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
-- 
1.7.1



* [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
                   ` (3 preceding siblings ...)
  2010-11-22 15:43 ` [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:28   ` Johannes Weiner
  2010-11-22 15:43 ` [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously Mel Gorman
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/migrate.h |    8 ++++----
 mm/compaction.c         |    2 +-
 mm/memory_hotplug.c     |    2 +-
 mm/mempolicy.c          |    6 ++++--
 mm/migrate.c            |    8 ++++----
 5 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index fa31902..e39aeec 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,10 +13,10 @@ extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
-			unsigned long private, int offlining,
+			unsigned long private, bool offlining,
 			bool sync);
 
 extern int fail_migrate_page(struct address_space *,
@@ -35,10 +35,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
-		unsigned long private, int offlining,
+		unsigned long private, bool offlining,
 		bool sync) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 03bd8f9..b6e589d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -457,7 +457,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, 0,
+				(unsigned long)cc, false,
 				cc->sync);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 221178b..6178c80 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -717,7 +717,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		}
 		/* this function returns # of failed pages */
 		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								1, true);
+								true, true);
 		if (ret)
 			putback_lru_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8b1a490..9beb008 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -935,7 +935,8 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 		return PTR_ERR(vma);
 
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest, 0, true);
+		err = migrate_pages(&pagelist, new_node_page, dest,
+								false, true);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
@@ -1155,7 +1156,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 		if (!list_empty(&pagelist)) {
 			nr_failed = migrate_pages(&pagelist, new_vma_page,
-						(unsigned long)vma, 0, true);
+						(unsigned long)vma,
+						false, true);
 			if (nr_failed)
 				putback_lru_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index 678a84a..2eb2243 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -612,7 +612,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, int offlining, bool sync)
+			struct page *page, int force, bool offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -808,7 +808,7 @@ move_newpage:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, int offlining, bool sync)
+				int force, bool offlining, bool sync)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -890,7 +890,7 @@ out:
  * Return: Number of pages not migrated or error code.
  */
 int migrate_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining,
+		new_page_t get_new_page, unsigned long private, bool offlining,
 		bool sync)
 {
 	int retry = 1;
@@ -941,7 +941,7 @@ out:
 }
 
 int migrate_huge_pages(struct list_head *from,
-		new_page_t get_new_page, unsigned long private, int offlining,
+		new_page_t get_new_page, unsigned long private, bool offlining,
 		bool sync)
 {
 	int retry = 1;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
                   ` (4 preceding siblings ...)
  2010-11-22 15:43 ` [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:31   ` Johannes Weiner
  2010-11-22 15:43 ` [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
  2010-11-22 16:01 ` [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Andrea Arcangeli
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

try_to_compact_pages() is initially called to only migrate pages asynchronously
and kswapd always compacts asynchronously. Both are being optimistic, so it
is important to complete the work as quickly as possible to minimise stalls.

This patch alters the scanner when asynchronous to only consider
MIGRATE_MOVABLE pageblocks as migration candidates. This reduces stalls
when allocating huge pages while not impairing allocation success rates as
a full scan will be performed if necessary after direct reclaim.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/compaction.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b6e589d..50b0a90 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -240,6 +240,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
 	unsigned long low_pfn, end_pfn;
+	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 
@@ -280,6 +281,20 @@ static unsigned long isolate_migratepages(struct zone *zone,
 		if (PageBuddy(page))
 			continue;
 
+		/*
+		 * For async migration, also only scan in MOVABLE blocks. Async
+		 * migration is optimistic to see if the minimum amount of work
+		 * satisfies the allocation
+		 */
+		pageblock_nr = low_pfn >> pageblock_order;
+		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+				get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+			low_pfn += pageblock_nr_pages;
+			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+			last_pageblock_nr = pageblock_nr;
+			continue;
+		}
+
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
 			continue;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
                   ` (5 preceding siblings ...)
  2010-11-22 15:43 ` [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously Mel Gorman
@ 2010-11-22 15:43 ` Mel Gorman
  2010-12-01 10:34   ` Johannes Weiner
  2010-11-22 16:01 ` [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Andrea Arcangeli
  7 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-11-22 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, linux-mm, linux-kernel

With compaction being used instead of lumpy reclaim, the name lumpy_mode
and the associated variables are a bit misleading. Rename lumpy_mode to
reclaim_mode, which is a better fit. There is no functional change.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 ++--
 mm/vmscan.c                   |   70 ++++++++++++++++++++--------------------
 2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index be76429..ea422aa 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6a6aa7d..92af572 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,22 +53,22 @@
 #include <trace/events/vmscan.h>
 
 /*
- * lumpy_mode determines how the inactive list is shrunk
- * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
- * LUMPY_MODE_ASYNC:  Do not block
- * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
- * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
+ * reclaim_mode determines how the inactive list is shrunk
+ * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
+ * RECLAIM_MODE_ASYNC:  Do not block
+ * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
+ * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference
  *			page from the LRU and reclaim all pages within a
  *			naturally aligned range
- * LUMPY_MODE_COMPACTION: For high-order allocations, reclaim a number of
+ * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
-typedef unsigned __bitwise__ lumpy_mode;
-#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
-#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
-#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
-#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
-#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
+typedef unsigned __bitwise__ reclaim_mode;
+#define RECLAIM_MODE_SINGLE		((__force reclaim_mode)0x01u)
+#define RECLAIM_MODE_ASYNC		((__force reclaim_mode)0x02u)
+#define RECLAIM_MODE_SYNC		((__force reclaim_mode)0x04u)
+#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode)0x08u)
+#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -101,7 +101,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	lumpy_mode lumpy_reclaim_mode;
+	reclaim_mode reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -284,10 +284,10 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 	return ret;
 }
 
-static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
+static void set_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	reclaim_mode syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
 
 	/*
 	 * Initially assume we are entering either lumpy reclaim or
@@ -295,9 +295,9 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * sync mode or just reclaim order-0 pages later.
 	 */
 	if (COMPACTION_BUILD)
-		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
+		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
+		sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;
 
 	/*
 	 * Avoid using lumpy reclaim or reclaim/compaction if possible by
@@ -305,16 +305,16 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * under memory pressure
 	 */
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode |= syncmode;
+		sc->reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode |= syncmode;
+		sc->reclaim_mode |= syncmode;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
+		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
 }
 
-static void disable_lumpy_reclaim_mode(struct scan_control *sc)
+static void reset_reclaim_mode(struct scan_control *sc)
 {
-	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
+	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -445,7 +445,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * first attempt to free a range of pages fails.
 		 */
 		if (PageWriteback(page) &&
-		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
+		    (sc->reclaim_mode & RECLAIM_MODE_SYNC))
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -453,7 +453,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			ClearPageReclaim(page);
 		}
 		trace_mm_vmscan_writepage(page,
-			trace_reclaim_flags(page, sc->lumpy_reclaim_mode));
+			trace_reclaim_flags(page, sc->reclaim_mode));
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
 	}
@@ -631,7 +631,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
+	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -748,7 +748,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
+			if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
 			    may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
@@ -904,7 +904,7 @@ cull_mlocked:
 			try_to_free_swap(page);
 		unlock_page(page);
 		putback_lru_page(page);
-		disable_lumpy_reclaim_mode(sc);
+		reset_reclaim_mode(sc);
 		continue;
 
 activate_locked:
@@ -917,7 +917,7 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
-		disable_lumpy_reclaim_mode(sc);
+		reset_reclaim_mode(sc);
 keep_lumpy:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
@@ -1333,7 +1333,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
+	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1377,14 +1377,14 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_lumpy_reclaim_mode(priority, sc, false);
+	set_reclaim_mode(priority, sc, false);
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+			sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
 					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
@@ -1397,7 +1397,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+			sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
 					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, sc->mem_cgroup,
 			0, file);
@@ -1420,7 +1420,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		set_lumpy_reclaim_mode(priority, sc, true);
+		set_reclaim_mode(priority, sc, true);
 		nr_reclaimed += shrink_page_list(&page_list, zone, sc);
 	}
 
@@ -1435,7 +1435,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
 		priority,
-		trace_shrink_flags(file, sc->lumpy_reclaim_mode));
+		trace_shrink_flags(file, sc->reclaim_mode));
 	return nr_reclaimed;
 }
 
@@ -1829,7 +1829,7 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	unsigned long inactive_lru_pages;
 
 	/* If not in reclaim/compaction mode, stop */
-	if (!(sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION))
+	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
 		return false;
 
 	/*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2
  2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
                   ` (6 preceding siblings ...)
  2010-11-22 15:43 ` [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
@ 2010-11-22 16:01 ` Andrea Arcangeli
  7 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2010-11-22 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

Hi Mel,

This looks great to me. I'll replace my patch 1/66 with this patchset
to test with THP. It should work fine.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-11-22 15:43 ` [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
@ 2010-12-01 10:27   ` Johannes Weiner
  2010-12-01 10:50     ` Mel Gorman
  2010-12-02 12:03     ` [PATCH] mm: vmscan: Rename lumpy_mode to reclaim_mode fix Mel Gorman
  0 siblings, 2 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:50PM +0000, Mel Gorman wrote:
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -51,11 +51,20 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vmscan.h>
>  
> -enum lumpy_mode {
> -	LUMPY_MODE_NONE,
> -	LUMPY_MODE_ASYNC,
> -	LUMPY_MODE_SYNC,
> -};
> +/*
> + * lumpy_mode determines how the inactive list is shrunk
> + * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
> + * LUMPY_MODE_ASYNC:  Do not block
> + * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> + * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
> + *			page from the LRU and reclaim all pages within a
> + *			naturally aligned range

I find those names terribly undescriptive.  It also strikes me as an
odd set of flags.  Can't this be represented with less?

	LUMPY_MODE_ENABLED
	LUMPY_MODE_SYNC

or, after the rename,

	RECLAIM_MODE_HIGHER	= 1
	RECLAIM_MODE_SYNC	= 2
	RECLAIM_MODE_LUMPY	= 4

where compaction mode is default if RECLAIM_MODE_HIGHER, and
RECLAIM_MODE_LUMPY will go away eventually.
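
Spelled out in the kernel's usual flag style, that reduced set might look
roughly like this (a hypothetical sketch; the helper and its name are made up
for illustration, not code from this series):

	#define RECLAIM_MODE_HIGHER	0x01u	/* high-order reclaim requested */
	#define RECLAIM_MODE_SYNC	0x02u	/* may call wait_on_page_writeback() */
	#define RECLAIM_MODE_LUMPY	0x04u	/* legacy lumpy reclaim, to go away */

	/* compaction is the implied default for high-order reclaim */
	static inline bool mode_uses_compaction(unsigned int mode)
	{
		return (mode & RECLAIM_MODE_HIGHER) && !(mode & RECLAIM_MODE_LUMPY);
	}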

Also, if you have a flag name for 'reclaim with extra efforts for
higher order pages' that is better than RECLAIM_MODE_HIGHER... ;)

> +typedef unsigned __bitwise__ lumpy_mode;

lumpy_mode_t / reclaim_mode_t?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-11-22 15:43 ` [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
@ 2010-12-01 10:27   ` Johannes Weiner
  2010-12-01 10:56     ` Mel Gorman
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:51PM +0000, Mel Gorman wrote:
> Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
> the age of the pages it reclaims. This can incur significant stalls and
> potentially increase the number of major faults.
> 
> Compaction has reached the point where it is considered reasonably stable
> (meaning it has passed a lot of testing) and is a potential candidate for
> displacing lumpy reclaim. This patch introduces an alternative to lumpy
> reclaim when compaction is available called reclaim/compaction. The basic
> operation is very simple - instead of selecting a contiguous range of pages
> to reclaim, a number of order-0 pages are reclaimed and then compaction is
> done later by either kswapd (compact_zone_order()) or direct compaction
> (__alloc_pages_direct_compact()).
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

> @@ -286,18 +290,20 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
>  	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
>  
>  	/*
> -	 * Some reclaim have alredy been failed. No worth to try synchronous
> -	 * lumpy reclaim.
> +	 * Initially assume we are entering either lumpy reclaim or
> +	 * reclaim/compaction.Depending on the order, we will either set the
> +	 * sync mode or just reclaim order-0 pages later.
>  	 */
> -	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> -		return;
> +	if (COMPACTION_BUILD)
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> +	else
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;

Isn't this a regression for !COMPACTION_BUILD in that earlier kernels
would not do sync lumpy reclaim when somebody disabled it during the
async run?

If so, it should be trivial to fix.  Aside from that

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path
  2010-11-22 15:43 ` [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
@ 2010-12-01 10:28   ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:52PM +0000, Mel Gorman wrote:
> Migration synchronously waits for writeback if the initial passes fail.
> Callers of memory compaction do not necessarily want this behaviour if the
> caller is latency sensitive or expects that synchronous migration is not
> going to have a significantly better success rate.
> 
> This patch adds a sync parameter to migrate_pages() allowing the caller to
> indicate if wait_on_page_writeback() is allowed within migration or not. For
> reclaim/compaction, try_to_compact_pages() is first called asynchronously,
> direct reclaim runs and then try_to_compact_pages() is called synchronously
> as there is a greater expectation that it'll succeed.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync
  2010-11-22 15:43 ` [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
@ 2010-12-01 10:28   ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:53PM +0000, Mel Gorman wrote:
> With the introduction of the boolean sync parameter, the API looks a
> little inconsistent as offlining is still an int. Convert offlining to a
> bool for the sake of being tidy.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously
  2010-11-22 15:43 ` [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously Mel Gorman
@ 2010-12-01 10:31   ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:54PM +0000, Mel Gorman wrote:
> try_to_compact_pages() is initially called to only migrate pages asynchronously
> and kswapd always compacts asynchronously. Both are being optimistic, so it
> is important to complete the work as quickly as possible to minimise stalls.
> 
> This patch alters the scanner when asynchronous to only consider
> MIGRATE_MOVABLE pageblocks as migration candidates. This reduces stalls
> when allocating huge pages while not impairing allocation success rates as
> a full scan will be performed if necessary after direct reclaim.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode
  2010-11-22 15:43 ` [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
@ 2010-12-01 10:34   ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 10:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Mon, Nov 22, 2010 at 03:43:55PM +0000, Mel Gorman wrote:
> With compaction being used instead of lumpy reclaim, the name lumpy_mode
> and the associated variables are a bit misleading. Rename lumpy_mode to
> reclaim_mode, which is a better fit. There is no functional change.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-12-01 10:27   ` Johannes Weiner
@ 2010-12-01 10:50     ` Mel Gorman
  2010-12-01 11:21       ` Johannes Weiner
  2010-12-02 12:03     ` [PATCH] mm: vmscan: Rename lumpy_mode to reclaim_mode fix Mel Gorman
  1 sibling, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-12-01 10:50 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 11:27:32AM +0100, Johannes Weiner wrote:
> On Mon, Nov 22, 2010 at 03:43:50PM +0000, Mel Gorman wrote:
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -51,11 +51,20 @@
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/vmscan.h>
> >  
> > -enum lumpy_mode {
> > -	LUMPY_MODE_NONE,
> > -	LUMPY_MODE_ASYNC,
> > -	LUMPY_MODE_SYNC,
> > -};
> > +/*
> > + * lumpy_mode determines how the inactive list is shrunk
> > + * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
> > + * LUMPY_MODE_ASYNC:  Do not block
> > + * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> > + * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
> > + *			page from the LRU and reclaim all pages within a
> > + *			naturally aligned range
> 
> I find those names terribly undescriptive.  It also strikes me as an
> odd set of flags.  Can't this be represented with less?
> 
> 	LUMPY_MODE_ENABLED
> 	LUMPY_MODE_SYNC
> 
> or, after the rename,
> 
> 	RECLAIM_MODE_HIGHER	= 1
> 	RECLAIM_MODE_SYNC	= 2
> 	RECLAIM_MODE_LUMPY	= 4
> 

My problem with that is you have to infer what the behaviour is from what the
flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC
implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC. It also
looks very odd when trying to distinguish between order-0 standard reclaim,
lumpy reclaim and reclaim/compaction.
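
To make that contrast concrete, compare two hypothetical helpers (illustrative
names only, not code from the series):

	/* with an explicit ASYNC bit, the asynchronous case is stated directly */
	static inline bool reclaim_is_async(struct scan_control *sc)
	{
		return sc->lumpy_reclaim_mode & LUMPY_MODE_ASYNC;
	}

	/* with only a SYNC bit, "async" has to be read as the absence of SYNC */
	static inline bool reclaim_is_async_inferred(struct scan_control *sc)
	{
		return !(sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC);
	}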

> where compaction mode is default if RECLAIM_MODE_HIGHER, and
> RECLAIM_MODE_LUMPY will go away eventually.
> 
> Also, if you have a flag name for 'reclaim with extra efforts for
> higher order pages' that is better than RECLAIM_MODE_HIGHER... ;)
> 
> > +typedef unsigned __bitwise__ lumpy_mode;
> 
> lumpy_mode_t / reclaim_mode_t?
> 

It can't hurt!

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-12-01 10:27   ` Johannes Weiner
@ 2010-12-01 10:56     ` Mel Gorman
  2010-12-01 11:32       ` Johannes Weiner
  0 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-12-01 10:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 11:27:45AM +0100, Johannes Weiner wrote:
> On Mon, Nov 22, 2010 at 03:43:51PM +0000, Mel Gorman wrote:
> > Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
> > the age of the pages it reclaims. This can incur significant stalls and
> > potentially increase the number of major faults.
> > 
> > Compaction has reached the point where it is considered reasonably stable
> > (meaning it has passed a lot of testing) and is a potential candidate for
> > displacing lumpy reclaim. This patch introduces an alternative to lumpy
> > reclaim when compaction is available called reclaim/compaction. The basic
> > operation is very simple - instead of selecting a contiguous range of pages
> > to reclaim, a number of order-0 pages are reclaimed and then compaction is
> > done later by either kswapd (compact_zone_order()) or direct compaction
> > (__alloc_pages_direct_compact()).
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> 
> > @@ -286,18 +290,20 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> >  	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> >  
> >  	/*
> > -	 * Some reclaim have alredy been failed. No worth to try synchronous
> > -	 * lumpy reclaim.
> > +	 * Initially assume we are entering either lumpy reclaim or
> > +	 * reclaim/compaction.Depending on the order, we will either set the
> > +	 * sync mode or just reclaim order-0 pages later.
> >  	 */
> > -	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> > -		return;
> > +	if (COMPACTION_BUILD)
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > +	else
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> 
> Isn't this a regression for !COMPACTION_BUILD in that earlier kernels
> would not do sync lumpy reclaim when somebody disabled it during the
> async run?
> 

You'll need to clarify your question I'm afraid. In 2.6.36 for example,
if lumpy reclaim gets disabled then sync reclaim does not happen at all.
This was due to large stalls being observed when copying large amounts
of data to slow storage such as a USB external drive.

> If so, it should be trivial to fix.  Aside from that
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-12-01 10:50     ` Mel Gorman
@ 2010-12-01 11:21       ` Johannes Weiner
  2010-12-01 11:56         ` Mel Gorman
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 11:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 10:50:29AM +0000, Mel Gorman wrote:
> On Wed, Dec 01, 2010 at 11:27:32AM +0100, Johannes Weiner wrote:
> > On Mon, Nov 22, 2010 at 03:43:50PM +0000, Mel Gorman wrote:
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -51,11 +51,20 @@
> > >  #define CREATE_TRACE_POINTS
> > >  #include <trace/events/vmscan.h>
> > >  
> > > -enum lumpy_mode {
> > > -	LUMPY_MODE_NONE,
> > > -	LUMPY_MODE_ASYNC,
> > > -	LUMPY_MODE_SYNC,
> > > -};
> > > +/*
> > > + * lumpy_mode determines how the inactive list is shrunk
> > > + * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
> > > + * LUMPY_MODE_ASYNC:  Do not block
> > > + * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> > > + * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
> > > + *			page from the LRU and reclaim all pages within a
> > > + *			naturally aligned range
> > 
> > I find those names terribly undescriptive.  It also strikes me as an
> > odd set of flags.  Can't this be represented with less?
> > 
> > 	LUMPY_MODE_ENABLED
> > 	LUMPY_MODE_SYNC
> > 
> > or, after the rename,
> > 
> > 	RECLAIM_MODE_HIGHER	= 1
> > 	RECLAIM_MODE_SYNC	= 2
> > 	RECLAIM_MODE_LUMPY	= 4
> > 
> 
> My problem with that is you have to infer what the behaviour is from what the
> flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC
> implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC.

Sounds like a boolean value to me.  And it shows: you never actually
check for RECLAIM_MODE_ASYNC in the code, you just always set it to
the opposite of RECLAIM_MODE_SYNC - the flag which is actually read.

> It also looks very odd when trying to distinguish between order-0
> standard reclaim, lumpy reclaim and reclaim/compaction.

That is true, because this is still an actual tristate.  It's probably
better to defer until lumpy reclaim is gone and there is only one flag
for higher-order reclaim left.

> > > +typedef unsigned __bitwise__ lumpy_mode;
> > 
> > lumpy_mode_t / reclaim_mode_t?
> > 
> 
> It can't hurt!

Thanks :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim
  2010-12-01 10:56     ` Mel Gorman
@ 2010-12-01 11:32       ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-01 11:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 10:56:49AM +0000, Mel Gorman wrote:
> On Wed, Dec 01, 2010 at 11:27:45AM +0100, Johannes Weiner wrote:
> > On Mon, Nov 22, 2010 at 03:43:51PM +0000, Mel Gorman wrote:
> > > Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores
> > > the age of the pages it reclaims. This can incur significant stalls and
> > > potentially increase the number of major faults.
> > > 
> > > Compaction has reached the point where it is considered reasonably stable
> > > (meaning it has passed a lot of testing) and is a potential candidate for
> > > displacing lumpy reclaim. This patch introduces an alternative to lumpy
> > > reclaim when compaction is available called reclaim/compaction. The basic
> > > operation is very simple - instead of selecting a contiguous range of pages
> > > to reclaim, a number of order-0 pages are reclaimed and then compaction is
> > > done later by either kswapd (compact_zone_order()) or direct compaction
> > > (__alloc_pages_direct_compact()).
> > > 
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > 
> > > @@ -286,18 +290,20 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> > >  	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> > >  
> > >  	/*
> > > -	 * Some reclaim have alredy been failed. No worth to try synchronous
> > > -	 * lumpy reclaim.
> > > +	 * Initially assume we are entering either lumpy reclaim or
> > > +	 * reclaim/compaction.Depending on the order, we will either set the
> > > +	 * sync mode or just reclaim order-0 pages later.
> > >  	 */
> > > -	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> > > -		return;
> > > +	if (COMPACTION_BUILD)
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > > +	else
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> > 
> > Isn't this a regression for !COMPACTION_BUILD in that earlier kernels
> > would not do sync lumpy reclaim when somebody disabled it during the
> > async run?
> > 
> 
> You'll need to clarify your question I'm afraid. In 2.6.36 for example,
> if lumpy reclaim gets disabled then sync reclaim does not happen at all.
> This was due to large stalls being observed when copying large amounts
> of data to slow storage such as a USB external drive.

Sorry for the noise, I just verified that it really was dead code.  We
have

	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc))
		set_lumpy_reclaim_mode(priority, sc, true);	/* sync */

but because the branch is never taken if lumpy is disabled, the
conditional in set_lumpy_reclaim_mode() is dead.
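
For reference, a condensed sketch of why that branch cannot be taken once
lumpy reclaim has been disabled, based on the hunks quoted earlier in this
thread (simplified, not the verbatim source):

	static inline bool should_reclaim_stall(unsigned long nr_taken,
						unsigned long nr_reclaimed,
						int priority,
						struct scan_control *sc)
	{
		/* ... earlier early-return checks elided ... */

		/* Only stall on lumpy reclaim */
		if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
			return false;	/* set whenever lumpy is disabled */

		/* ... remaining stall heuristics ... */
		return true;
	}

disable_lumpy_reclaim_mode() sets LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC, so once
lumpy reclaim is disabled the caller never requests the sync pass and the sync
conditional inside set_lumpy_reclaim_mode() is unreachable.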

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-12-01 11:21       ` Johannes Weiner
@ 2010-12-01 11:56         ` Mel Gorman
  2010-12-02 11:04           ` Johannes Weiner
  0 siblings, 1 reply; 22+ messages in thread
From: Mel Gorman @ 2010-12-01 11:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 12:21:16PM +0100, Johannes Weiner wrote:
> On Wed, Dec 01, 2010 at 10:50:29AM +0000, Mel Gorman wrote:
> > On Wed, Dec 01, 2010 at 11:27:32AM +0100, Johannes Weiner wrote:
> > > On Mon, Nov 22, 2010 at 03:43:50PM +0000, Mel Gorman wrote:
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -51,11 +51,20 @@
> > > >  #define CREATE_TRACE_POINTS
> > > >  #include <trace/events/vmscan.h>
> > > >  
> > > > -enum lumpy_mode {
> > > > -	LUMPY_MODE_NONE,
> > > > -	LUMPY_MODE_ASYNC,
> > > > -	LUMPY_MODE_SYNC,
> > > > -};
> > > > +/*
> > > > + * lumpy_mode determines how the inactive list is shrunk
> > > > + * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
> > > > + * LUMPY_MODE_ASYNC:  Do not block
> > > > + * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> > > > + * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
> > > > + *			page from the LRU and reclaim all pages within a
> > > > + *			naturally aligned range
> > > 
> > > I find those names terribly undescriptive.  It also strikes me as an
> > > odd set of flags.  Can't this be represented with less?
> > > 
> > > 	LUMPY_MODE_ENABLED
> > > 	LUMPY_MODE_SYNC
> > > 
> > > or, after the rename,
> > > 
> > > 	RECLAIM_MODE_HIGHER	= 1
> > > 	RECLAIM_MODE_SYNC	= 2
> > > 	RECLAIM_MODE_LUMPY	= 4
> > > 
> > 
> > My problem with that is you have to infer what the behaviour is from what the
> > flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC
> > implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC.
> 
> Sounds like a boolean value to me.  And it shows: you never actually
> check for RECLAIM_MODE_ASYNC in the code, you just always set it to
> the opposite of RECLAIM_MODE_SYNC - the flag which is actually read.
> 

If you insist, the ASYNC flag can be dropped. I found it easier to flag
what behaviour was expected than to infer it. In retrospect, I should have
passed the flag into set_reclaim_mode() instead of a boolean, and it
would have been obvious from the caller site as well.
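
A minimal sketch of that alternative, based on the set_reclaim_mode() shown
earlier in this thread (illustrative only, not a posted patch; reclaim_mode_t
is the type from the follow-up fix further down the thread):

	static void set_reclaim_mode(int priority, struct scan_control *sc,
				     reclaim_mode_t syncmode)
	{
		/* assume lumpy reclaim or reclaim/compaction, as before */
		if (COMPACTION_BUILD)
			sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
		else
			sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;

		if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
			sc->reclaim_mode |= syncmode;
		else if (sc->order && priority < DEF_PRIORITY - 2)
			sc->reclaim_mode |= syncmode;
		else
			sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
	}

The caller then names the behaviour at the call site, e.g.
set_reclaim_mode(priority, sc, RECLAIM_MODE_SYNC) rather than passing a bare
true.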

> > It also looks very odd when trying to distinguish between order-0
> > standard reclaim, lumpy reclaim and reclaim/compaction.
> 
> That is true, because this is still an actual tristate.  It's probably
> better to defer until lumpy reclaim is gone and there is only one flag
> for higher-order reclaim left.
> 

Sure.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask
  2010-12-01 11:56         ` Mel Gorman
@ 2010-12-02 11:04           ` Johannes Weiner
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2010-12-02 11:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, linux-kernel

On Wed, Dec 01, 2010 at 11:56:33AM +0000, Mel Gorman wrote:
> On Wed, Dec 01, 2010 at 12:21:16PM +0100, Johannes Weiner wrote:
> > On Wed, Dec 01, 2010 at 10:50:29AM +0000, Mel Gorman wrote:
> > > On Wed, Dec 01, 2010 at 11:27:32AM +0100, Johannes Weiner wrote:
> > > > On Mon, Nov 22, 2010 at 03:43:50PM +0000, Mel Gorman wrote:
> > > > > + * lumpy_mode determines how the inactive list is shrunk
> > > > > + * LUMPY_MODE_SINGLE: Reclaim only order-0 pages
> > > > > + * LUMPY_MODE_ASYNC:  Do not block
> > > > > + * LUMPY_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> > > > > + * LUMPY_MODE_CONTIGRECLAIM: For high-order allocations, take a reference
> > > > > + *			page from the LRU and reclaim all pages within a
> > > > > + *			naturally aligned range
> > > > 
> > > > I find those names terribly undescriptive.  It also strikes me as an
> > > > odd set of flags.  Can't this be represented with less?
> > > > 
> > > > 	LUMPY_MODE_ENABLED
> > > > 	LUMPY_MODE_SYNC
> > > > 
> > > > or, after the rename,
> > > > 
> > > > 	RECLAIM_MODE_HIGHER	= 1
> > > > 	RECLAIM_MODE_SYNC	= 2
> > > > 	RECLAIM_MODE_LUMPY	= 4
> > > 
> > > My problem with that is you have to infer what the behaviour is from what the
> > > flags "are not" as opposed to what they are. For example, !LUMPY_MODE_SYNC
> > > implies LUMPY_MODE_ASYNC instead of specifying LUMPY_MODE_ASYNC.
> > 
> > Sounds like a boolean value to me.  And it shows: you never actually
> > check for RECLAIM_MODE_ASYNC in the code, you just always set it to
> > the opposite of RECLAIM_MODE_SYNC - the flag which is actually read.
> 
> If you insist, the ASYNC flag can be dropped. I found it easier to flag
> what behaviour was expected than infer it.

It seems to be a matter of taste and nobody else seems to care, so I
am not insisting.  Let's just keep it as it is.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: vmscan: Rename lumpy_mode to reclaim_mode fix
  2010-12-01 10:27   ` Johannes Weiner
  2010-12-01 10:50     ` Mel Gorman
@ 2010-12-02 12:03     ` Mel Gorman
  1 sibling, 0 replies; 22+ messages in thread
From: Mel Gorman @ 2010-12-02 12:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Johannes Weiner, KOSAKI Motohiro, Rik van Riel,
	linux-mm, linux-kernel

As suggested by Johannes, rename the reclaim_mode type to reclaim_mode_t. This
is a fix to the mmotm patch
broken-out/mm-vmscan-rename-lumpy_mode-to-reclaim_mode.patch.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/vmscan.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42a4859..a9390fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,12 +63,12 @@
  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
-typedef unsigned __bitwise__ reclaim_mode;
-#define RECLAIM_MODE_SINGLE		((__force reclaim_mode)0x01u)
-#define RECLAIM_MODE_ASYNC		((__force reclaim_mode)0x02u)
-#define RECLAIM_MODE_SYNC		((__force reclaim_mode)0x04u)
-#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode)0x08u)
-#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode)0x10u)
+typedef unsigned __bitwise__ reclaim_mode_t;
+#define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
+#define RECLAIM_MODE_ASYNC		((__force reclaim_mode_t)0x02u)
+#define RECLAIM_MODE_SYNC		((__force reclaim_mode_t)0x04u)
+#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode_t)0x08u)
+#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -101,7 +101,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	reclaim_mode reclaim_mode;
+	reclaim_mode_t reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -287,7 +287,7 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	reclaim_mode syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
+	reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
 
 	/*
 	 * Initially assume we are entering either lumpy reclaim or

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread

Thread overview: 22+ messages
2010-11-22 15:43 [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Mel Gorman
2010-11-22 15:43 ` [PATCH 1/7] mm: compaction: Add trace events for memory compaction activity Mel Gorman
2010-11-22 15:43 ` [PATCH 2/7] mm: vmscan: Convert lumpy_mode into a bitmask Mel Gorman
2010-12-01 10:27   ` Johannes Weiner
2010-12-01 10:50     ` Mel Gorman
2010-12-01 11:21       ` Johannes Weiner
2010-12-01 11:56         ` Mel Gorman
2010-12-02 11:04           ` Johannes Weiner
2010-12-02 12:03     ` [PATCH] mm: vmscan: Rename lumpy_mode to reclaim_mode fix Mel Gorman
2010-11-22 15:43 ` [PATCH 3/7] mm: vmscan: Reclaim order-0 and use compaction instead of lumpy reclaim Mel Gorman
2010-12-01 10:27   ` Johannes Weiner
2010-12-01 10:56     ` Mel Gorman
2010-12-01 11:32       ` Johannes Weiner
2010-11-22 15:43 ` [PATCH 4/7] mm: migration: Allow migration to operate asynchronously and avoid synchronous compaction in the faster path Mel Gorman
2010-12-01 10:28   ` Johannes Weiner
2010-11-22 15:43 ` [PATCH 5/7] mm: migration: Cleanup migrate_pages API by matching types for offlining and sync Mel Gorman
2010-12-01 10:28   ` Johannes Weiner
2010-11-22 15:43 ` [PATCH 6/7] mm: compaction: Perform a faster migration scan when migrating asynchronously Mel Gorman
2010-12-01 10:31   ` Johannes Weiner
2010-11-22 15:43 ` [PATCH 7/7] mm: vmscan: Rename lumpy_mode to reclaim_mode Mel Gorman
2010-12-01 10:34   ` Johannes Weiner
2010-11-22 16:01 ` [PATCH 0/7] Use memory compaction instead of lumpy reclaim during high-order allocations V2 Andrea Arcangeli
