linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim
@ 2010-11-11 19:07 Mel Gorman
  2010-11-11 19:07 ` [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-11 19:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

(cc'ing people currently looking at transparent hugepages as this series
is aimed at avoiding lumpy reclaim being deleted)

Huge page allocations are not expected to be cheap but lumpy reclaim is still
very disruptive. While it is far better than reclaiming random order-0 pages
and hoping for the best, it still ignores the referenced bit of pages near the
reference page selected from the LRU. Memory compaction was merged in 2.6.35
to reduce the need for lumpy reclaim by moving pages around instead of
reclaiming them when enough pages were already free. It has been tested fairly
heavily at this point.
This is a prototype series to use compaction more aggressively.

When CONFIG_COMPACTION is set, lumpy reclaim is avoided where possible. What
it does instead is reclaim a number of order-0 pages and then compact the
zone to try and satisfy the allocation. This keeps a larger number of active
pages in memory at the cost of increased use of migration and compaction
scanning. As this is a prototype, it's also very clumsy. For example,
set_lumpy_reclaim_mode() still allows lumpy reclaim to be used and the
decision on when to use it is primitive. Lumpy reclaim can be avoided
entirely of course but the tests were a bit inconclusive - allocation
latency was lower if lumpy reclaim was never used but the test completion
times and reclaim statistics looked worse so I need to reconsider both the
analysis and the implementation. It's also about as subtle as a brick in that
it does a blind compaction of the zone after reclaiming, which is almost
certainly more frequent than it needs to be, but I'm leaving optimisation
considerations aside for the moment.
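
The core of the cycle, as implemented by patch 3 of this series, boils down
to roughly the following in shrink_inactive_list() (simplified, with the
non-compaction cases and error paths omitted):

	/* when lumpy compacting, scan at least 2^order base pages */
	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
		nr_to_scan = max(nr_to_scan, (1UL << sc->order));

	/* ... isolate and reclaim nr_to_scan order-0 pages as normal ... */

	/* then blindly compact the zone in the hope the high-order
	 * watermark is now met; if not, the caller loops and we try again */
	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
		compact_zone_order(zone, sc->order, sc->gfp_mask);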

Ultimately, what I'd like to do is implement "lumpy compaction", where a
number of order-0 pages are reclaimed and the pages that would have been
lumpy reclaimed are instead migrated, but that is longer term and involves a
tight integration of compaction and reclaim which maybe we'd like to avoid in
the first pass. This series was to establish whether order-0 reclaim plus
compaction is potentially workable, and the test results are reasonably
promising. kernbench and sysbench were run as sniff tests even though they do
not exercise reclaim and, as expected, performance was not affected. The
target test was a high-order allocation stress test. Testing was based on kernel
2.6.37-rc1 with commit d88c0922 applied which fixes an important bug related
to page reference counting. The test machine was x86-64 with 3G of RAM.

STRESS-HIGHALLOC
                  fix-d88c0922 lumpycompact-v1r2
Pass 1          90.00 ( 0.00%)    89.00 (-1.00%)
Pass 2          91.00 ( 0.00%)    91.00 ( 0.00%)
At Rest         94.00 ( 0.00%)    94.00 ( 0.00%)

MMTests Statistics: duration
User/Sys Time Running Test (seconds)       3356.15   3336.46
Total Elapsed Time (seconds)               2052.07   1853.79

The success rates are the same, so it is functionally similar, and it
completed a bit faster.

FTrace Reclaim Statistics: vmscan
                                      fix-d88c0922 lumpycompact-v1r2
Direct reclaims                                673        468 
Direct reclaim pages scanned                 60521     108221 
Direct reclaim pages reclaimed               37300      67114 
Direct reclaim write file async I/O           1459       3825 
Direct reclaim write anon async I/O           7989      10694 
Direct reclaim write file sync I/O               0          0 
Direct reclaim write anon sync I/O              92         53 
Wake kswapd requests                           823      11681 
Kswapd wakeups                                 608        558 
Kswapd pages scanned                       4509407    3682736 
Kswapd pages reclaimed                     2278056    2176076 
Kswapd reclaim write file async I/O          58446      46853 
Kswapd reclaim write anon async I/O         696616     410210 
Kswapd reclaim write file sync I/O               0          0 
Kswapd reclaim write anon sync I/O               0          0 
Time stalled direct reclaim (seconds)       139.75     128.09 
Time kswapd awake (seconds)                 833.03     669.29 

Total pages scanned                        4569928   3790957
Total pages reclaimed                      2315356   2243190
%age total pages scanned/reclaimed          50.67%    59.17%
%age total pages scanned/written            16.73%    12.44%
%age  file pages scanned/written             1.31%     1.34%
Percentage Time Spent Direct Reclaim         4.00%     3.70%
Percentage Time kswapd Awake                40.59%    36.10%

The time spent stalled and the time kswapd was awake are both reduced, as
are the total number of pages scanned and reclaimed. Some of the ratios look
nicer but it's not obviously better except for the average latencies, which
I have posted at
http://www.csn.ul.ie/~mel/postings/lumpycompact-20101111/highalloc-interlatency-hydra-mean.ps
. Similarly, the stddev graph in the same directory shows that allocation
times are more predictable.

The tarball I used for testing is available at
http://www.csn.ul.ie/~mel/mmtests-0.01-lumpycompact-0.01.tar.gz . The suite
assumes that the kernel source being tested was built and deployed on the
test machine. Otherwise, it should be a case of 

1. build + deploy kernel with d88c0922 applied
2. ./run-mmtests.sh --run-monitor vanilla
3. build + deploy kernel with this series applied
4. ./run-mmtests.sh --run-monitor lumpycompact-v1r3

Results for comparison are in work/log . There is a rudimentary reporting
script called compare-kernel.sh which should be run with a CWD of work/log.

Comments?

 include/linux/compaction.h    |    9 +++++-
 include/linux/kernel.h        |    7 +++++
 include/trace/events/vmscan.h |    6 ++--
 mm/compaction.c               |    2 +-
 mm/vmscan.c                   |   61 +++++++++++++++++++++++++---------------
 5 files changed, 57 insertions(+), 28 deletions(-)



* [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask
  2010-11-11 19:07 [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim Mel Gorman
@ 2010-11-11 19:07 ` Mel Gorman
  2010-11-14  5:40   ` KOSAKI Motohiro
  2010-11-11 19:07 ` [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2010-11-11 19:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

Currently, lumpy_mode is an enum that determines whether lumpy reclaim is
off, synchronous or asynchronous. In preparation for using compaction instead
of lumpy reclaim, this patch converts the mode into a bitmask.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/trace/events/vmscan.h |    6 +++---
 mm/vmscan.c                   |   37 +++++++++++++++++++------------------
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index c255fcc..be76429 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,13 +25,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
 			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b8a6fdc..ffa438e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,11 +51,11 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-enum lumpy_mode {
-	LUMPY_MODE_NONE,
-	LUMPY_MODE_ASYNC,
-	LUMPY_MODE_SYNC,
-};
+typedef unsigned __bitwise__ lumpy_mode;
+#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
+#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
+#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
+#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -88,7 +88,7 @@ struct scan_control {
 	 * Intend to reclaim enough continuous memory rather than reclaim
 	 * enough amount of memory. i.e, mode for high order allocation.
 	 */
-	enum lumpy_mode lumpy_reclaim_mode;
+	lumpy_mode lumpy_reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -274,13 +274,13 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
 	 * Some reclaim have alredy been failed. No worth to try synchronous
 	 * lumpy reclaim.
 	 */
-	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return;
 
 	/*
@@ -288,17 +288,18 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 	 * trouble getting a small set of contiguous pages, we
 	 * will reclaim both active and inactive pages.
 	 */
+	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= mode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode = mode;
+		sc->lumpy_reclaim_mode |= mode;
 	else
-		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static void disable_lumpy_reclaim_mode(struct scan_control *sc)
 {
-	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -429,7 +430,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * first attempt to free a range of pages fails.
 		 */
 		if (PageWriteback(page) &&
-		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
+		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -615,7 +616,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -732,7 +733,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
+			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
 			    may_enter_fs)
 				wait_on_page_writeback(page);
 			else {
@@ -1317,7 +1318,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1368,7 +1369,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
@@ -1381,7 +1382,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
 					ISOLATE_INACTIVE : ISOLATE_BOTH,
 			zone, sc->mem_cgroup,
 			0, file);
-- 
1.7.1



* [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD
  2010-11-11 19:07 [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim Mel Gorman
  2010-11-11 19:07 ` [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask Mel Gorman
@ 2010-11-11 19:07 ` Mel Gorman
  2010-11-14  5:45   ` KOSAKI Motohiro
  2010-11-11 19:07 ` [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure Mel Gorman
  2010-11-14  5:31 ` [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim KOSAKI Motohiro
  3 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2010-11-11 19:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

To avoid #ifdef CONFIG_COMPACTION in a following patch, this patch adds
COMPACTION_BUILD, which is similar to NUMA_BUILD in operation.
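
As with NUMA_BUILD, the benefit is that code using it stays visible to the
compiler (so it is syntax- and type-checked for all configs) while the dead
branch is optimised away when CONFIG_COMPACTION is not set. The intended use
in the next patch is along these lines:

	if (COMPACTION_BUILD)
		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
	else
		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;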

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/kernel.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 450092c..c00c5d1 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -826,6 +826,13 @@ struct sysinfo {
 #define NUMA_BUILD 0
 #endif
 
+/* This helps us avoid #ifdef CONFIG_COMPACTION */
+#ifdef CONFIG_COMPACTION
+#define COMPACTION_BUILD 1
+#else
+#define COMPACTION_BUILD 0
+#endif
+
 /* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
 # define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
-- 
1.7.1



* [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-11 19:07 [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim Mel Gorman
  2010-11-11 19:07 ` [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask Mel Gorman
  2010-11-11 19:07 ` [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD Mel Gorman
@ 2010-11-11 19:07 ` Mel Gorman
  2010-11-12  9:37   ` Mel Gorman
  2010-11-14  5:59   ` KOSAKI Motohiro
  2010-11-14  5:31 ` [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim KOSAKI Motohiro
  3 siblings, 2 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-11 19:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

Lumpy reclaim is disruptive. It reclaims both a large number of pages
and ignores the age of the majority of pages it reclaims. This can incur
significant stalls and potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch reduces the use of lumpy reclaim when
the priority is high enough to indicate low pressure. The basic operation
is very simple. Instead of selecting a contiguous range of pages to reclaim,
lumpy compaction reclaims a number of order-0 pages and then calls compaction
for the zone. If the watermarks are not met, another reclaim+compaction
cycle occurs.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/compaction.h |    9 ++++++++-
 mm/compaction.c            |    2 +-
 mm/vmscan.c                |   38 ++++++++++++++++++++++++++------------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 5ac5155..2ae6613 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask);
-
+extern unsigned long compact_zone_order(struct zone *zone,
+			int order, gfp_t gfp_mask);
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 
@@ -59,6 +60,12 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	return COMPACT_CONTINUE;
 }
 
+static inline unsigned long compact_zone_order(struct zone *zone,
+			int order, gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline void defer_compaction(struct zone *zone)
 {
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 4d709ee..f987f47 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -418,7 +418,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	return ret;
 }
 
-static unsigned long compact_zone_order(struct zone *zone,
+unsigned long compact_zone_order(struct zone *zone,
 						int order, gfp_t gfp_mask)
 {
 	struct compact_control cc = {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ffa438e..da35cdb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -32,6 +32,7 @@
 #include <linux/topology.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/compaction.h>
 #include <linux/notifier.h>
 #include <linux/rwsem.h>
 #include <linux/delay.h>
@@ -56,6 +57,7 @@ typedef unsigned __bitwise__ lumpy_mode;
 #define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
 #define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
 #define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
+#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
 
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
@@ -274,25 +276,27 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
 				   bool sync)
 {
-	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
 
 	/*
-	 * Some reclaim have alredy been failed. No worth to try synchronous
-	 * lumpy reclaim.
+	 * Initially assume we are entering either lumpy reclaim or lumpy
+	 * compaction. Depending on the order, we will either set the sync
+	 * mode or just reclaim order-0 pages later.
 	 */
-	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
-		return;
+	if (COMPACTION_BUILD)
+		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
+	else
+		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
 	 * will reclaim both active and inactive pages.
 	 */
-	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
 	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode |= mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode |= mode;
+		sc->lumpy_reclaim_mode |= syncmode;
 	else
 		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
 }
@@ -1366,11 +1370,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 
+	/*
+	 * If we are lumpy compacting, we bump nr_to_scan to at least
+	 * the size of the page we are trying to allocate
+	 */
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
+
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1382,8 +1393,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
-					ISOLATE_INACTIVE : ISOLATE_BOTH,
+			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
+					ISOLATE_BOTH : ISOLATE_INACTIVE,
 			zone, sc->mem_cgroup,
 			0, file);
 		/*
@@ -1416,6 +1427,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
 
+	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
+		compact_zone_order(zone, sc->order, sc->gfp_mask);
+
 	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
-- 
1.7.1



* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-11 19:07 ` [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure Mel Gorman
@ 2010-11-12  9:37   ` Mel Gorman
  2010-11-14  5:43     ` KOSAKI Motohiro
  2010-11-14  6:02     ` KOSAKI Motohiro
  2010-11-14  5:59   ` KOSAKI Motohiro
  1 sibling, 2 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-12  9:37 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Thu, Nov 11, 2010 at 07:07:04PM +0000, Mel Gorman wrote:
> +	if (COMPACTION_BUILD)
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> +	else
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
>  

Gack, I posted the slightly wrong version. This version prevents lumpy
reclaim from ever being used. The figures I posted were for a patch where
this condition looked like:

        if (COMPACTION_BUILD && priority > DEF_PRIORITY - 2)
                sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
        else
                sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim
  2010-11-11 19:07 [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim Mel Gorman
                   ` (2 preceding siblings ...)
  2010-11-11 19:07 ` [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure Mel Gorman
@ 2010-11-14  5:31 ` KOSAKI Motohiro
  3 siblings, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> (cc'ing people currently looking at transparent hugepages as this series
> is aimed at avoiding lumpy reclaim being deleted)
> 
> Huge page allocations are not expected to be cheap but lumpy reclaim is still
> very disruptive. While it is far better than reclaiming random order-0 pages
> and hoping for the best, it still ignore the reference bit of pages near the
> reference page selected from the LRU. Memory compaction was merged in 2.6.35
> to use less lumpy reclaim by moving pages around instead of reclaiming when
> there were enough pages free. It has been tested fairly heavily at this point.
> This is a prototype series to use compaction more aggressively.
> 
> When CONFIG_COMPACTION is set, lumpy reclaim is avoided where possible. What
> it does instead is reclaim a number of order-0 pages and then compact the
> zone to try and satisfy the allocation. This keeps a larger number of active
> pages in memory at the cost of increased use of migration and compaction
> scanning. As this is a prototype, it's also very clumsy. For example,
> set_lumpy_reclaim_mode() still allows lumpy reclaim to be used and the
> decision on when to use it is primitive. Lumpy reclaim can be avoided
> entirely of course but the tests were a bit inconclusive - allocation
> latency was lower if lumpy reclaim was never used but the test completion
> times and reclaim statistics looked worse so I need to reconsider both the
> analysis and the implementation. It's also about as subtle as a brick when
> it comes to compaction doing a blind compaction of the zone after reclaiming
> which is almost certainly more frequent than it needs to be but I'm leaving
> optimisation considerations for the moment.
> 
> Ultimately, what I'd like to do is implement "lumpy compaction" where a
> number of order-0 pages are reclaimed and then the pages that would be lumpy
> reclaimed are instead migrated but it would be longer term and involve a
> tight integration of compaction and reclaim which maybe we'd like to avoid
> in the first pass. This series was to establish if just order-0 reclaims
> and compaction is potentially workable and the test results are reasonably
> promising. kernbench and sysbench were run as sniff tests even though they do
> not exercise reclaim and performance was not affected as expected. The target
> test was a high-order allocation stress test. Testing was based on kernel
> 2.6.37-rc1 with commit d88c0922 applied which fixes an important bug related
> to page reference counting. The test machine was x86-64 with 3G of RAM.

Brilliant! This is just what I have wanted for a long time.





* Re: [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask
  2010-11-11 19:07 ` [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask Mel Gorman
@ 2010-11-14  5:40   ` KOSAKI Motohiro
  2010-11-15  9:16     ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
> syncronous or asyncronous. In preparation for using compaction instead of
> lumpy reclaim, this patch converts the flags into a bitmap.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/trace/events/vmscan.h |    6 +++---
>  mm/vmscan.c                   |   37 +++++++++++++++++++------------------
>  2 files changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index c255fcc..be76429 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -25,13 +25,13 @@
>  
>  #define trace_reclaim_flags(page, sync) ( \
>  	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
> -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
> +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
>  	)
>  
>  #define trace_shrink_flags(file, sync) ( \
> -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
> +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
>  			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
> -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
> +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
>  	)
>  
>  TRACE_EVENT(mm_vmscan_kswapd_sleep,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b8a6fdc..ffa438e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -51,11 +51,11 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vmscan.h>
>  
> -enum lumpy_mode {
> -	LUMPY_MODE_NONE,
> -	LUMPY_MODE_ASYNC,
> -	LUMPY_MODE_SYNC,
> -};
> +typedef unsigned __bitwise__ lumpy_mode;
> +#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
> +#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
> +#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
> +#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)

Please write a comment describing the meaning of each bit.


>  
>  struct scan_control {
>  	/* Incremented by the number of inactive pages that were scanned */
> @@ -88,7 +88,7 @@ struct scan_control {
>  	 * Intend to reclaim enough continuous memory rather than reclaim
>  	 * enough amount of memory. i.e, mode for high order allocation.
>  	 */
> -	enum lumpy_mode lumpy_reclaim_mode;
> +	lumpy_mode lumpy_reclaim_mode;
>  
>  	/* Which cgroup do we reclaim from */
>  	struct mem_cgroup *mem_cgroup;
> @@ -274,13 +274,13 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>  static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
>  				   bool sync)
>  {
> -	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
>  
>  	/*
>  	 * Some reclaim have alredy been failed. No worth to try synchronous
>  	 * lumpy reclaim.
>  	 */
> -	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
>  		return;

Probably, we can remove LUMPY_MODE_SINGLE entirely, and this line can be
changed to

	if (sync && !(sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM))


btw, LUMPY_MODE_ASYNC can be removed too.
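
Something like this, perhaps (just a sketch, untested):

	/* CONTIGRECLAIM set: take a contiguous range of pages around the
	 * target; clear: plain order-0 reclaim. SYNC is the only other bit. */
	#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x01u)
	#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x02u)

with async simply being the absence of LUMPY_MODE_SYNC, and single/order-0
reclaim the absence of LUMPY_MODE_CONTIGRECLAIM.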


>  
>  	/*
> @@ -288,17 +288,18 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
>  	 * trouble getting a small set of contiguous pages, we
>  	 * will reclaim both active and inactive pages.
>  	 */
> +	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
>  	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -		sc->lumpy_reclaim_mode = mode;
> +		sc->lumpy_reclaim_mode |= mode;
>  	else if (sc->order && priority < DEF_PRIORITY - 2)
> -		sc->lumpy_reclaim_mode = mode;
> +		sc->lumpy_reclaim_mode |= mode;
>  	else
> -		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
>  }
>  
>  static void disable_lumpy_reclaim_mode(struct scan_control *sc)
>  {
> -	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
>  }
>  
>  static inline int is_page_cache_freeable(struct page *page)
> @@ -429,7 +430,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  		 * first attempt to free a range of pages fails.
>  		 */
>  		if (PageWriteback(page) &&
> -		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
> +		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
>  			wait_on_page_writeback(page);
>  
>  		if (!PageWriteback(page)) {
> @@ -615,7 +616,7 @@ static enum page_references page_check_references(struct page *page,
>  	referenced_page = TestClearPageReferenced(page);
>  
>  	/* Lumpy reclaim - ignore references */
> -	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
>  		return PAGEREF_RECLAIM;
>  
>  	/*
> @@ -732,7 +733,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			 * for any page for which writeback has already
>  			 * started.
>  			 */
> -			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
> +			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
>  			    may_enter_fs)
>  				wait_on_page_writeback(page);
>  			else {
> @@ -1317,7 +1318,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>  		return false;
>  
>  	/* Only stall on lumpy reclaim */
> -	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
>  		return false;
>  
>  	/* If we have relaimed everything on the isolated list, no stall */
> @@ -1368,7 +1369,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  	if (scanning_global_lru(sc)) {
>  		nr_taken = isolate_pages_global(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> +			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
>  					ISOLATE_INACTIVE : ISOLATE_BOTH,
>  			zone, 0, file);
>  		zone->pages_scanned += nr_scanned;
> @@ -1381,7 +1382,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  	} else {
>  		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> +			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
>  					ISOLATE_INACTIVE : ISOLATE_BOTH,
>  			zone, sc->mem_cgroup,
>  			0, file);
> -- 
> 1.7.1
> 





* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-12  9:37   ` Mel Gorman
@ 2010-11-14  5:43     ` KOSAKI Motohiro
  2010-11-15  9:17       ` Mel Gorman
  2010-11-14  6:02     ` KOSAKI Motohiro
  1 sibling, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> On Thu, Nov 11, 2010 at 07:07:04PM +0000, Mel Gorman wrote:
> > +	if (COMPACTION_BUILD)
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > +	else
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> >  
> 
> Gack, I posted the slightly wrong version. This version prevents lumpy
> reclaim ever being used. The figures I posted were for a patch where
> this condition looked like
> 
>         if (COMPACTION_BUILD && priority > DEF_PRIORITY - 2)
>                 sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
>         else
>                 sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;

In all other places, heavy reclaim is detected with the following:

	if (priority < DEF_PRIORITY - 2)


So I think "priority >= DEF_PRIORITY - 2" is more symmetric, but if you have
a strong reason, I don't oppose it.





* Re: [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD
  2010-11-11 19:07 ` [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD Mel Gorman
@ 2010-11-14  5:45   ` KOSAKI Motohiro
  2010-11-15  9:26     ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> To avoid #ifdef COMPACTION in a following patch, this patch adds
> COMPACTION_BUILD that is similar to NUMA_BUILD in operation.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/kernel.h |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 450092c..c00c5d1 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -826,6 +826,13 @@ struct sysinfo {
>  #define NUMA_BUILD 0
>  #endif
>  
> +/* This helps us avoid #ifdef CONFIG_COMPACTION */
> +#ifdef CONFIG_COMPACTION
> +#define COMPACTION_BUILD 1
> +#else
> +#define COMPACTION_BUILD 0
> +#endif
> +

Looks good, of course. But I think this patch can be folded into [3/3]
because on its own it doesn't change anything.





* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-11 19:07 ` [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure Mel Gorman
  2010-11-12  9:37   ` Mel Gorman
@ 2010-11-14  5:59   ` KOSAKI Motohiro
  2010-11-15  9:25     ` Mel Gorman
  1 sibling, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  5:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> Lumpy reclaim is disruptive. It reclaims both a large number of pages
> and ignores the age of the majority of pages it reclaims. This can incur
> significant stalls and potentially increase the number of major faults.
> 
> Compaction has reached the point where it is considered reasonably stable
> (meaning it has passed a lot of testing) and is a potential candidate for
> displacing lumpy reclaim. This patch reduces the use of lumpy reclaim when
> the priority is high enough to indicate low pressure. The basic operation
> is very simple. Instead of selecting a contiguous range of pages to reclaim,
> lumpy compaction reclaims a number of order-0 pages and then calls compaction
> for the zone. If the watermarks are not met, another reclaim+compaction
> cycle occurs.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/compaction.h |    9 ++++++++-
>  mm/compaction.c            |    2 +-
>  mm/vmscan.c                |   38 ++++++++++++++++++++++++++------------
>  3 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 5ac5155..2ae6613 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -22,7 +22,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
>  extern int fragmentation_index(struct zone *zone, unsigned int order);
>  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *mask);
> -
> +extern unsigned long compact_zone_order(struct zone *zone,
> +			int order, gfp_t gfp_mask);
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_DEFER_SHIFT 6
>  
> @@ -59,6 +60,12 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  	return COMPACT_CONTINUE;
>  }
>  
> +static inline unsigned long compact_zone_order(struct zone *zone,
> +			int order, gfp_t gfp_mask)
> +{
> +	return 0;
> +}
> +
>  static inline void defer_compaction(struct zone *zone)
>  {
>  }
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4d709ee..f987f47 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -418,7 +418,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  	return ret;
>  }
>  
> -static unsigned long compact_zone_order(struct zone *zone,
> +unsigned long compact_zone_order(struct zone *zone,
>  						int order, gfp_t gfp_mask)
>  {
>  	struct compact_control cc = {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ffa438e..da35cdb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -32,6 +32,7 @@
>  #include <linux/topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpuset.h>
> +#include <linux/compaction.h>
>  #include <linux/notifier.h>
>  #include <linux/rwsem.h>
>  #include <linux/delay.h>
> @@ -56,6 +57,7 @@ typedef unsigned __bitwise__ lumpy_mode;
>  #define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
>  #define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
>  #define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
> +#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
>  
>  struct scan_control {
>  	/* Incremented by the number of inactive pages that were scanned */
> @@ -274,25 +276,27 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>  static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
>  				   bool sync)
>  {
> -	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
>  
>  	/*
> -	 * Some reclaim have alredy been failed. No worth to try synchronous
> -	 * lumpy reclaim.
> +	 * Initially assume we are entering either lumpy reclaim or lumpy
> +	 * compaction. Depending on the order, we will either set the sync
> +	 * mode or just reclaim order-0 pages later.
>  	 */
> -	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> -		return;
> +	if (COMPACTION_BUILD)
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> +	else
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
>  
>  	/*
>  	 * If we need a large contiguous chunk of memory, or have
>  	 * trouble getting a small set of contiguous pages, we
>  	 * will reclaim both active and inactive pages.
>  	 */
> -	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
>  	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -		sc->lumpy_reclaim_mode |= mode;
> +		sc->lumpy_reclaim_mode |= syncmode;
>  	else if (sc->order && priority < DEF_PRIORITY - 2)
> -		sc->lumpy_reclaim_mode |= mode;
> +		sc->lumpy_reclaim_mode |= syncmode;

Does "LUMPY_MODE_COMPACTION | LUMPY_MODE_SYNC" have any benefit?
I don't understand the semantics; please elaborate.


>  	else
>  		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
>  }
> @@ -1366,11 +1370,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  	lru_add_drain();
>  	spin_lock_irq(&zone->lru_lock);
>  
> +	/*
> +	 * If we are lumpy compacting, we bump nr_to_scan to at least
> +	 * the size of the page we are trying to allocate
> +	 */
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> +		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
> +
>  	if (scanning_global_lru(sc)) {
>  		nr_taken = isolate_pages_global(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> -					ISOLATE_INACTIVE : ISOLATE_BOTH,
> +			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
> +					ISOLATE_BOTH : ISOLATE_INACTIVE,
>  			zone, 0, file);
>  		zone->pages_scanned += nr_scanned;
>  		if (current_is_kswapd())
> @@ -1382,8 +1393,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  	} else {
>  		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> -					ISOLATE_INACTIVE : ISOLATE_BOTH,
> +			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
> +					ISOLATE_BOTH : ISOLATE_INACTIVE,
>  			zone, sc->mem_cgroup,
>  			0, file);
>  		/*
> @@ -1416,6 +1427,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  
>  	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
>  
> +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> +		compact_zone_order(zone, sc->order, sc->gfp_mask);
> +

If there are very few free pages, compaction may not work. Don't we need to
check NR_FREE_PAGES?
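
For example, something like this (just a sketch, untested, and the threshold
is arbitrary):

	/* don't bother compacting a zone that has almost nothing free */
	if ((sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION) &&
	    zone_page_state(zone, NR_FREE_PAGES) > (2UL << sc->order))
		compact_zone_order(zone, sc->order, sc->gfp_mask);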


>  	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
>  		zone_idx(zone),
>  		nr_scanned, nr_reclaimed,
> -- 
> 1.7.1
> 





* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-12  9:37   ` Mel Gorman
  2010-11-14  5:43     ` KOSAKI Motohiro
@ 2010-11-14  6:02     ` KOSAKI Motohiro
  2010-11-15  9:22       ` Mel Gorman
  1 sibling, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2010-11-14  6:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Andrea Arcangeli, Andrew Morton, Rik van Riel,
	Johannes Weiner, linux-mm, linux-kernel

> On Thu, Nov 11, 2010 at 07:07:04PM +0000, Mel Gorman wrote:
> > +	if (COMPACTION_BUILD)
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > +	else
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> >  
> 
> Gack, I posted the slightly wrong version. This version prevents lumpy
> reclaim ever being used. The figures I posted were for a patch where
> this condition looked like
> 
>         if (COMPACTION_BUILD && priority > DEF_PRIORITY - 2)
>                 sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
>         else
>                 sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;

Can you please tell us your opinion on which is better: 1) automatically
turning lumpy reclaim on by priority (this approach) or 2) introducing
GFP_LUMPY (as Andrea proposed)? I'm not sure which is better, so I'd like to
hear the pros and cons of both.

Thanks.




* Re: [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask
  2010-11-14  5:40   ` KOSAKI Motohiro
@ 2010-11-15  9:16     ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-15  9:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrea Arcangeli, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Sun, Nov 14, 2010 at 02:40:21PM +0900, KOSAKI Motohiro wrote:
> > Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
> > syncronous or asyncronous. In preparation for using compaction instead of
> > lumpy reclaim, this patch converts the flags into a bitmap.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  include/trace/events/vmscan.h |    6 +++---
> >  mm/vmscan.c                   |   37 +++++++++++++++++++------------------
> >  2 files changed, 22 insertions(+), 21 deletions(-)
> > 
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index c255fcc..be76429 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -25,13 +25,13 @@
> >  
> >  #define trace_reclaim_flags(page, sync) ( \
> >  	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
> > -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
> > +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
> >  	)
> >  
> >  #define trace_shrink_flags(file, sync) ( \
> > -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
> > +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_MIXED : \
> >  			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
> > -	(sync == LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
> > +	(sync & LUMPY_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
> >  	)
> >  
> >  TRACE_EVENT(mm_vmscan_kswapd_sleep,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b8a6fdc..ffa438e 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -51,11 +51,11 @@
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/vmscan.h>
> >  
> > -enum lumpy_mode {
> > -	LUMPY_MODE_NONE,
> > -	LUMPY_MODE_ASYNC,
> > -	LUMPY_MODE_SYNC,
> > -};
> > +typedef unsigned __bitwise__ lumpy_mode;
> > +#define LUMPY_MODE_SINGLE		((__force lumpy_mode)0x01u)
> > +#define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
> > +#define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
> > +#define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
> 
> Please write a comment of description of each bit meaning.
> 

Will do.
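
Something along these lines, perhaps (wording still to be decided):

	/*
	 * lumpy_reclaim_mode controls how the inactive list is shrunk:
	 * LUMPY_MODE_SINGLE:        reclaim only order-0 pages
	 * LUMPY_MODE_ASYNC:         do not block on page writeback
	 * LUMPY_MODE_SYNC:          may wait on pages under writeback
	 * LUMPY_MODE_CONTIGRECLAIM: reclaim a contiguous range of pages
	 *                           around the page selected from the LRU
	 */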

> 
> >  
> >  struct scan_control {
> >  	/* Incremented by the number of inactive pages that were scanned */
> > @@ -88,7 +88,7 @@ struct scan_control {
> >  	 * Intend to reclaim enough continuous memory rather than reclaim
> >  	 * enough amount of memory. i.e, mode for high order allocation.
> >  	 */
> > -	enum lumpy_mode lumpy_reclaim_mode;
> > +	lumpy_mode lumpy_reclaim_mode;
> >  
> >  	/* Which cgroup do we reclaim from */
> >  	struct mem_cgroup *mem_cgroup;
> > @@ -274,13 +274,13 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
> >  static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> >  				   bool sync)
> >  {
> > -	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> > +	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> >  
> >  	/*
> >  	 * Some reclaim have alredy been failed. No worth to try synchronous
> >  	 * lumpy reclaim.
> >  	 */
> > -	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> > +	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> >  		return;
> 
> Probaby, we can remove LUMPY_MODE_SINGLE entirely. and this line can be
> change to
> 
> 	if (sync && !(sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM))
> 

I had this initially but I found myself getting confused during development
because I had to recall each time "if it's not contig reclaim, what is it?"
(it could be either compaction or single). I decided to spell it out because
it was easier to understand, but I can switch back if necessary.

> btw, LUMPY_MODE_ASYNC can be removed too.
> 

Similar reasoning - even though I'm not doing anything with the information,
I found it easier to understand if it was spelled out.

> >  	/*
> > @@ -288,17 +288,18 @@ static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> >  	 * trouble getting a small set of contiguous pages, we
> >  	 * will reclaim both active and inactive pages.
> >  	 */
> > +	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> >  	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> > -		sc->lumpy_reclaim_mode = mode;
> > +		sc->lumpy_reclaim_mode |= mode;
> >  	else if (sc->order && priority < DEF_PRIORITY - 2)
> > -		sc->lumpy_reclaim_mode = mode;
> > +		sc->lumpy_reclaim_mode |= mode;
> >  	else
> > -		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
> >  }
> >  
> >  static void disable_lumpy_reclaim_mode(struct scan_control *sc)
> >  {
> > -	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> > +	sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
> >  }
> >  
> >  static inline int is_page_cache_freeable(struct page *page)
> > @@ -429,7 +430,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
> >  		 * first attempt to free a range of pages fails.
> >  		 */
> >  		if (PageWriteback(page) &&
> > -		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
> > +		    (sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC))
> >  			wait_on_page_writeback(page);
> >  
> >  		if (!PageWriteback(page)) {
> > @@ -615,7 +616,7 @@ static enum page_references page_check_references(struct page *page,
> >  	referenced_page = TestClearPageReferenced(page);
> >  
> >  	/* Lumpy reclaim - ignore references */
> > -	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM)
> >  		return PAGEREF_RECLAIM;
> >  
> >  	/*
> > @@ -732,7 +733,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  			 * for any page for which writeback has already
> >  			 * started.
> >  			 */
> > -			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
> > +			if ((sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC) &&
> >  			    may_enter_fs)
> >  				wait_on_page_writeback(page);
> >  			else {
> > @@ -1317,7 +1318,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
> >  		return false;
> >  
> >  	/* Only stall on lumpy reclaim */
> > -	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> >  		return false;
> >  
> >  	/* If we have relaimed everything on the isolated list, no stall */
> > @@ -1368,7 +1369,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  	if (scanning_global_lru(sc)) {
> >  		nr_taken = isolate_pages_global(nr_to_scan,
> >  			&page_list, &nr_scanned, sc->order,
> > -			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> > +			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> >  					ISOLATE_INACTIVE : ISOLATE_BOTH,
> >  			zone, 0, file);
> >  		zone->pages_scanned += nr_scanned;
> > @@ -1381,7 +1382,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  	} else {
> >  		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
> >  			&page_list, &nr_scanned, sc->order,
> > -			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> > +			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> >  					ISOLATE_INACTIVE : ISOLATE_BOTH,
> >  			zone, sc->mem_cgroup,
> >  			0, file);
> > -- 
> > 1.7.1
> > 
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-14  5:43     ` KOSAKI Motohiro
@ 2010-11-15  9:17       ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-15  9:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrea Arcangeli, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Sun, Nov 14, 2010 at 02:43:12PM +0900, KOSAKI Motohiro wrote:
> > On Thu, Nov 11, 2010 at 07:07:04PM +0000, Mel Gorman wrote:
> > > +	if (COMPACTION_BUILD)
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > > +	else
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> > >  
> > 
> > Gack, I posted the slightly wrong version. This version prevents lumpy
> > reclaim ever being used. The figures I posted were for a patch where
> > this condition looked like
> > 
> >         if (COMPACTION_BUILD && priority > DEF_PRIORITY - 2)
> >                 sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> >         else
> >                 sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> 
> In all other place, heavy reclaim detection are used folliowing.
> 
> 	if (priority < DEF_PRIORITY - 2)
> 
> 
> So, "priority >= DEF_PRIORITY - 2" is more symmetric, I think. but if you have strong
> reason, I don't oppse.
> 

I had no strong reason other than "I don't want lumpy reclaim to be used
easily". I will match the other places. Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-14  6:02     ` KOSAKI Motohiro
@ 2010-11-15  9:22       ` Mel Gorman
  2010-11-15 15:23         ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2010-11-15  9:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrea Arcangeli, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Sun, Nov 14, 2010 at 03:02:03PM +0900, KOSAKI Motohiro wrote:
> > On Thu, Nov 11, 2010 at 07:07:04PM +0000, Mel Gorman wrote:
> > > +	if (COMPACTION_BUILD)
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > > +	else
> > > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> > >  
> > 
> > Gack, I posted the slightly wrong version. This version prevents lumpy
> > reclaim ever being used. The figures I posted were for a patch where
> > this condition looked like
> > 
> >         if (COMPACTION_BUILD && priority > DEF_PRIORITY - 2)
> >                 sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> >         else
> >                 sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> 
> Can you please tell us your opinition which is better 1) automatically turn lumby on
> by priority (this approach) 2) introduce GFP_LUMPY (andrea proposed). I'm not
> sure which is better, then I'd like to hear both pros/cons concern.
> 

That's a very good question!

The main "pro" of using lumpy reclaim is that it has been tested. It's known
to be very heavy and to disrupt the system, but it's also known to work.
Lumpy reclaim is also less susceptible to allocation races than compaction
is, i.e. if memory is low, compaction requires that a certain number of pages
be free whereas lumpy reclaim frees the pages it requires.

GFP_LUMPY is something else and is only partially related. Transparent Huge
Pages (THP) does not want to hit lumpy reclaim no matter what the
circumstances are; it is always better for THP not to use lumpy reclaim. It's
debatable whether it should even reclaim order-0 pages for compaction, so
even with this series, THP might still introduce GFP_LUMPY.

Does this answer your question?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-14  5:59   ` KOSAKI Motohiro
@ 2010-11-15  9:25     ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-15  9:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrea Arcangeli, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Sun, Nov 14, 2010 at 02:59:31PM +0900, KOSAKI Motohiro wrote:
> > Lumpy reclaim is disruptive. It reclaims both a large number of pages
> > and ignores the age of the majority of pages it reclaims. This can incur
> > significant stalls and potentially increase the number of major faults.
> > 
> > Compaction has reached the point where it is considered reasonably stable
> > (meaning it has passed a lot of testing) and is a potential candidate for
> > displacing lumpy reclaim. This patch reduces the use of lumpy reclaim when
> > the priority is high enough to indicate low pressure. The basic operation
> > is very simple. Instead of selecting a contiguous range of pages to reclaim,
> > lumpy compaction reclaims a number of order-0 pages and then calls compaction
> > for the zone. If the watermarks are not met, another reclaim+compaction
> > cycle occurs.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  include/linux/compaction.h |    9 ++++++++-
> >  mm/compaction.c            |    2 +-
> >  mm/vmscan.c                |   38 ++++++++++++++++++++++++++------------
> >  3 files changed, 35 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> > index 5ac5155..2ae6613 100644
> > --- a/include/linux/compaction.h
> > +++ b/include/linux/compaction.h
> > @@ -22,7 +22,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
> >  extern int fragmentation_index(struct zone *zone, unsigned int order);
> >  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
> >  			int order, gfp_t gfp_mask, nodemask_t *mask);
> > -
> > +extern unsigned long compact_zone_order(struct zone *zone,
> > +			int order, gfp_t gfp_mask);
> >  /* Do not skip compaction more than 64 times */
> >  #define COMPACT_MAX_DEFER_SHIFT 6
> >  
> > @@ -59,6 +60,12 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
> >  	return COMPACT_CONTINUE;
> >  }
> >  
> > +static inline unsigned long compact_zone_order(struct zone *zone,
> > +			int order, gfp_t gfp_mask)
> > +{
> > +	return 0;
> > +}
> > +
> >  static inline void defer_compaction(struct zone *zone)
> >  {
> >  }
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 4d709ee..f987f47 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -418,7 +418,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> >  	return ret;
> >  }
> >  
> > -static unsigned long compact_zone_order(struct zone *zone,
> > +unsigned long compact_zone_order(struct zone *zone,
> >  						int order, gfp_t gfp_mask)
> >  {
> >  	struct compact_control cc = {
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ffa438e..da35cdb 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/topology.h>
> >  #include <linux/cpu.h>
> >  #include <linux/cpuset.h>
> > +#include <linux/compaction.h>
> >  #include <linux/notifier.h>
> >  #include <linux/rwsem.h>
> >  #include <linux/delay.h>
> > @@ -56,6 +57,7 @@ typedef unsigned __bitwise__ lumpy_mode;
> >  #define LUMPY_MODE_ASYNC		((__force lumpy_mode)0x02u)
> >  #define LUMPY_MODE_SYNC			((__force lumpy_mode)0x04u)
> >  #define LUMPY_MODE_CONTIGRECLAIM	((__force lumpy_mode)0x08u)
> > +#define LUMPY_MODE_COMPACTION		((__force lumpy_mode)0x10u)
> >  
> >  struct scan_control {
> >  	/* Incremented by the number of inactive pages that were scanned */
> > @@ -274,25 +276,27 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
> >  static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> >  				   bool sync)
> >  {
> > -	lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> > +	lumpy_mode syncmode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> >  
> >  	/*
> > -	 * Some reclaim have alredy been failed. No worth to try synchronous
> > -	 * lumpy reclaim.
> > +	 * Initially assume we are entering either lumpy reclaim or lumpy
> > +	 * compaction. Depending on the order, we will either set the sync
> > +	 * mode or just reclaim order-0 pages later.
> >  	 */
> > -	if (sync && sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE)
> > -		return;
> > +	if (COMPACTION_BUILD)
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_COMPACTION;
> > +	else
> > +		sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> >  
> >  	/*
> >  	 * If we need a large contiguous chunk of memory, or have
> >  	 * trouble getting a small set of contiguous pages, we
> >  	 * will reclaim both active and inactive pages.
> >  	 */
> > -	sc->lumpy_reclaim_mode = LUMPY_MODE_CONTIGRECLAIM;
> >  	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> > -		sc->lumpy_reclaim_mode |= mode;
> > +		sc->lumpy_reclaim_mode |= syncmode;
> >  	else if (sc->order && priority < DEF_PRIORITY - 2)
> > -		sc->lumpy_reclaim_mode |= mode;
> > +		sc->lumpy_reclaim_mode |= syncmode;
> 
> Does "LUMPY_MODE_COMPACTION | LUMPY_MODE_SYNC" have any benefit?
> I haven't understand this semantics. please elaborate?
> 

At the moment, it doesn't have any benefit. In the future, we might pass
the flags down to migration, which currently always behaves in a synchronous
fashion. For now, I think it's better to flag what we expect the behaviour
to be even if it is not yet acted upon.
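
As a purely illustrative sketch of that direction (compact_zone_order() does
not take a sync argument in this series, so the extra parameter is assumed),
the call site in shrink_inactive_list() could eventually propagate the hint:

	/*
	 * Hypothetical: pass the sync/async hint down so migration can
	 * decide whether to wait on locked or writeback pages.
	 */
	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
		compact_zone_order(zone, sc->order, sc->gfp_mask,
				sc->lumpy_reclaim_mode & LUMPY_MODE_SYNC);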

> 
> >  	else
> >  		sc->lumpy_reclaim_mode = LUMPY_MODE_SINGLE | LUMPY_MODE_ASYNC;
> >  }
> > @@ -1366,11 +1370,18 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  	lru_add_drain();
> >  	spin_lock_irq(&zone->lru_lock);
> >  
> > +	/*
> > +	 * If we are lumpy compacting, we bump nr_to_scan to at least
> > +	 * the size of the page we are trying to allocate
> > +	 */
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> > +		nr_to_scan = max(nr_to_scan, (1UL << sc->order));
> > +
> >  	if (scanning_global_lru(sc)) {
> >  		nr_taken = isolate_pages_global(nr_to_scan,
> >  			&page_list, &nr_scanned, sc->order,
> > -			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> > -					ISOLATE_INACTIVE : ISOLATE_BOTH,
> > +			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
> > +					ISOLATE_BOTH : ISOLATE_INACTIVE,
> >  			zone, 0, file);
> >  		zone->pages_scanned += nr_scanned;
> >  		if (current_is_kswapd())
> > @@ -1382,8 +1393,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  	} else {
> >  		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
> >  			&page_list, &nr_scanned, sc->order,
> > -			sc->lumpy_reclaim_mode & LUMPY_MODE_SINGLE ?
> > -					ISOLATE_INACTIVE : ISOLATE_BOTH,
> > +			sc->lumpy_reclaim_mode & LUMPY_MODE_CONTIGRECLAIM ?
> > +					ISOLATE_BOTH : ISOLATE_INACTIVE,
> >  			zone, sc->mem_cgroup,
> >  			0, file);
> >  		/*
> > @@ -1416,6 +1427,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> >  
> >  	putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
> >  
> > +	if (sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION)
> > +		compact_zone_order(zone, sc->order, sc->gfp_mask);
> > +
> 
> If free pages are very low, compaction may not work. Don't we need to
> check NR_FREE_PAGES?
> 

Yes, it's on my TODO list to split out the logic used in
try_to_compact_pages to decide if compact_zone_order() should be called
or not.

Well spotted!
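
For reference, the gate I have in mind is roughly the order-0 watermark check
that try_to_compact_pages() already does before compacting a zone. Split out
as a helper (the helper name below is illustrative only), it would look
something like:

	/*
	 * Sketch only: migration needs free order-0 pages for the
	 * destination copies, so require the low watermark plus 2<<order
	 * pages free before calling compact_zone_order().
	 */
	static bool reclaim_should_compact(struct zone *zone, int order)
	{
		unsigned long watermark;

		watermark = low_wmark_pages(zone) + (2UL << order);
		return zone_watermark_ok(zone, 0, watermark, 0, 0);
	}

with the call in shrink_inactive_list() becoming:

	if ((sc->lumpy_reclaim_mode & LUMPY_MODE_COMPACTION) &&
			reclaim_should_compact(zone, sc->order))
		compact_zone_order(zone, sc->order, sc->gfp_mask);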

> 
> >  	trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
> >  		zone_idx(zone),
> >  		nr_scanned, nr_reclaimed,
> > -- 
> > 1.7.1
> > 
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD
  2010-11-14  5:45   ` KOSAKI Motohiro
@ 2010-11-15  9:26     ` Mel Gorman
  0 siblings, 0 replies; 17+ messages in thread
From: Mel Gorman @ 2010-11-15  9:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrea Arcangeli, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Sun, Nov 14, 2010 at 02:45:07PM +0900, KOSAKI Motohiro wrote:
> > To avoid #ifdef CONFIG_COMPACTION in a following patch, this patch adds
> > COMPACTION_BUILD, which is similar to NUMA_BUILD in operation.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  include/linux/kernel.h |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 450092c..c00c5d1 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -826,6 +826,13 @@ struct sysinfo {
> >  #define NUMA_BUILD 0
> >  #endif
> >  
> > +/* This helps us avoid #ifdef CONFIG_COMPACTION */
> > +#ifdef CONFIG_COMPACTION
> > +#define COMPACTION_BUILD 1
> > +#else
> > +#define COMPACTION_BUILD 0
> > +#endif
> > +
> 
> Looks good, of course, but I think this patch can be folded into [3/3] because
> it doesn't introduce any functional change on its own.
> 

Ok, I can do that.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure
  2010-11-15  9:22       ` Mel Gorman
@ 2010-11-15 15:23         ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2010-11-15 15:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, Johannes Weiner,
	linux-mm, linux-kernel

On Mon, Nov 15, 2010 at 09:22:56AM +0000, Mel Gorman wrote:
> GFP_LUMPY is something else and is only partially related. Transparent Huge
> Pages (THP) does not want to hit lumpy reclaim no matter what the circumstances
> are - It is always better for THP to not use lumpy reclaim. It's debatable

Agreed.

> whether it should even reclaim order-0 pages for compaction so even with
> this series, THP might still introduce GFP_LUMPY.

Reclaim of some order-0 pages shouldn't do any significant harm as long
as the young bits are not ignored and it proceeds "normally" rather than
aggressively like lumpy.

Also, it's ok to do some reclaim because it can free slab that cannot be
compacted; when there's an excessive amount of slab cache to shrink, that
gives us a chance to convert unmovable pageblocks to movable ones. And we
need at least 2M fully available as a migration destination (but I guess
that is always available :).

In general, interleaving compaction with regular reclaim (no lumpy)
before failing the allocation sounds ok to me.

I guess these days compaction would tend to succeed before lumpy ever
gets invoked, so the trouble with lumpy would then only trigger when
compaction starts failing and we enter reclaim to create more movable
pageblocks. However, I don't want to risk bad behavior when the amount of
anonymous memory grows very high and not all of it can be fully backed
by hugepages.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-11-15 15:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-11 19:07 [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim Mel Gorman
2010-11-11 19:07 ` [PATCH 1/3] mm,vmscan: Convert lumpy_mode into a bitmask Mel Gorman
2010-11-14  5:40   ` KOSAKI Motohiro
2010-11-15  9:16     ` Mel Gorman
2010-11-11 19:07 ` [PATCH 2/3] mm,compaction: Add COMPACTION_BUILD Mel Gorman
2010-11-14  5:45   ` KOSAKI Motohiro
2010-11-15  9:26     ` Mel Gorman
2010-11-11 19:07 ` [PATCH 3/3] mm,vmscan: Reclaim order-0 and compact instead of lumpy reclaim when under light pressure Mel Gorman
2010-11-12  9:37   ` Mel Gorman
2010-11-14  5:43     ` KOSAKI Motohiro
2010-11-15  9:17       ` Mel Gorman
2010-11-14  6:02     ` KOSAKI Motohiro
2010-11-15  9:22       ` Mel Gorman
2010-11-15 15:23         ` Andrea Arcangeli
2010-11-14  5:59   ` KOSAKI Motohiro
2010-11-15  9:25     ` Mel Gorman
2010-11-14  5:31 ` [RFC PATCH 0/3] Use compaction to reduce a dependency on lumpy reclaim KOSAKI Motohiro
