* [RFC PATCH 0/2] Removal of lumpy reclaim
@ 2012-03-28 16:06 Mel Gorman
  2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
	Mel Gorman

(cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
in shrink_active_list()")

In the interest of keeping my fingers from the flames at LSF/MM, I'm
releasing an RFC for lumpy reclaim removal. The first patch removes
lumpy reclaim itself and the second removes reclaim_mode_t. They can be
merged together but the resulting patch is harder to review.

The patches are based on commit e22057c8599373e5caef0bc42bdb95d2a361ab0d
which is after Andrew's tree was merged but before 3.4-rc1 was released.

Roughly 1K of text and over 200 lines of code are removed, and struct
scan_control is smaller.

   text	   data	    bss	    dec	    hex	filename
6723455	1931304	2260992	10915751	 a68fa7	vmlinux-3.3.0-git
6722303	1931304	2260992	10914599	 a68b27	vmlinux-3.3.0-lumpyremove-v1

There are behaviour changes caused by the series with details in the
patches themselves. I ran some preliminary tests but coverage is shaky
due to time constraints. The kernels tested were

3.2.0		  Vanilla 3.2.0 kernel
3.3.0-git	  Commit e22057c which will be part of 3.4-rc1
3.3.0-lumpyremove These two patches

fs-mark running in a threaded configuration showed nothing useful.

postmark had interesting results. I know postmark is not very useful as
a mail server benchmark but it pushes page reclaim in a manner that is
useful from a testing perspective. Regressions in page reclaim can result
in regressions in postmark when the working set size (WSS) for postmark
is larger than physical memory.

POSTMARK
                                        3.2.0-vanilla     3.3.0-git      lumpyremove-v1r3
Transactions per second:               16.00 ( 0.00%)    19.00 (18.75%)    19.00 (18.75%)
Data megabytes read per second:        18.62 ( 0.00%)    23.18 (24.49%)    22.56 (21.16%)
Data megabytes written per second:     35.49 ( 0.00%)    44.18 (24.49%)    42.99 (21.13%)
Files created alone per second:        26.00 ( 0.00%)    35.00 (34.62%)    34.00 (30.77%)
Files create/transact per second:       8.00 ( 0.00%)     9.00 (12.50%)     9.00 (12.50%)
Files deleted alone per second:       680.00 ( 0.00%)  6124.00 (800.59%)  2041.00 (200.15%)
Files delete/transact per second:       8.00 ( 0.00%)     9.00 (12.50%)     9.00 (12.50%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             119.61    111.16    111.40
User+Sys Time Running Test (seconds)        153.19    144.13    143.29
Total Elapsed Time (seconds)               1171.34    940.97    966.97

MMTests Statistics: vmstat
Page Ins                                    13797412    13734736    13731792
Page Outs                                   43284036    42959856    42744668
Swap Ins                                        7751           0           0
Swap Outs                                       9617           0           0
Direct pages scanned                          334395           0           0
Kswapd pages scanned                         9664358     9933599     9929577
Kswapd pages reclaimed                       9621893     9932913     9928893
Direct pages reclaimed                        334395           0           0
Kswapd efficiency                                99%         99%         99%
Kswapd velocity                             8250.686   10556.765   10268.754
Direct efficiency                               100%        100%        100%
Direct velocity                              285.481       0.000       0.000
Percentage direct scans                           3%          0%          0%
Page writes by reclaim                          9619           0           0
Page writes file                                   2           0           0
Page writes anon                                9617           0           0
Page reclaim immediate                             7           0           0
Page rescued immediate                             0           0           0
Slabs scanned                                  38912       38912       38912
Direct inode steals                                0           0           0
Kswapd inode steals                           154304      160972      158444
Kswapd skipped wait                                0           0           0
THP fault alloc                                    4           4           4
THP collapse alloc                                 0           0           0
THP splits                                         3           0           0
THP fault fallback                                 0           0           0
THP collapse fail                                  0           0           0
Compaction stalls                                  1           0           0
Compaction success                                 1           0           0
Compaction failures                                0           0           0
Compaction pages moved                             0           0           0
Compaction move failure                            0           0           0
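
As an aside on how the derived rows above are read: "efficiency" is pages
reclaimed as a percentage of pages scanned and "velocity" is pages scanned
per second of total elapsed time. An illustrative C sketch of the assumed
arithmetic (not MMTests code, helper names are made up):

	/* e.g. kswapd on 3.2.0: 9621893 / 9664358 => 99% */
	static unsigned long efficiency_pct(unsigned long reclaimed,
					    unsigned long scanned)
	{
		return scanned ? (reclaimed * 100) / scanned : 100;
	}

	/* e.g. kswapd on 3.2.0: 9664358 / 1171 seconds => ~8250 pages/sec */
	static unsigned long velocity(unsigned long scanned,
				      unsigned long elapsed_sec)
	{
		return elapsed_sec ? scanned / elapsed_sec : 0;
	}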

It looks like 3.3.0-git is better in general although that "Files deleted
alone per second" figure looks like an anomaly. Fully removing lumpy
reclaim affects the results a little but not enough to be of concern,
particularly as monitoring was running at the same time which disrupts
results. Dirty pages were not being encountered at the end of the LRU so
the behaviour change related to THP allocations stalling on dirty pages
would not be triggered.

Note that swap in/out, direct reclaim and page writes from reclaim dropped
to 0 between 3.2.0 and 3.3.0-git. According to a range of results I have
for mainline kernels between 2.6.32 and 3.3.0 on a different machine, this
swap in/out and direct reclaim problem was introduced after 3.0 and fixed
by 3.3.0, with 3.1.x and 3.2.x both showing swap in/out, direct reclaim
and page writes from reclaim. If I had to guess, it was fixed by commits
e0887c19, fe4b1b24 and 0cee34fd but I did not double-check[1].

Removing lumpy reclaim does not make an obvious difference but note that
THP was barely used at all in this benchmark. Benchmarks that stress both
page reclaim and THP at the same time in a meaningful manner are thin on
the ground.

A benchmark that uses dd to write a large file also showed nothing
interesting but I was not really expecting it to. The test looks for
problems related to a large linear writer and removing lumpy reclaim was
unlikely to affect it.

I ran a benchmark that stressed high-order allocation. This is a very
artificial load but it was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency
figures.

STRESS-HIGHALLOC
                 3.2.0-vanilla     3.3.0-git        lumpyremove-v1r3
Pass 1          82.00 ( 0.00%)    27.00 (-55.00%)    32.00 (-50.00%)
Pass 2          70.00 ( 0.00%)    37.00 (-33.00%)    40.00 (-30.00%)
while Rested    90.00 ( 0.00%)    88.00 (-2.00%)    88.00 (-2.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds)             735.12    688.13    683.91
User+Sys Time Running Test (seconds)       2764.46   3278.45   3271.41
Total Elapsed Time (seconds)               1204.41   1140.29   1137.58

MMTests Statistics: vmstat
Page Ins                                     5426648     2840348     2695120
Page Outs                                    7206376     7854516     7860408
Swap Ins                                       36799           0           0
Swap Outs                                      76903           4           0
Direct pages scanned                           31981       43749      160647
Kswapd pages scanned                        26658682     1285341     1195956
Kswapd pages reclaimed                       2248583     1271621     1178420
Direct pages reclaimed                          6397       14416       94093
Kswapd efficiency                                 8%         98%         98%
Kswapd velocity                            22134.225    1127.205    1051.316
Direct efficiency                                20%         32%         58%
Direct velocity                               26.553      38.367     141.218
Percentage direct scans                           0%          3%         11%
Page writes by reclaim                       6530481           4           0
Page writes file                             6453578           0           0
Page writes anon                               76903           4           0
Page reclaim immediate                        256742       17832       61576
Page rescued immediate                             0           0           0
Slabs scanned                                1073152      971776      975872
Direct inode steals                                0      196279      205178
Kswapd inode steals                           139260       70390       64323
Kswapd skipped wait                            21711           1           0
THP fault alloc                                    1         126         143
THP collapse alloc                               324         294         224
THP splits                                        32           8          10
THP fault fallback                                 0           0           0
THP collapse fail                                  5           6           7
Compaction stalls                                364        1312        1324
Compaction success                               255         343         366
Compaction failures                              109         969         958
Compaction pages moved                        265107     3952630     4489215
Compaction move failure                         7493       26038       24739

Success rates are completely hosed for 3.4-rc1, which is almost certainly
due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
expected this would happen for kswapd and impair allocation success rates
(https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
of a difference: 95% less scanning and 43% less reclaim by kswapd.

In comparison, reclaim/compaction is not aggressive and gives up easily,
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
much more aggressive about reclaim/compaction than THP allocations are
(see the sketch below for where that difference comes from). The stress
test above allocates like neither THP nor hugetlbfs but is much closer
to THP.
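
For context, here is a paraphrased sketch from memory of where __GFP_REPEAT
changes the give-up behaviour in the 3.3-era should_continue_reclaim();
shown only for illustration and not part of this series:

	/* Consider stopping depending on scan and reclaim activity */
	if (sc->gfp_mask & __GFP_REPEAT) {
		/*
		 * __GFP_REPEAT callers (e.g. hugetlbfs pool resizing)
		 * really want to succeed, so only stop once a full scan
		 * has failed to reclaim anything
		 */
		if (!nr_reclaimed && !nr_scanned)
			return false;
	} else {
		/*
		 * Other callers (e.g. THP faults) can fail without
		 * consequence, so give up as soon as the last batch of
		 * scanned pages reclaimed nothing
		 */
		if (!nr_reclaimed)
			return false;
	}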

Mainline is now impaired in terms of high-order allocation under heavy
load although I do not know to what degree as I did not test with
__GFP_REPEAT. Still, keep it in mind for bugs related to hugepage pool
resizing, THP allocation and high-order atomic allocation failures from
network devices.

Despite this, I think we should merge the patches in this series. The
stress tests were very useful when the main user was hugetlb pool resizing
and when shaking out bugs in memory compaction, but they are now too
unrealistic to draw solid conclusions from. They need to be replaced but
that should not delay the lumpy reclaim removal.

I'd appreciate it if people took a look at the patches and see if there
is anything I missed.

[1] Where are these results, you say? They are generated using MM Tests to
    see what negative trends could be identified. They are still in the
    process of running and I've had limited time to dig through the data.

 include/trace/events/vmscan.h |   40 ++-----
 mm/vmscan.c                   |  263 ++++-------------------------------------
 2 files changed, 37 insertions(+), 266 deletions(-)

-- 
1.7.9.2



* [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman
@ 2012-03-28 16:06 ` Mel Gorman
  2012-04-06 23:52   ` Ying Han
  2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman
  2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
  2 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
	Mel Gorman

Lumpy reclaim had a purpose but in the mind of some, it was to kick
the system so hard it thrashed. For others the purpose was to complicate
vmscan.c. Over time it was given softer shoes and a nicer attitude but
memory compaction needs to step up and replace it, so this patch sends
lumpy reclaim to the farm.

Here are the important notes related to the patch.

1. The tracepoint format changes for isolating LRU pages.

2. This patch stops reclaim/compaction entering sync reclaim as this
   was only intended for lumpy reclaim and an oversight. Page migration
   has its own logic for stalling on writeback pages if necessary and
   memory compaction is already using it. This is a behaviour change.

3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
   on PageWriteback with CONFIG_COMPACTION and has been this way for a
   while. I am calling it out in case this is a surprise to people. This
   behaviour avoids a situation where we wait on a page being written
   back to slow storage like USB. Currently we depend on
   wait_iff_congested() for throttling if too many dirty pages are
   scanned (a short sketch of that throttling path follows this list).

4. Reclaim/compaction can no longer queue dirty pages in pageout()
   if the underlying BDI is congested. Lumpy reclaim used this logic and
   reclaim/compaction was using it in error. This is a behaviour change.
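
For reference, the throttling path note 3 refers to looks roughly like the
following. This is a paraphrased sketch from memory of the 3.3-era
shrink_inactive_list(), shown only for context and not part of this patch:

	/*
	 * If reclaim keeps meeting dirty pages under writeback, the
	 * allocation rate is likely exceeding the laundering rate, so
	 * throttle briefly rather than waiting on individual pages.
	 */
	if (nr_writeback && nr_writeback >=
			(nr_taken >> (DEF_PRIORITY - priority)))
		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);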

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/vmscan.h |   36 ++-----
 mm/vmscan.c                   |  209 +++--------------------------------------
 2 files changed, 22 insertions(+), 223 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index f64560e..6f60b33 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -13,7 +13,7 @@
 #define RECLAIM_WB_ANON		0x0001u
 #define RECLAIM_WB_FILE		0x0002u
 #define RECLAIM_WB_MIXED	0x0010u
-#define RECLAIM_WB_SYNC		0x0004u
+#define RECLAIM_WB_SYNC		0x0004u	/* Unused, all reclaim async */
 #define RECLAIM_WB_ASYNC	0x0008u
 
 #define show_reclaim_flags(flags)				\
@@ -27,13 +27,13 @@
 
 #define trace_reclaim_flags(page, sync) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
+	(RECLAIM_WB_ASYNC)   \
 	)
 
 #define trace_shrink_flags(file, sync) ( \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
-			(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
-	(sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
+	( \
+		(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) |  \
+		(RECLAIM_WB_ASYNC) \
 	)
 
 TRACE_EVENT(mm_vmscan_kswapd_sleep,
@@ -263,22 +263,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file),
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file),
 
 	TP_STRUCT__entry(
 		__field(int, order)
 		__field(unsigned long, nr_requested)
 		__field(unsigned long, nr_scanned)
 		__field(unsigned long, nr_taken)
-		__field(unsigned long, nr_lumpy_taken)
-		__field(unsigned long, nr_lumpy_dirty)
-		__field(unsigned long, nr_lumpy_failed)
 		__field(isolate_mode_t, isolate_mode)
 		__field(int, file)
 	),
@@ -288,22 +282,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
 		__entry->nr_requested = nr_requested;
 		__entry->nr_scanned = nr_scanned;
 		__entry->nr_taken = nr_taken;
-		__entry->nr_lumpy_taken = nr_lumpy_taken;
-		__entry->nr_lumpy_dirty = nr_lumpy_dirty;
-		__entry->nr_lumpy_failed = nr_lumpy_failed;
 		__entry->isolate_mode = isolate_mode;
 		__entry->file = file;
 	),
 
-	TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu contig_taken=%lu contig_dirty=%lu contig_failed=%lu file=%d",
+	TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d",
 		__entry->isolate_mode,
 		__entry->order,
 		__entry->nr_requested,
 		__entry->nr_scanned,
 		__entry->nr_taken,
-		__entry->nr_lumpy_taken,
-		__entry->nr_lumpy_dirty,
-		__entry->nr_lumpy_failed,
 		__entry->file)
 );
 
@@ -313,13 +301,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
 
 );
 
@@ -329,13 +314,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
 		unsigned long nr_requested,
 		unsigned long nr_scanned,
 		unsigned long nr_taken,
-		unsigned long nr_lumpy_taken,
-		unsigned long nr_lumpy_dirty,
-		unsigned long nr_lumpy_failed,
 		isolate_mode_t isolate_mode,
 		int file),
 
-	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
+	TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
 
 );
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33c332b..68319e4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,19 +56,11 @@
 /*
  * reclaim_mode determines how the inactive list is shrunk
  * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
- * RECLAIM_MODE_ASYNC:  Do not block
- * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
- * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference
- *			page from the LRU and reclaim all pages within a
- *			naturally aligned range
  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
  *			order-0 pages and then compact the zone
  */
 typedef unsigned __bitwise__ reclaim_mode_t;
 #define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
-#define RECLAIM_MODE_ASYNC		((__force reclaim_mode_t)0x02u)
-#define RECLAIM_MODE_SYNC		((__force reclaim_mode_t)0x04u)
-#define RECLAIM_MODE_LUMPYRECLAIM	((__force reclaim_mode_t)0x08u)
 #define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
 
 struct scan_control {
@@ -364,37 +356,23 @@ out:
 	return ret;
 }
 
-static void set_reclaim_mode(int priority, struct scan_control *sc,
-				   bool sync)
+static void set_reclaim_mode(int priority, struct scan_control *sc)
 {
-	reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
-
 	/*
-	 * Initially assume we are entering either lumpy reclaim or
-	 * reclaim/compaction.Depending on the order, we will either set the
-	 * sync mode or just reclaim order-0 pages later.
-	 */
-	if (COMPACTION_BUILD)
-		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
-	else
-		sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;
-
-	/*
-	 * Avoid using lumpy reclaim or reclaim/compaction if possible by
-	 * restricting when its set to either costly allocations or when
+	 * Restrict reclaim/compaction to costly allocations or when
 	 * under memory pressure
 	 */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->reclaim_mode |= syncmode;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->reclaim_mode |= syncmode;
+	if (COMPACTION_BUILD && sc->order &&
+			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
+			 priority < DEF_PRIORITY - 2))
+		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
 	else
-		sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
+		sc->reclaim_mode = RECLAIM_MODE_SINGLE;
 }
 
 static void reset_reclaim_mode(struct scan_control *sc)
 {
-	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
+	sc->reclaim_mode = RECLAIM_MODE_SINGLE;
 }
 
 static inline int is_page_cache_freeable(struct page *page)
@@ -416,10 +394,6 @@ static int may_write_to_queue(struct backing_dev_info *bdi,
 		return 1;
 	if (bdi == current->backing_dev_info)
 		return 1;
-
-	/* lumpy reclaim for hugepage often need a lot of write */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		return 1;
 	return 0;
 }
 
@@ -710,10 +684,6 @@ static enum page_references page_check_references(struct page *page,
 	referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags);
 	referenced_page = TestClearPageReferenced(page);
 
-	/* Lumpy reclaim - ignore references */
-	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		return PAGEREF_RECLAIM;
-
 	/*
 	 * Mlock lost the isolation race with us.  Let try_to_unmap()
 	 * move the page to the unevictable list.
@@ -813,19 +783,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		if (PageWriteback(page)) {
 			nr_writeback++;
-			/*
-			 * Synchronous reclaim cannot queue pages for
-			 * writeback due to the possibility of stack overflow
-			 * but if it encounters a page under writeback, wait
-			 * for the IO to complete.
-			 */
-			if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
-			    may_enter_fs)
-				wait_on_page_writeback(page);
-			else {
-				unlock_page(page);
-				goto keep_lumpy;
-			}
+			unlock_page(page);
+			goto keep;
 		}
 
 		references = page_check_references(page, mz, sc);
@@ -908,7 +867,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto activate_locked;
 			case PAGE_SUCCESS:
 				if (PageWriteback(page))
-					goto keep_lumpy;
+					goto keep;
 				if (PageDirty(page))
 					goto keep;
 
@@ -1007,8 +966,6 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
-		reset_reclaim_mode(sc);
-keep_lumpy:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
@@ -1064,11 +1021,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
 	if (!all_lru_mode && !!page_is_file_cache(page) != file)
 		return ret;
 
-	/*
-	 * When this function is being called for lumpy reclaim, we
-	 * initially look into all LRU pages, active, inactive and
-	 * unevictable; only give shrink_page_list evictable pages.
-	 */
+	/* Do not give back unevictable pages for compaction */
 	if (PageUnevictable(page))
 		return ret;
 
@@ -1153,9 +1106,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	struct lruvec *lruvec;
 	struct list_head *src;
 	unsigned long nr_taken = 0;
-	unsigned long nr_lumpy_taken = 0;
-	unsigned long nr_lumpy_dirty = 0;
-	unsigned long nr_lumpy_failed = 0;
 	unsigned long scan;
 	int lru = LRU_BASE;
 
@@ -1168,10 +1118,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 
 	for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
 		struct page *page;
-		unsigned long pfn;
-		unsigned long end_pfn;
-		unsigned long page_pfn;
-		int zone_id;
 
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
@@ -1193,84 +1139,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		default:
 			BUG();
 		}
-
-		if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
-			continue;
-
-		/*
-		 * Attempt to take all pages in the order aligned region
-		 * surrounding the tag page.  Only take those pages of
-		 * the same active state as that tag page.  We may safely
-		 * round the target page pfn down to the requested order
-		 * as the mem_map is guaranteed valid out to MAX_ORDER,
-		 * where that page is in a different zone we will detect
-		 * it from its zone id and abort this block scan.
-		 */
-		zone_id = page_zone_id(page);
-		page_pfn = page_to_pfn(page);
-		pfn = page_pfn & ~((1 << sc->order) - 1);
-		end_pfn = pfn + (1 << sc->order);
-		for (; pfn < end_pfn; pfn++) {
-			struct page *cursor_page;
-
-			/* The target page is in the block, ignore it. */
-			if (unlikely(pfn == page_pfn))
-				continue;
-
-			/* Avoid holes within the zone. */
-			if (unlikely(!pfn_valid_within(pfn)))
-				break;
-
-			cursor_page = pfn_to_page(pfn);
-
-			/* Check that we have not crossed a zone boundary. */
-			if (unlikely(page_zone_id(cursor_page) != zone_id))
-				break;
-
-			/*
-			 * If we don't have enough swap space, reclaiming of
-			 * anon page which don't already have a swap slot is
-			 * pointless.
-			 */
-			if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
-			    !PageSwapCache(cursor_page))
-				break;
-
-			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
-				unsigned int isolated_pages;
-
-				mem_cgroup_lru_del(cursor_page);
-				list_move(&cursor_page->lru, dst);
-				isolated_pages = hpage_nr_pages(cursor_page);
-				nr_taken += isolated_pages;
-				nr_lumpy_taken += isolated_pages;
-				if (PageDirty(cursor_page))
-					nr_lumpy_dirty += isolated_pages;
-				scan++;
-				pfn += isolated_pages - 1;
-			} else {
-				/*
-				 * Check if the page is freed already.
-				 *
-				 * We can't use page_count() as that
-				 * requires compound_head and we don't
-				 * have a pin on the page here. If a
-				 * page is tail, we may or may not
-				 * have isolated the head, so assume
-				 * it's not free, it'd be tricky to
-				 * track the head status without a
-				 * page pin.
-				 */
-				if (!PageTail(cursor_page) &&
-				    !atomic_read(&cursor_page->_count))
-					continue;
-				break;
-			}
-		}
-
-		/* If we break out of the loop above, lumpy reclaim failed */
-		if (pfn < end_pfn)
-			nr_lumpy_failed++;
 	}
 
 	*nr_scanned = scan;
@@ -1278,7 +1146,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	trace_mm_vmscan_lru_isolate(sc->order,
 			nr_to_scan, scan,
 			nr_taken,
-			nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
 			mode, file);
 	return nr_taken;
 }
@@ -1454,47 +1321,6 @@ update_isolated_counts(struct mem_cgroup_zone *mz,
 }
 
 /*
- * Returns true if a direct reclaim should wait on pages under writeback.
- *
- * If we are direct reclaiming for contiguous pages and we do not reclaim
- * everything in the list, try again and wait for writeback IO to complete.
- * This will stall high-order allocations noticeably. Only do that when really
- * need to free the pages under high memory pressure.
- */
-static inline bool should_reclaim_stall(unsigned long nr_taken,
-					unsigned long nr_freed,
-					int priority,
-					struct scan_control *sc)
-{
-	int lumpy_stall_priority;
-
-	/* kswapd should not stall on sync IO */
-	if (current_is_kswapd())
-		return false;
-
-	/* Only stall on lumpy reclaim */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
-	/* If we have reclaimed everything on the isolated list, no stall */
-	if (nr_freed == nr_taken)
-		return false;
-
-	/*
-	 * For high-order allocations, there are two stall thresholds.
-	 * High-cost allocations stall immediately where as lower
-	 * order allocations such as stacks require the scanning
-	 * priority to be much higher before stalling.
-	 */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		lumpy_stall_priority = DEF_PRIORITY;
-	else
-		lumpy_stall_priority = DEF_PRIORITY / 3;
-
-	return priority <= lumpy_stall_priority;
-}
-
-/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1522,9 +1348,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_reclaim_mode(priority, sc, false);
-	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
-		isolate_mode |= ISOLATE_ACTIVE;
+	set_reclaim_mode(priority, sc);
 
 	lru_add_drain();
 
@@ -1556,13 +1380,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 	nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority,
 						&nr_dirty, &nr_writeback);
 
-	/* Check if we should syncronously wait for writeback */
-	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		set_reclaim_mode(priority, sc, true);
-		nr_reclaimed += shrink_page_list(&page_list, mz, sc,
-					priority, &nr_dirty, &nr_writeback);
-	}
-
 	spin_lock_irq(&zone->lru_lock);
 
 	reclaim_stat->recent_scanned[0] += nr_anon;
-- 
1.7.9.2



* [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t
  2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman
  2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
@ 2012-03-28 16:06 ` Mel Gorman
  2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
  2 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2012-03-28 16:06 UTC (permalink / raw)
  To: Linux-MM, LKML
  Cc: Andrew Morton, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins,
	Mel Gorman

There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed. This patch gets rid of reclaim_mode_t
as well and improves the documentation about what reclaim/compaction is
and when it is triggered.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/trace/events/vmscan.h |    4 +--
 mm/vmscan.c                   |   72 +++++++++++++----------------------------
 2 files changed, 24 insertions(+), 52 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 6f60b33..f66cc93 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -25,12 +25,12 @@
 		{RECLAIM_WB_ASYNC,	"RECLAIM_WB_ASYNC"}	\
 		) : "RECLAIM_WB_NONE"
 
-#define trace_reclaim_flags(page, sync) ( \
+#define trace_reclaim_flags(page) ( \
 	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
 	(RECLAIM_WB_ASYNC)   \
 	)
 
-#define trace_shrink_flags(file, sync) ( \
+#define trace_shrink_flags(file) \
 	( \
 		(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) |  \
 		(RECLAIM_WB_ASYNC) \
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68319e4..36c6ad2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,16 +53,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
-/*
- * reclaim_mode determines how the inactive list is shrunk
- * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
- * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
- *			order-0 pages and then compact the zone
- */
-typedef unsigned __bitwise__ reclaim_mode_t;
-#define RECLAIM_MODE_SINGLE		((__force reclaim_mode_t)0x01u)
-#define RECLAIM_MODE_COMPACTION		((__force reclaim_mode_t)0x10u)
-
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -89,12 +79,6 @@ struct scan_control {
 	int order;
 
 	/*
-	 * Intend to reclaim enough continuous memory rather than reclaim
-	 * enough amount of memory. i.e, mode for high order allocation.
-	 */
-	reclaim_mode_t reclaim_mode;
-
-	/*
 	 * The memory cgroup that hit its limit and as a result is the
 	 * primary target of this reclaim invocation.
 	 */
@@ -356,25 +340,6 @@ out:
 	return ret;
 }
 
-static void set_reclaim_mode(int priority, struct scan_control *sc)
-{
-	/*
-	 * Restrict reclaim/compaction to costly allocations or when
-	 * under memory pressure
-	 */
-	if (COMPACTION_BUILD && sc->order &&
-			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
-			 priority < DEF_PRIORITY - 2))
-		sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
-	else
-		sc->reclaim_mode = RECLAIM_MODE_SINGLE;
-}
-
-static void reset_reclaim_mode(struct scan_control *sc)
-{
-	sc->reclaim_mode = RECLAIM_MODE_SINGLE;
-}
-
 static inline int is_page_cache_freeable(struct page *page)
 {
 	/*
@@ -497,8 +462,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			/* synchronous write or broken a_ops? */
 			ClearPageReclaim(page);
 		}
-		trace_mm_vmscan_writepage(page,
-			trace_reclaim_flags(page, sc->reclaim_mode));
+		trace_mm_vmscan_writepage(page, trace_reclaim_flags(page));
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
 	}
@@ -953,7 +917,6 @@ cull_mlocked:
 			try_to_free_swap(page);
 		unlock_page(page);
 		putback_lru_page(page);
-		reset_reclaim_mode(sc);
 		continue;
 
 activate_locked:
@@ -1348,8 +1311,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 			return SWAP_CLUSTER_MAX;
 	}
 
-	set_reclaim_mode(priority, sc);
-
 	lru_add_drain();
 
 	if (!sc->may_unmap)
@@ -1428,7 +1389,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
 		zone_idx(zone),
 		nr_scanned, nr_reclaimed,
 		priority,
-		trace_shrink_flags(file, sc->reclaim_mode));
+		trace_shrink_flags(file));
 	return nr_reclaimed;
 }
 
@@ -1507,8 +1468,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	reset_reclaim_mode(sc);
-
 	if (!sc->may_unmap)
 		isolate_mode |= ISOLATE_UNMAPPED;
 	if (!sc->may_writepage)
@@ -1821,23 +1780,35 @@ out:
 	}
 }
 
+/* Use reclaim/compaction for costly allocs or under memory pressure */
+static bool in_reclaim_compaction(int priority, struct scan_control *sc)
+{
+	if (COMPACTION_BUILD && sc->order &&
+			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
+			 priority < DEF_PRIORITY - 2))
+		return true;
+
+	return false;
+}
+
 /*
- * Reclaim/compaction depends on a number of pages being freed. To avoid
- * disruption to the system, a small number of order-0 pages continue to be
- * rotated and reclaimed in the normal fashion. However, by the time we get
- * back to the allocator and call try_to_compact_zone(), we ensure that
- * there are enough free pages for it to be likely successful
+ * Reclaim/compaction is used for high-order allocation requests. It reclaims
+ * order-0 pages before compacting the zone. should_continue_reclaim() returns
+ * true if more pages should be reclaimed such that when the page allocator
+ * calls try_to_compact_zone() that it will have enough free pages to succeed.
+ * It will give up earlier than that if there is difficulty reclaiming pages.
  */
 static inline bool should_continue_reclaim(struct mem_cgroup_zone *mz,
 					unsigned long nr_reclaimed,
 					unsigned long nr_scanned,
+					int priority,
 					struct scan_control *sc)
 {
 	unsigned long pages_for_compaction;
 	unsigned long inactive_lru_pages;
 
 	/* If not in reclaim/compaction mode, stop */
-	if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
+	if (!in_reclaim_compaction(priority, sc))
 		return false;
 
 	/* Consider stopping depending on scan and reclaim activity */
@@ -1944,7 +1915,8 @@ restart:
 
 	/* reclaim/compaction might need reclaim to continue */
 	if (should_continue_reclaim(mz, nr_reclaimed,
-					sc->nr_scanned - nr_scanned, sc))
+					sc->nr_scanned - nr_scanned,
+					priority, sc))
 		goto restart;
 
 	throttle_vm_writeout(sc->gfp_mask);
-- 
1.7.9.2



* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman
  2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
  2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman
@ 2012-04-06 19:34 ` Andrew Morton
  2012-04-06 20:31   ` Hugh Dickins
  2012-04-10  8:32   ` Mel Gorman
  2 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2012-04-06 19:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins

On Wed, 28 Mar 2012 17:06:21 +0100
Mel Gorman <mgorman@suse.de> wrote:

> (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
> in shrink_active_list()")
> 
> In the interest of keeping my fingers from the flames at LSF/MM, I'm
> releasing an RFC for lumpy reclaim removal.

I grabbed them, thanks.

>
> ...
>
> MMTests Statistics: vmstat
> Page Ins                                     5426648     2840348     2695120
> Page Outs                                    7206376     7854516     7860408
> Swap Ins                                       36799           0           0
> Swap Outs                                      76903           4           0
> Direct pages scanned                           31981       43749      160647
> Kswapd pages scanned                        26658682     1285341     1195956
> Kswapd pages reclaimed                       2248583     1271621     1178420
> Direct pages reclaimed                          6397       14416       94093
> Kswapd efficiency                                 8%         98%         98%
> Kswapd velocity                            22134.225    1127.205    1051.316
> Direct efficiency                                20%         32%         58%
> Direct velocity                               26.553      38.367     141.218
> Percentage direct scans                           0%          3%         11%
> Page writes by reclaim                       6530481           4           0
> Page writes file                             6453578           0           0
> Page writes anon                               76903           4           0
> Page reclaim immediate                        256742       17832       61576
> Page rescued immediate                             0           0           0
> Slabs scanned                                1073152      971776      975872
> Direct inode steals                                0      196279      205178
> Kswapd inode steals                           139260       70390       64323
> Kswapd skipped wait                            21711           1           0
> THP fault alloc                                    1         126         143
> THP collapse alloc                               324         294         224
> THP splits                                        32           8          10
> THP fault fallback                                 0           0           0
> THP collapse fail                                  5           6           7
> Compaction stalls                                364        1312        1324
> Compaction success                               255         343         366
> Compaction failures                              109         969         958
> Compaction pages moved                        265107     3952630     4489215
> Compaction move failure                         7493       26038       24739
>
> ...
>
> Success rates are completely hosed for 3.4-rc1 which is almost certainly
> due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
> expected this would happen for kswapd and impair allocation success rates
> (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
> a difference: 95% less scanning, 43% less reclaim by kswapd
> 
> In comparison, reclaim/compaction is not aggressive and gives up easily
> which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
> much more aggressive about reclaim/compaction than THP allocations are. The
> stress test above is allocating like neither THP or hugetlbfs but is much
> closer to THP.

We seem to be thrashing around a bit with the performance, and we
aren't tracking this closely enough.

What is kswapd efficiency?  pages-reclaimed/pages-scanned?  Why did it
increase so much?  Are pages which were reclaimed via prune_icache_sb()
included?  If so, they can make a real mess of the scanning efficiency
metric.

The increase in PGINODESTEAL is remarkable.  It seems to largely be a
transfer from kswapd inode stealing.  Bad from a latency POV, at least.
What would cause this change?


* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
@ 2012-04-06 20:31   ` Hugh Dickins
  2012-04-07  3:00     ` KOSAKI Motohiro
  2012-04-09 18:10     ` Rik van Riel
  2012-04-10  8:32   ` Mel Gorman
  1 sibling, 2 replies; 14+ messages in thread
From: Hugh Dickins @ 2012-04-06 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov,
	Hugh Dickins

On Fri, 6 Apr 2012, Andrew Morton wrote:
> On Wed, 28 Mar 2012 17:06:21 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
> > in shrink_active_list()")
> > 
> > In the interest of keeping my fingers from the flames at LSF/MM, I'm
> > releasing an RFC for lumpy reclaim removal.
> 
> I grabbed them, thanks.

I do have a concern with this: I was expecting lumpy reclaim to be
replaced by compaction, and indeed it is when CONFIG_COMPACTION=y.
But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in
relying upon blind chance to provide order>0 pages.

Hugh


* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
@ 2012-04-06 23:52   ` Ying Han
  2012-04-10  8:24     ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Ying Han @ 2012-04-06 23:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel,
	Konstantin Khlebnikov, Hugh Dickins

On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
> Lumpy reclaim had a purpose but in the mind of some, it was to kick
> the system so hard it thrashed. For others the purpose was to complicate
> vmscan.c. Over time it was given softer shoes and a nicer attitude but
> memory compaction needs to step up and replace it, so this patch sends
> lumpy reclaim to the farm.
>
> Here are the important notes related to the patch.
>
> 1. The tracepoint format changes for isolating LRU pages.
>
> 2. This patch stops reclaim/compaction entering sync reclaim as this
>   was only intended for lumpy reclaim and an oversight. Page migration
>   has its own logic for stalling on writeback pages if necessary and
>   memory compaction is already using it. This is a behaviour change.
>
> 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
>   on PageWriteback with CONFIG_COMPACTION and has been this way for a while.
>   I am calling it out in case this is a surprise to people.

Mel,

Can you point me to the commit making that change? I am looking at
v3.4-rc1 where set_reclaim_mode() still sets RECLAIM_MODE_SYNC for
COMPACTION_BUILD.

--Ying

> This behaviour
>   avoids a situation where we wait on a page being written back to
>   slow storage like USB. Currently we depend on wait_iff_congested()
>   for throttling if too many dirty pages are scanned.
>
> 4. Reclaim/compaction can no longer queue dirty pages in pageout()
>   if the underlying BDI is congested. Lumpy reclaim used this logic and
>   reclaim/compaction was using it in error. This is a behaviour change.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  include/trace/events/vmscan.h |   36 ++-----
>  mm/vmscan.c                   |  209 +++--------------------------------------
>  2 files changed, 22 insertions(+), 223 deletions(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index f64560e..6f60b33 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -13,7 +13,7 @@
>  #define RECLAIM_WB_ANON                0x0001u
>  #define RECLAIM_WB_FILE                0x0002u
>  #define RECLAIM_WB_MIXED       0x0010u
> -#define RECLAIM_WB_SYNC                0x0004u
> +#define RECLAIM_WB_SYNC                0x0004u /* Unused, all reclaim async */
>  #define RECLAIM_WB_ASYNC       0x0008u
>
>  #define show_reclaim_flags(flags)                              \
> @@ -27,13 +27,13 @@
>
>  #define trace_reclaim_flags(page, sync) ( \
>        (page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
> -       (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC)   \
> +       (RECLAIM_WB_ASYNC)   \
>        )
>
>  #define trace_shrink_flags(file, sync) ( \
> -       (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_MIXED : \
> -                       (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON)) |  \
> -       (sync & RECLAIM_MODE_SYNC ? RECLAIM_WB_SYNC : RECLAIM_WB_ASYNC) \
> +       ( \
> +               (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) |  \
> +               (RECLAIM_WB_ASYNC) \
>        )
>
>  TRACE_EVENT(mm_vmscan_kswapd_sleep,
> @@ -263,22 +263,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
>                unsigned long nr_requested,
>                unsigned long nr_scanned,
>                unsigned long nr_taken,
> -               unsigned long nr_lumpy_taken,
> -               unsigned long nr_lumpy_dirty,
> -               unsigned long nr_lumpy_failed,
>                isolate_mode_t isolate_mode,
>                int file),
>
> -       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file),
> +       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file),
>
>        TP_STRUCT__entry(
>                __field(int, order)
>                __field(unsigned long, nr_requested)
>                __field(unsigned long, nr_scanned)
>                __field(unsigned long, nr_taken)
> -               __field(unsigned long, nr_lumpy_taken)
> -               __field(unsigned long, nr_lumpy_dirty)
> -               __field(unsigned long, nr_lumpy_failed)
>                __field(isolate_mode_t, isolate_mode)
>                __field(int, file)
>        ),
> @@ -288,22 +282,16 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
>                __entry->nr_requested = nr_requested;
>                __entry->nr_scanned = nr_scanned;
>                __entry->nr_taken = nr_taken;
> -               __entry->nr_lumpy_taken = nr_lumpy_taken;
> -               __entry->nr_lumpy_dirty = nr_lumpy_dirty;
> -               __entry->nr_lumpy_failed = nr_lumpy_failed;
>                __entry->isolate_mode = isolate_mode;
>                __entry->file = file;
>        ),
>
> -       TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu contig_taken=%lu contig_dirty=%lu contig_failed=%lu file=%d",
> +       TP_printk("isolate_mode=%d order=%d nr_requested=%lu nr_scanned=%lu nr_taken=%lu file=%d",
>                __entry->isolate_mode,
>                __entry->order,
>                __entry->nr_requested,
>                __entry->nr_scanned,
>                __entry->nr_taken,
> -               __entry->nr_lumpy_taken,
> -               __entry->nr_lumpy_dirty,
> -               __entry->nr_lumpy_failed,
>                __entry->file)
>  );
>
> @@ -313,13 +301,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
>                unsigned long nr_requested,
>                unsigned long nr_scanned,
>                unsigned long nr_taken,
> -               unsigned long nr_lumpy_taken,
> -               unsigned long nr_lumpy_dirty,
> -               unsigned long nr_lumpy_failed,
>                isolate_mode_t isolate_mode,
>                int file),
>
> -       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
> +       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
>
>  );
>
> @@ -329,13 +314,10 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
>                unsigned long nr_requested,
>                unsigned long nr_scanned,
>                unsigned long nr_taken,
> -               unsigned long nr_lumpy_taken,
> -               unsigned long nr_lumpy_dirty,
> -               unsigned long nr_lumpy_failed,
>                isolate_mode_t isolate_mode,
>                int file),
>
> -       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode, file)
> +       TP_ARGS(order, nr_requested, nr_scanned, nr_taken, isolate_mode, file)
>
>  );
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33c332b..68319e4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -56,19 +56,11 @@
>  /*
>  * reclaim_mode determines how the inactive list is shrunk
>  * RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
> - * RECLAIM_MODE_ASYNC:  Do not block
> - * RECLAIM_MODE_SYNC:   Allow blocking e.g. call wait_on_page_writeback
> - * RECLAIM_MODE_LUMPYRECLAIM: For high-order allocations, take a reference
> - *                     page from the LRU and reclaim all pages within a
> - *                     naturally aligned range
>  * RECLAIM_MODE_COMPACTION: For high-order allocations, reclaim a number of
>  *                     order-0 pages and then compact the zone
>  */
>  typedef unsigned __bitwise__ reclaim_mode_t;
>  #define RECLAIM_MODE_SINGLE            ((__force reclaim_mode_t)0x01u)
> -#define RECLAIM_MODE_ASYNC             ((__force reclaim_mode_t)0x02u)
> -#define RECLAIM_MODE_SYNC              ((__force reclaim_mode_t)0x04u)
> -#define RECLAIM_MODE_LUMPYRECLAIM      ((__force reclaim_mode_t)0x08u)
>  #define RECLAIM_MODE_COMPACTION                ((__force reclaim_mode_t)0x10u)
>
>  struct scan_control {
> @@ -364,37 +356,23 @@ out:
>        return ret;
>  }
>
> -static void set_reclaim_mode(int priority, struct scan_control *sc,
> -                                  bool sync)
> +static void set_reclaim_mode(int priority, struct scan_control *sc)
>  {
> -       reclaim_mode_t syncmode = sync ? RECLAIM_MODE_SYNC : RECLAIM_MODE_ASYNC;
> -
>        /*
> -        * Initially assume we are entering either lumpy reclaim or
> -        * reclaim/compaction.Depending on the order, we will either set the
> -        * sync mode or just reclaim order-0 pages later.
> -        */
> -       if (COMPACTION_BUILD)
> -               sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
> -       else
> -               sc->reclaim_mode = RECLAIM_MODE_LUMPYRECLAIM;
> -
> -       /*
> -        * Avoid using lumpy reclaim or reclaim/compaction if possible by
> -        * restricting when its set to either costly allocations or when
> +        * Restrict reclaim/compaction to costly allocations or when
>         * under memory pressure
>         */
> -       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -               sc->reclaim_mode |= syncmode;
> -       else if (sc->order && priority < DEF_PRIORITY - 2)
> -               sc->reclaim_mode |= syncmode;
> +       if (COMPACTION_BUILD && sc->order &&
> +                       (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
> +                        priority < DEF_PRIORITY - 2))
> +               sc->reclaim_mode = RECLAIM_MODE_COMPACTION;
>        else
> -               sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> +               sc->reclaim_mode = RECLAIM_MODE_SINGLE;
>  }
>
>  static void reset_reclaim_mode(struct scan_control *sc)
>  {
> -       sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> +       sc->reclaim_mode = RECLAIM_MODE_SINGLE;
>  }
>
>  static inline int is_page_cache_freeable(struct page *page)
> @@ -416,10 +394,6 @@ static int may_write_to_queue(struct backing_dev_info *bdi,
>                return 1;
>        if (bdi == current->backing_dev_info)
>                return 1;
> -
> -       /* lumpy reclaim for hugepage often need a lot of write */
> -       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -               return 1;
>        return 0;
>  }
>
> @@ -710,10 +684,6 @@ static enum page_references page_check_references(struct page *page,
>        referenced_ptes = page_referenced(page, 1, mz->mem_cgroup, &vm_flags);
>        referenced_page = TestClearPageReferenced(page);
>
> -       /* Lumpy reclaim - ignore references */
> -       if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
> -               return PAGEREF_RECLAIM;
> -
>        /*
>         * Mlock lost the isolation race with us.  Let try_to_unmap()
>         * move the page to the unevictable list.
> @@ -813,19 +783,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>
>                if (PageWriteback(page)) {
>                        nr_writeback++;
> -                       /*
> -                        * Synchronous reclaim cannot queue pages for
> -                        * writeback due to the possibility of stack overflow
> -                        * but if it encounters a page under writeback, wait
> -                        * for the IO to complete.
> -                        */
> -                       if ((sc->reclaim_mode & RECLAIM_MODE_SYNC) &&
> -                           may_enter_fs)
> -                               wait_on_page_writeback(page);
> -                       else {
> -                               unlock_page(page);
> -                               goto keep_lumpy;
> -                       }
> +                       unlock_page(page);
> +                       goto keep;
>                }
>
>                references = page_check_references(page, mz, sc);
> @@ -908,7 +867,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>                                goto activate_locked;
>                        case PAGE_SUCCESS:
>                                if (PageWriteback(page))
> -                                       goto keep_lumpy;
> +                                       goto keep;
>                                if (PageDirty(page))
>                                        goto keep;
>
> @@ -1007,8 +966,6 @@ activate_locked:
>  keep_locked:
>                unlock_page(page);
>  keep:
> -               reset_reclaim_mode(sc);
> -keep_lumpy:
>                list_add(&page->lru, &ret_pages);
>                VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
>        }
> @@ -1064,11 +1021,7 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
>        if (!all_lru_mode && !!page_is_file_cache(page) != file)
>                return ret;
>
> -       /*
> -        * When this function is being called for lumpy reclaim, we
> -        * initially look into all LRU pages, active, inactive and
> -        * unevictable; only give shrink_page_list evictable pages.
> -        */
> +       /* Do not give back unevictable pages for compaction */
>        if (PageUnevictable(page))
>                return ret;
>
> @@ -1153,9 +1106,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>        struct lruvec *lruvec;
>        struct list_head *src;
>        unsigned long nr_taken = 0;
> -       unsigned long nr_lumpy_taken = 0;
> -       unsigned long nr_lumpy_dirty = 0;
> -       unsigned long nr_lumpy_failed = 0;
>        unsigned long scan;
>        int lru = LRU_BASE;
>
> @@ -1168,10 +1118,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
>        for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
>                struct page *page;
> -               unsigned long pfn;
> -               unsigned long end_pfn;
> -               unsigned long page_pfn;
> -               int zone_id;
>
>                page = lru_to_page(src);
>                prefetchw_prev_lru_page(page, src, flags);
> @@ -1193,84 +1139,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>                default:
>                        BUG();
>                }
> -
> -               if (!sc->order || !(sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM))
> -                       continue;
> -
> -               /*
> -                * Attempt to take all pages in the order aligned region
> -                * surrounding the tag page.  Only take those pages of
> -                * the same active state as that tag page.  We may safely
> -                * round the target page pfn down to the requested order
> -                * as the mem_map is guaranteed valid out to MAX_ORDER,
> -                * where that page is in a different zone we will detect
> -                * it from its zone id and abort this block scan.
> -                */
> -               zone_id = page_zone_id(page);
> -               page_pfn = page_to_pfn(page);
> -               pfn = page_pfn & ~((1 << sc->order) - 1);
> -               end_pfn = pfn + (1 << sc->order);
> -               for (; pfn < end_pfn; pfn++) {
> -                       struct page *cursor_page;
> -
> -                       /* The target page is in the block, ignore it. */
> -                       if (unlikely(pfn == page_pfn))
> -                               continue;
> -
> -                       /* Avoid holes within the zone. */
> -                       if (unlikely(!pfn_valid_within(pfn)))
> -                               break;
> -
> -                       cursor_page = pfn_to_page(pfn);
> -
> -                       /* Check that we have not crossed a zone boundary. */
> -                       if (unlikely(page_zone_id(cursor_page) != zone_id))
> -                               break;
> -
> -                       /*
> -                        * If we don't have enough swap space, reclaiming of
> -                        * anon page which don't already have a swap slot is
> -                        * pointless.
> -                        */
> -                       if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
> -                           !PageSwapCache(cursor_page))
> -                               break;
> -
> -                       if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> -                               unsigned int isolated_pages;
> -
> -                               mem_cgroup_lru_del(cursor_page);
> -                               list_move(&cursor_page->lru, dst);
> -                               isolated_pages = hpage_nr_pages(cursor_page);
> -                               nr_taken += isolated_pages;
> -                               nr_lumpy_taken += isolated_pages;
> -                               if (PageDirty(cursor_page))
> -                                       nr_lumpy_dirty += isolated_pages;
> -                               scan++;
> -                               pfn += isolated_pages - 1;
> -                       } else {
> -                               /*
> -                                * Check if the page is freed already.
> -                                *
> -                                * We can't use page_count() as that
> -                                * requires compound_head and we don't
> -                                * have a pin on the page here. If a
> -                                * page is tail, we may or may not
> -                                * have isolated the head, so assume
> -                                * it's not free, it'd be tricky to
> -                                * track the head status without a
> -                                * page pin.
> -                                */
> -                               if (!PageTail(cursor_page) &&
> -                                   !atomic_read(&cursor_page->_count))
> -                                       continue;
> -                               break;
> -                       }
> -               }
> -
> -               /* If we break out of the loop above, lumpy reclaim failed */
> -               if (pfn < end_pfn)
> -                       nr_lumpy_failed++;
>        }
>
>        *nr_scanned = scan;
> @@ -1278,7 +1146,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>        trace_mm_vmscan_lru_isolate(sc->order,
>                        nr_to_scan, scan,
>                        nr_taken,
> -                       nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed,
>                        mode, file);
>        return nr_taken;
>  }
> @@ -1454,47 +1321,6 @@ update_isolated_counts(struct mem_cgroup_zone *mz,
>  }
>
>  /*
> - * Returns true if a direct reclaim should wait on pages under writeback.
> - *
> - * If we are direct reclaiming for contiguous pages and we do not reclaim
> - * everything in the list, try again and wait for writeback IO to complete.
> - * This will stall high-order allocations noticeably. Only do that when really
> - * need to free the pages under high memory pressure.
> - */
> -static inline bool should_reclaim_stall(unsigned long nr_taken,
> -                                       unsigned long nr_freed,
> -                                       int priority,
> -                                       struct scan_control *sc)
> -{
> -       int lumpy_stall_priority;
> -
> -       /* kswapd should not stall on sync IO */
> -       if (current_is_kswapd())
> -               return false;
> -
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -
> -       /* If we have reclaimed everything on the isolated list, no stall */
> -       if (nr_freed == nr_taken)
> -               return false;
> -
> -       /*
> -        * For high-order allocations, there are two stall thresholds.
> -        * High-cost allocations stall immediately where as lower
> -        * order allocations such as stacks require the scanning
> -        * priority to be much higher before stalling.
> -        */
> -       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -               lumpy_stall_priority = DEF_PRIORITY;
> -       else
> -               lumpy_stall_priority = DEF_PRIORITY / 3;
> -
> -       return priority <= lumpy_stall_priority;
> -}
> -
> -/*
>  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
>  * of reclaimed pages
>  */
> @@ -1522,9 +1348,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>                        return SWAP_CLUSTER_MAX;
>        }
>
> -       set_reclaim_mode(priority, sc, false);
> -       if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
> -               isolate_mode |= ISOLATE_ACTIVE;
> +       set_reclaim_mode(priority, sc);
>
>        lru_add_drain();
>
> @@ -1556,13 +1380,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct mem_cgroup_zone *mz,
>        nr_reclaimed = shrink_page_list(&page_list, mz, sc, priority,
>                                                &nr_dirty, &nr_writeback);
>
> -       /* Check if we should syncronously wait for writeback */
> -       if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> -               set_reclaim_mode(priority, sc, true);
> -               nr_reclaimed += shrink_page_list(&page_list, mz, sc,
> -                                       priority, &nr_dirty, &nr_writeback);
> -       }
> -
>        spin_lock_irq(&zone->lru_lock);
>
>        reclaim_stat->recent_scanned[0] += nr_anon;
> --
> 1.7.9.2
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-06 20:31   ` Hugh Dickins
@ 2012-04-07  3:00     ` KOSAKI Motohiro
  2012-04-09 18:10     ` Rik van Riel
  1 sibling, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2012-04-07  3:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Rik van Riel,
	Konstantin Khlebnikov, kosaki.motohiro

(4/6/12 1:31 PM), Hugh Dickins wrote:
> On Fri, 6 Apr 2012, Andrew Morton wrote:
>> On Wed, 28 Mar 2012 17:06:21 +0100
>> Mel Gorman<mgorman@suse.de>  wrote:
>>
>>> (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
>>> in shrink_active_list()")
>>>
>>> In the interest of keeping my fingers from the flames at LSF/MM, I'm
>>> releasing an RFC for lumpy reclaim removal.
>>
>> I grabbed them, thanks.
>
> I do have a concern with this: I was expecting lumpy reclaim to be
> replaced by compaction, and indeed it is when CONFIG_COMPACTION=y.
> But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in
> relying upon blind chance to provide order>0 pages.

I raised the biggest objection to removing lumpy reclaim when compaction was
merged, but I think it is OK now. Desktop and server people always run
COMPACTION=y kernels, and embedded people don't use swap (so lumpy wouldn't
work there anyway).

My thought was to keep development gradual and avoid an aggressive regression,
and Mel did that. Compaction is now completely stable and we have no reason to
keep lumpy, I think.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-06 20:31   ` Hugh Dickins
  2012-04-07  3:00     ` KOSAKI Motohiro
@ 2012-04-09 18:10     ` Rik van Riel
  2012-04-09 19:18       ` Hugh Dickins
  1 sibling, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2012-04-09 18:10 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov

On 04/06/2012 04:31 PM, Hugh Dickins wrote:
> On Fri, 6 Apr 2012, Andrew Morton wrote:
>> On Wed, 28 Mar 2012 17:06:21 +0100
>> Mel Gorman<mgorman@suse.de>  wrote:
>>
>>> (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
>>> in shrink_active_list()")
>>>
>>> In the interest of keeping my fingers from the flames at LSF/MM, I'm
>>> releasing an RFC for lumpy reclaim removal.
>>
>> I grabbed them, thanks.
>
> I do have a concern with this: I was expecting lumpy reclaim to be
> replaced by compaction, and indeed it is when CONFIG_COMPACTION=y.
> But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in
> relying upon blind chance to provide order>0 pages.

Is this an issue for any architecture?

I could see NOMMU being unable to use compaction, but
chances are lumpy reclaim would be sufficient for that
configuration, anyway...

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-09 18:10     ` Rik van Riel
@ 2012-04-09 19:18       ` Hugh Dickins
  2012-04-09 23:40         ` Rik van Riel
  0 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2012-04-09 19:18 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov

On Mon, 9 Apr 2012, Rik van Riel wrote:
> On 04/06/2012 04:31 PM, Hugh Dickins wrote:
> > On Fri, 6 Apr 2012, Andrew Morton wrote:
> > > On Wed, 28 Mar 2012 17:06:21 +0100
> > > Mel Gorman<mgorman@suse.de>  wrote:
> > > 
> > > > (cc'ing active people in the thread "[patch 68/92] mm: forbid
> > > > lumpy-reclaim
> > > > in shrink_active_list()")
> > > > 
> > > > In the interest of keeping my fingers from the flames at LSF/MM, I'm
> > > > releasing an RFC for lumpy reclaim removal.
> > > 
> > > I grabbed them, thanks.
> > 
> > I do have a concern with this: I was expecting lumpy reclaim to be
> > replaced by compaction, and indeed it is when CONFIG_COMPACTION=y.
> > But when CONFIG_COMPACTION is not set, we're back to 2.6.22 in
> > relying upon blind chance to provide order>0 pages.
> 
> Is this an issue for any architecture?

Dunno about any architecture as a whole; but I'd expect users of SLOB
or TINY config options to want to still use lumpy rather than the more
efficient but weightier COMPACTION+MIGRATION.

Though "size migrate.o compaction.o" on my 32-bit config does not
reach 8kB, so maybe it's not a big deal after all.

> 
> I could see NOMMU being unable to use compaction, but

Yes, COMPACTION depends on MMU.

> chances are lumpy reclaim would be sufficient for that
> configuration, anyway...

That's an argument for your patch in 3.4-rc, which uses lumpy only
when !COMPACTION_BUILD.  But here we're worrying about Mel's patch,
which removes the lumpy code completely.

Hugh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-09 19:18       ` Hugh Dickins
@ 2012-04-09 23:40         ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2012-04-09 23:40 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Mel Gorman, Linux-MM, LKML, Konstantin Khlebnikov

On 04/09/2012 03:18 PM, Hugh Dickins wrote:
> On Mon, 9 Apr 2012, Rik van Riel wrote:

>> I could see NOMMU being unable to use compaction, but
>
> Yes, COMPACTION depends on MMU.
>
>> chances are lumpy reclaim would be sufficient for that
>> configuration, anyway...
>
> That's an argument for your patch in 3.4-rc, which uses lumpy only
> when !COMPACTION_BUILD.  But here we're worrying about Mel's patch,
> which removes the lumpy code completely.

Sorry, that was a typo in my mail.

I wanted to say that I expect lumpy reclaim to NOT be
sufficient for NOMMU anyway, because it cannot reclaim
lumps of memory large enough to fit a new process.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-06 23:52   ` Ying Han
@ 2012-04-10  8:24     ` Mel Gorman
  2012-04-10  9:29       ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2012-04-10  8:24 UTC (permalink / raw)
  To: Ying Han
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel,
	Konstantin Khlebnikov, Hugh Dickins

On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
> On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
> > Lumpy reclaim had a purpose but in the mind of some, it was to kick
> > the system so hard it trashed. For others the purpose was to complicate
> > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
> > memory compaction needs to step up and replace it so this patch sends
> > lumpy reclaim to the farm.
> >
> > Here are the important notes related to the patch.
> >
> > 1. The tracepoint format changes for isolating LRU pages.
> >
> > 2. This patch stops reclaim/compaction entering sync reclaim as this
> >   was only intended for lumpy reclaim and an oversight. Page migration
> >   has its own logic for stalling on writeback pages if necessary and
> >   memory compaction is already using it. This is a behaviour change.
> >
> > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
> >   on PageWriteback with CONFIG_COMPACTION has been this way for a while.
> >   I am calling it out in case this is a surpise to people.
> 
> Mel,
> 
> Can you point me the commit making that change? I am looking at
> v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
> COMPACTION_BUILD.
> 

You're right.

There is only one call site that passes sync==true for set_reclaim_mode() in
vmscan.c and that is only if should_reclaim_stall() returns true. It had the
comment "Only stall on lumpy reclaim" but the comment is not accurate
and that misled me.
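
For reference, this is the call site in question, as it appears in the
shrink_inactive_list() hunk removed by the patch above:

	/* Check if we should synchronously wait for writeback */
	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
		set_reclaim_mode(priority, sc, true);
		nr_reclaimed += shrink_page_list(&page_list, mz, sc,
					priority, &nr_dirty, &nr_writeback);
	}

so the sync pass is only reachable through should_reclaim_stall(), which
is where the misleading comment lives.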

Thanks, I'll revisit the patch.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] Removal of lumpy reclaim
  2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
  2012-04-06 20:31   ` Hugh Dickins
@ 2012-04-10  8:32   ` Mel Gorman
  1 sibling, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2012-04-10  8:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, Rik van Riel, Konstantin Khlebnikov, Hugh Dickins

On Fri, Apr 06, 2012 at 12:34:39PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2012 17:06:21 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > (cc'ing active people in the thread "[patch 68/92] mm: forbid lumpy-reclaim
> > in shrink_active_list()")
> > 
> > In the interest of keeping my fingers from the flames at LSF/MM, I'm
> > releasing an RFC for lumpy reclaim removal.
> 
> I grabbed them, thanks.
> 

There probably will be a V2 as Ying pointed out a problem with patch 1.

> >
> > ...
> >
> > MMTests Statistics: vmstat
> > Page Ins                                     5426648     2840348     2695120
> > Page Outs                                    7206376     7854516     7860408
> > Swap Ins                                       36799           0           0
> > Swap Outs                                      76903           4           0
> > Direct pages scanned                           31981       43749      160647
> > Kswapd pages scanned                        26658682     1285341     1195956
> > Kswapd pages reclaimed                       2248583     1271621     1178420
> > Direct pages reclaimed                          6397       14416       94093
> > Kswapd efficiency                                 8%         98%         98%
> > Kswapd velocity                            22134.225    1127.205    1051.316
> > Direct efficiency                                20%         32%         58%
> > Direct velocity                               26.553      38.367     141.218
> > Percentage direct scans                           0%          3%         11%
> > Page writes by reclaim                       6530481           4           0
> > Page writes file                             6453578           0           0
> > Page writes anon                               76903           4           0
> > Page reclaim immediate                        256742       17832       61576
> > Page rescued immediate                             0           0           0
> > Slabs scanned                                1073152      971776      975872
> > Direct inode steals                                0      196279      205178
> > Kswapd inode steals                           139260       70390       64323
> > Kswapd skipped wait                            21711           1           0
> > THP fault alloc                                    1         126         143
> > THP collapse alloc                               324         294         224
> > THP splits                                        32           8          10
> > THP fault fallback                                 0           0           0
> > THP collapse fail                                  5           6           7
> > Compaction stalls                                364        1312        1324
> > Compaction success                               255         343         366
> > Compaction failures                              109         969         958
> > Compaction pages moved                        265107     3952630     4489215
> > Compaction move failure                         7493       26038       24739
> >
> > ...
> >
> > Success rates are completely hosed for 3.4-rc1 which is almost certainly
> > due to [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. I
> > expected this would happen for kswapd and impair allocation success rates
> > (https://lkml.org/lkml/2012/1/25/166) but I did not anticipate this much
> > a difference: 95% less scanning, 43% less reclaim by kswapd
> > 
> > In comparison, reclaim/compaction is not aggressive and gives up easily
> > which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would be
> > much more aggressive about reclaim/compaction than THP allocations are. The
> > stress test above is allocating like neither THP or hugetlbfs but is much
> > closer to THP.
> 
> We seem to be thrashing around a bit with the performance, and we
> aren't tracking this closely enough.
> 

Yes.

> What is kswapd efficiency?  pages-relclaimed/pages-scanned? 

pages_reclaimed*100/pages_scanned
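
As a worked example, plugging in the kswapd numbers from the vmstat table
quoted above:

	3.2.0-vanilla:  2248583 * 100 / 26658682  ~  8%
	3.3.0-git:      1271621 * 100 /  1285341  ~ 98%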

> Why did it
> increase so much? 

Lumpy reclaim increases the number of pages scanned in
isolate_lru_pages(), and that is what I was attributing the increase to.

> Are pages which were reclaimed via prune_icache_sb()
> included?  If so, they can make a real mess of the scanning efficiency
> metric.
> 

I don't think so. For Kswapd efficiency, I'm using "kswapd_steal" from
vmstat, and that is updated by shrink_inactive_list(), not the slab
shrinker.

> The increase in PGINODESTEAL is remarkable.  It seems to largely be a
> transfer from kswapd inode stealing.  Bad from a latency POV, at least.
> What would cause this change?

I'm playing catch-up at the moment and right now, I do not have a good
explanation as to why it changed like this. The most likely explanation
is that we are reclaiming fewer pages leading to more slab reclaim.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-10  8:24     ` Mel Gorman
@ 2012-04-10  9:29       ` Mel Gorman
  2012-04-10 17:25         ` Ying Han
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2012-04-10  9:29 UTC (permalink / raw)
  To: Ying Han
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel,
	Konstantin Khlebnikov, Hugh Dickins

On Tue, Apr 10, 2012 at 09:24:54AM +0100, Mel Gorman wrote:
> On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
> > On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
> > > Lumpy reclaim had a purpose but in the mind of some, it was to kick
> > > the system so hard it trashed. For others the purpose was to complicate
> > > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
> > > memory compaction needs to step up and replace it so this patch sends
> > > lumpy reclaim to the farm.
> > >
> > > Here are the important notes related to the patch.
> > >
> > > 1. The tracepoint format changes for isolating LRU pages.
> > >
> > > 2. This patch stops reclaim/compaction entering sync reclaim as this
> > >   was only intended for lumpy reclaim and an oversight. Page migration
> > >   has its own logic for stalling on writeback pages if necessary and
> > >   memory compaction is already using it. This is a behaviour change.
> > >
> > > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
> > >   on PageWriteback with CONFIG_COMPACTION has been this way for a while.
> > >   I am calling it out in case this is a surpise to people.
> > 
> > Mel,
> > 
> > Can you point me the commit making that change? I am looking at
> > v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
> > COMPACTION_BUILD.
> > 
> 
> You're right.
> 
> There is only one call site that passes sync==true for set_reclaim_mode() in
> vmscan.c and that is only if should_reclaim_stall() returns true. It had the
> comment "Only stall on lumpy reclaim" but the comment is not accurate
> and that misled me.
> 
> Thanks, I'll revisit the patch.
> 

Just to be clear, I think the patch is right in that stalling on page
writeback was intended just for lumpy reclaim. I've split out the patch
that stops reclaim/compaction entering sync reclaim but the end result
of the series is the same. Unfortunately we do not have tracing to record
how often reclaim waited on writeback during compaction, so my historical
data does not indicate how often it happened. However, it may partially
explain occasional complaints about interactivity during heavy writeback
when THP is enabled (the bulk of the stalls were due to something else but
on rare occasions disabling THP was reported to make a small unquantifiable
difference). I'll enable ftrace to record how often mm_vmscan_writepage()
used RECLAIM_MODE_SYNC during tests for this series and include that
information in the changelog.
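
If I remember the 3.3 code correctly, the reclaim mode is already folded
into that tracepoint, roughly like this in pageout():

	/* the sync/async distinction ends up in the trace flags */
	trace_mm_vmscan_writepage(page,
			trace_reclaim_flags(page, sc->reclaim_mode));

so it should just be a matter of enabling the vmscan trace events and
counting how many entries report RECLAIM_WB_SYNC.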

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm: vmscan: Remove lumpy reclaim
  2012-04-10  9:29       ` Mel Gorman
@ 2012-04-10 17:25         ` Ying Han
  0 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-04-10 17:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, LKML, Andrew Morton, Rik van Riel,
	Konstantin Khlebnikov, Hugh Dickins

On Tue, Apr 10, 2012 at 2:29 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Tue, Apr 10, 2012 at 09:24:54AM +0100, Mel Gorman wrote:
>> On Fri, Apr 06, 2012 at 04:52:09PM -0700, Ying Han wrote:
>> > On Wed, Mar 28, 2012 at 9:06 AM, Mel Gorman <mgorman@suse.de> wrote:
>> > > Lumpy reclaim had a purpose but in the mind of some, it was to kick
>> > > the system so hard it trashed. For others the purpose was to complicate
>> > > vmscan.c. Over time it was giving softer shoes and a nicer attitude but
>> > > memory compaction needs to step up and replace it so this patch sends
>> > > lumpy reclaim to the farm.
>> > >
>> > > Here are the important notes related to the patch.
>> > >
>> > > 1. The tracepoint format changes for isolating LRU pages.
>> > >
>> > > 2. This patch stops reclaim/compaction entering sync reclaim as this
>> > >   was only intended for lumpy reclaim and an oversight. Page migration
>> > >   has its own logic for stalling on writeback pages if necessary and
>> > >   memory compaction is already using it. This is a behaviour change.
>> > >
>> > > 3. RECLAIM_MODE_SYNC no longer exists. pageout() does not stall
>> > >   on PageWriteback with CONFIG_COMPACTION has been this way for a while.
>> > >   I am calling it out in case this is a surpise to people.
>> >
>> > Mel,
>> >
>> > Can you point me the commit making that change? I am looking at
>> > v3.4-rc1 where set_reclaim_mode() still set RECLAIM_MODE_SYNC for
>> > COMPACTION_BUILD.
>> >
>>
>> You're right.
>>
>> There is only one call site that passes sync==true for set_reclaim_mode() in
>> vmscan.c and that is only if should_reclaim_stall() returns true. It had the
>> comment "Only stall on lumpy reclaim" but the comment is not accurate
>> and that misled me.
>>
>> Thanks, I'll revisit the patch.
>>
>
> Just to be clear, I think the patch is right in that stalling on page
> writeback was intended just for lumpy reclaim.

I see a mismatch between the comment "Only stall on lumpy reclaim" and
the actual implementation in should_reclaim_stall(). Not sure what is
intended, but based on the code, both lumpy and compaction reclaim
will be stalled under PageWriteback.
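
For reference, the check in question, from the should_reclaim_stall() hunk
removed earlier in this thread:

	/* Only stall on lumpy reclaim */
	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
		return false;

It only bails out for single-page reclaim, so both the lumpy and compaction
reclaim modes can fall through to the synchronous pass.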

> I've split out the patch
> that stops reclaim/compaction entering sync reclaim but the end result
> of the series is the same.

I think that makes sense to me for compaction, given its page-migrating nature.

> Unfortunately we do not have tracing to record
> how often reclaim waited on writeback during compaction so my historical
> data does not indicate how often it happened. However, it may partially
> explain occasionaly complaints about interactivity during heavy writeback
> when THP is enabled (the bulk of the stalls were due to something else but
> on rare occasions disabling THP was reported to make a small unquantifable
> difference). I'll enable ftrace to record how often mm_vmscan_writepage()
> used RECLAIM_MODE_SYNC during tests for this series and include that
> information in the changelog.

Thanks for looking into it.

--Ying

> --
> Mel Gorman
> SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-04-10 17:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-28 16:06 [RFC PATCH 0/2] Removal of lumpy reclaim Mel Gorman
2012-03-28 16:06 ` [PATCH 1/2] mm: vmscan: Remove " Mel Gorman
2012-04-06 23:52   ` Ying Han
2012-04-10  8:24     ` Mel Gorman
2012-04-10  9:29       ` Mel Gorman
2012-04-10 17:25         ` Ying Han
2012-03-28 16:06 ` [PATCH 2/2] mm: vmscan: Remove reclaim_mode_t Mel Gorman
2012-04-06 19:34 ` [RFC PATCH 0/2] Removal of lumpy reclaim Andrew Morton
2012-04-06 20:31   ` Hugh Dickins
2012-04-07  3:00     ` KOSAKI Motohiro
2012-04-09 18:10     ` Rik van Riel
2012-04-09 19:18       ` Hugh Dickins
2012-04-09 23:40         ` Rik van Riel
2012-04-10  8:32   ` Mel Gorman
