* [RFC][PATCH 0/7] low latency synchronous lumpy reclaim
@ 2010-08-05  6:11 ` KOSAKI Motohiro
  0 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:11 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro


If slow USB storage is connected and a lot of I/O is running against it,
lumpy reclaim often stalls in shrink_inactive_list(). This patch series
tries to solve this issue.
At least, it works fine on my desktop and USB stick environment :-)

This series is still an RFC. Comments, review and testing are welcome!




Wu Fengguang (1):
  vmscan: raise the bar to PAGEOUT_IO_SYNC stalls

KOSAKI Motohiro (6):
  vmscan: don't call congestion_wait() in synchronous lumpy reclaim
  vmscan: use lock_page() instead of trylock_page() for synchronous lumpy reclaim
  vmscan: narrow the synchronous lumpy reclaim conditions
  vmscan: kill dead code in shrink_inactive_list()
  vmscan: remove PF_SWAPWRITE from __zone_reclaim()
  vmscan: isolate_lru_pages(): stop the neighbor search if a neighbor can't be isolated

 mm/vmscan.c |  211 ++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 138 insertions(+), 73 deletions(-)




^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 1/7] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:12   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:12 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

From: Wu Fengguang <fengguang.wu@intel.com>

Fix "system goes unresponsive under memory pressure and lots of
dirty/writeback pages" bug.

	http://lkml.org/lkml/2010/4/4/86

In the above thread, Andreas Mohr described that

	Invoking any command locked up for minutes (note that I'm
	talking about attempted additional I/O to the _other_,
	_unaffected_ main system HDD - such as loading some shell
	binaries -, NOT the external SSD18M!!).

This happens when the two conditions are both met:
- under memory pressure
- writing heavily to a slow device

OOM also happens in Andreas' system. The OOM trace shows that 3
processes are stuck in wait_on_page_writeback() in the direct reclaim
path. One in do_fork() and the other two in unix_stream_sendmsg(). They
are blocked on this condition:

	(sc->order && priority < DEF_PRIORITY - 2)

which was introduced in commit 78dc583d (vmscan: low order lumpy reclaim
also should use PAGEOUT_IO_SYNC) one year ago. That condition may be too
permissive. In Andreas' case, the scan window at that priority is
512MB/1024 = 512KB. If the direct reclaim for the order-1 fork() allocation
runs into a 512KB range of hard-to-reclaim LRU pages, it will stall.

It's a severe problem in three ways.

Firstly, it can easily happen in daily desktop usage.  vmscan priority
can easily go below (DEF_PRIORITY - 2) on _local_ memory pressure. Even
if the system has 50% globally reclaimable pages, it can still easily
contain hard-to-reclaim ranges of 0.1% of its size. For example, a
simple dd can easily create a big range (up to 20%) of dirty pages in
the LRU lists. And order-1 to order-3 allocations are more than common
with SLUB. Try "grep -v '1 :' /proc/slabinfo" to get the list of high
order slab caches. For example, the order-1 radix_tree_node slab cache
may stall applications at swap-in time; the order-3 inode cache on most
filesystems may stall applications when trying to read some file; the
order-2 proc_inode_cache may stall applications when trying to open a
/proc file.

Secondly, once triggered, it will stall unrelated processes (not doing IO
at all) in the system. This "one slow USB device stalls the whole system"
avalanching effect is very bad.

Thirdly, once stalled, the stall time could be intolerably long for the
users.  When there are 20MB of queued writeback pages and USB 1.1 is
writing them out at 1MB/s, wait_on_page_writeback() can be stuck for up to
20 seconds.  Not to mention it may be called multiple times.

So raise the bar to only enable PAGEOUT_IO_SYNC when priority goes below
DEF_PRIORITY/3, or 6.25% of the LRU size. As the default dirty throttle
ratio is 20%, it will hardly be triggered by pure dirty pages. We'd better
treat PAGEOUT_IO_SYNC as a last-resort workaround -- its stall time is
uncomfortably long (it easily goes beyond 1s).

The bar is only raised for (order < PAGE_ALLOC_COSTLY_ORDER) allocations,
which are easy to satisfy even in 1TB memory boxes. So, although 6.25% of
memory could be an awful lot of pages to scan on a system with 1TB of
memory, it won't really have to busy-scan that much.
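
For reference, the arithmetic above boils down to the vmscan scan window at
a given priority. A rough sketch only (scan_window_pages() is a hypothetical
helper for illustration, not something added by this patch):

	/*
	 * Illustrative only: one pass at a given priority scans roughly
	 * lru_pages >> priority pages (DEF_PRIORITY == 12).  With a 512MB
	 * LRU, priority DEF_PRIORITY - 2 (== 10) gives a 512MB/1024 = 512KB
	 * window, while priority DEF_PRIORITY/3 (== 4) covers 1/16 of the
	 * LRU, i.e. 6.25%.
	 */
	static unsigned long scan_window_pages(unsigned long lru_pages,
					       int priority)
	{
		return lru_pages >> priority;
	}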

Andreas tested an older version of this patch and reported that it
mostly fixed his problem. Mel Gorman helped improve it and KOSAKI
Motohiro will fix it further in the next patch.

Reported-by: Andreas Mohr <andi@lisas.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   52 +++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b145e6..cf51d62 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1234,6 +1234,47 @@ static noinline_for_stack void update_isolated_counts(struct zone *zone,
 }
 
 /*
+ * Returns true if the caller should wait to clean dirty/writeback pages.
+ *
+ * If we are direct reclaiming for contiguous pages and we do not reclaim
+ * everything in the list, try again and wait for writeback IO to complete.
+ * This will stall high-order allocations noticeably. Only do that when really
+ * need to free the pages under high memory pressure.
+ */
+static inline bool should_reclaim_stall(unsigned long nr_taken,
+					unsigned long nr_freed,
+					int priority,
+					struct scan_control *sc)
+{
+	int lumpy_stall_priority;
+
+	/* kswapd should not stall on sync IO */
+	if (current_is_kswapd())
+		return false;
+
+	/* Only stall on lumpy reclaim */
+	if (!sc->lumpy_reclaim_mode)
+		return false;
+
+	/* If we have relaimed everything on the isolated list, no stall */
+	if (nr_freed == nr_taken)
+		return false;
+
+	/*
+	 * For high-order allocations, there are two stall thresholds.
+	 * High-cost allocations stall immediately where as lower
+	 * order allocations such as stacks require the scanning
+	 * priority to be much higher before stalling.
+	 */
+	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+		lumpy_stall_priority = DEF_PRIORITY;
+	else
+		lumpy_stall_priority = DEF_PRIORITY / 3;
+
+	return priority <= lumpy_stall_priority;
+}
+
+/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1298,16 +1339,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	nr_reclaimed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
 
-	/*
-	 * If we are direct reclaiming for contiguous pages and we do
-	 * not reclaim everything in the list, try again and wait
-	 * for IO to complete. This will stall high-order allocations
-	 * but that should be acceptable to the caller
-	 */
-	if (nr_reclaimed < nr_taken && !current_is_kswapd() &&
-			sc->lumpy_reclaim_mode) {
+	/* Check if we should syncronously wait for writeback */
+	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
-
 		/*
 		 * The attempt at page out may have made some
 		 * of the pages active, mark them inactive again.
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 2/7] vmscan: don't call congestion_wait() in synchronous lumpy reclaim
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:13   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:13 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

congestion_wait() means "wait until the queue's congestion is cleared".
That said, if the system has plenty of dirty pages and the flusher thread
keeps pushing new requests to the IO queue, the queue doesn't leave the
congested state for a long time. Thus, congestion_wait(HZ/10) becomes
almost equivalent to schedule_timeout(HZ/10).

However, synchronous lumpy reclaim doesn't need this congestion_wait() at
all. shrink_page_list(PAGEOUT_IO_SYNC) already uses
wait_on_page_writeback(), which provides sufficient waiting.
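
To illustrate, a rough sketch of the retry path after this patch
(sync_lumpy_retry() is a made-up wrapper used only for illustration, not
code from the patch):

	/*
	 * Hypothetical sketch: on the synchronous lumpy retry path,
	 * shrink_page_list(PAGEOUT_IO_SYNC) already blocks in
	 * wait_on_page_writeback() for every page still under writeback,
	 * so the queue-level sleep adds nothing useful.
	 */
	static unsigned long sync_lumpy_retry(struct list_head *page_list,
					      struct scan_control *sc)
	{
		/* old code slept here: congestion_wait(BLK_RW_ASYNC, HZ/10); */
		return shrink_page_list(page_list, sc, PAGEOUT_IO_SYNC);
	}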

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cf51d62..1cdc3db 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1341,7 +1341,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		congestion_wait(BLK_RW_ASYNC, HZ/10);
 		/*
 		 * The attempt at page out may have made some
 		 * of the pages active, mark them inactive again.
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/7] vmscan: use lock_page() instead of trylock_page() for synchronous lumpy reclaim
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:13   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:13 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

During synchronous lumpy reclaim, there is no reason to give up reclaiming
a page just because it is locked. Use lock_page() instead of
trylock_page() in this case.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1cdc3db..833b6ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -665,7 +665,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		page = lru_to_page(page_list);
 		list_del(&page->lru);
 
-		if (!trylock_page(page))
+		if (sync_writeback == PAGEOUT_IO_SYNC)
+			lock_page(page);
+		else if (!trylock_page(page))
 			goto keep;
 
 		VM_BUG_ON(PageActive(page));
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 4/7] vmscan: narrow the synchronous lumpy reclaim conditions
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:14   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:14 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

Currently, shrink_page_list() gives up reclaiming a page for one of the
following 10 reasons:

  1. trylock_page() failure
  2. page is unevictable
  3. zone reclaim and page is mapped
  4. PageWriteback() is true
  5. page is swap-backed and swap is full
  6. add_to_swap() failure
  7. page is dirty and the gfp mask doesn't allow GFP_IO/GFP_FS
  8. page is pinned
  9. IO queue is congested
 10. pageout() started IO, but it hasn't finished

Under lumpy reclaim, any of these failures currently leads to synchronous
lumpy reclaim, which is a bit silly. At least cases (2), (3), (5), (6), (7)
and (8) are not worth retrying. This patch therefore disables lumpy reclaim
for the rest of the reclaim pass when such a failure is hit.

Case (9) is more interesting. The current behavior is:
  1. start shrink_page_list(async)
  2. find the queue congested
  3. skip the pageout write
  4. nevertheless start shrink_page_list(sync)
  5. wait on a lot of pages
  6. find the queue congested again
  7. give up the pageout write again

So it is a meaningless waste of time. However, simply skipping the write is
not a good idea for high-order lumpy reclaim either: for example, an x86
hugepage needs 512 contiguous pages, which is often more dirty pages than
the queue congestion threshold (~128 requests).

After this patch, pageout() handles queue congestion as follows (see the
sketch after this list):

 - If order > PAGE_ALLOC_COSTLY_ORDER
	always ignore queue congestion.
 - If order <= PAGE_ALLOC_COSTLY_ORDER
	skip writing the page and disable lumpy reclaim.
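
A condensed restatement of that policy (should_skip_congested_write() is a
hypothetical helper used only for illustration; the real logic lives in
may_write_to_queue() and pageout() in the diff below):

	/* Hypothetical sketch of the decision made on a congested queue. */
	static bool should_skip_congested_write(struct scan_control *sc)
	{
		/* hugepage-sized lumpy reclaim writes through congestion */
		if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
			return false;

		/* otherwise skip the write and fall back to non-lumpy reclaim */
		disable_lumpy_reclaim_mode(sc);
		return true;
	}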

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |  122 ++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 76 insertions(+), 46 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 833b6ad..f7aabd2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -51,6 +51,12 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
 
+enum lumpy_mode {
+	LUMPY_MODE_NONE,
+	LUMPY_MODE_ASYNC,
+	LUMPY_MODE_SYNC,
+};
+
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -82,7 +88,7 @@ struct scan_control {
 	 * Intend to reclaim enough contenious memory rather than to reclaim
 	 * enough amount memory. I.e, it's the mode for high order allocation.
 	 */
-	bool lumpy_reclaim_mode;
+	enum lumpy_mode lumpy_reclaim_mode;
 
 	/* Which cgroup do we reclaim from */
 	struct mem_cgroup *mem_cgroup;
@@ -265,6 +271,36 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 	return ret;
 }
 
+static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
+				   bool sync)
+{
+	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
+
+	/*
+	 * Some reclaim have alredy been failed. No worth to try synchronous
+	 * lumpy reclaim.
+	 */
+	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
+		return;
+
+	/*
+	 * If we need a large contiguous chunk of memory, or have
+	 * trouble getting a small set of contiguous pages, we
+	 * will reclaim both active and inactive pages.
+	 */
+	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+		sc->lumpy_reclaim_mode = mode;
+	else if (sc->order && priority < DEF_PRIORITY - 2)
+		sc->lumpy_reclaim_mode = mode;
+	else
+		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+}
+
+static void disable_lumpy_reclaim_mode(struct scan_control *sc)
+{
+	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
+}
+
 static inline int is_page_cache_freeable(struct page *page)
 {
 	/*
@@ -275,7 +311,8 @@ static inline int is_page_cache_freeable(struct page *page)
 	return page_count(page) - page_has_private(page) == 2;
 }
 
-static int may_write_to_queue(struct backing_dev_info *bdi)
+static int may_write_to_queue(struct backing_dev_info *bdi,
+			      struct scan_control *sc)
 {
 	if (current->flags & PF_SWAPWRITE)
 		return 1;
@@ -283,6 +320,10 @@ static int may_write_to_queue(struct backing_dev_info *bdi)
 		return 1;
 	if (bdi == current->backing_dev_info)
 		return 1;
+
+	/* lumpy reclaim for hugepage often need a lot of write */
+	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+		return 1;
 	return 0;
 }
 
@@ -307,12 +348,6 @@ static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
-/* Request for sync pageout. */
-enum pageout_io {
-	PAGEOUT_IO_ASYNC,
-	PAGEOUT_IO_SYNC,
-};
-
 /* possible outcome of pageout() */
 typedef enum {
 	/* failed to write page out, page is locked */
@@ -330,7 +365,7 @@ typedef enum {
  * Calls ->writepage().
  */
 static pageout_t pageout(struct page *page, struct address_space *mapping,
-						enum pageout_io sync_writeback)
+			 struct scan_control *sc)
 {
 	/*
 	 * If the page is dirty, only perform writeback if that write
@@ -366,8 +401,10 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 	}
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
-	if (!may_write_to_queue(mapping->backing_dev_info))
+	if (!may_write_to_queue(mapping->backing_dev_info, sc)) {
+		disable_lumpy_reclaim_mode(sc);
 		return PAGE_KEEP;
+	}
 
 	if (clear_page_dirty_for_io(page)) {
 		int res;
@@ -394,7 +431,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 		 * direct reclaiming a large contiguous area and the
 		 * first attempt to free a range of pages fails.
 		 */
-		if (PageWriteback(page) && sync_writeback == PAGEOUT_IO_SYNC)
+		if (PageWriteback(page) &&
+		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
 			wait_on_page_writeback(page);
 
 		if (!PageWriteback(page)) {
@@ -402,7 +440,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 			ClearPageReclaim(page);
 		}
 		trace_mm_vmscan_writepage(page,
-			sync_writeback == PAGEOUT_IO_SYNC);
+				sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC);
 		inc_zone_page_state(page, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
 	}
@@ -580,7 +618,7 @@ static enum page_references page_check_references(struct page *page,
 	referenced_page = TestClearPageReferenced(page);
 
 	/* Lumpy reclaim - ignore references */
-	if (sc->lumpy_reclaim_mode)
+	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
 		return PAGEREF_RECLAIM;
 
 	/*
@@ -644,8 +682,7 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
  * shrink_page_list() returns the number of reclaimed pages
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
-					struct scan_control *sc,
-					enum pageout_io sync_writeback)
+				      struct scan_control *sc)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -665,7 +702,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		page = lru_to_page(page_list);
 		list_del(&page->lru);
 
-		if (sync_writeback == PAGEOUT_IO_SYNC)
+		if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
 			lock_page(page);
 		else if (!trylock_page(page))
 			goto keep;
@@ -696,10 +733,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * for any page for which writeback has already
 			 * started.
 			 */
-			if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
+			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
+			    may_enter_fs)
 				wait_on_page_writeback(page);
-			else
-				goto keep_locked;
+			else {
+				unlock_page(page);
+				goto keep_lumpy;
+			}
 		}
 
 		references = page_check_references(page, sc);
@@ -753,14 +793,17 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 
 			/* Page is dirty, try to write it out here */
-			switch (pageout(page, mapping, sync_writeback)) {
+			switch (pageout(page, mapping, sc)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
-				if (PageWriteback(page) || PageDirty(page))
+				if (PageWriteback(page))
+					goto keep_lumpy;
+				if (PageDirty(page))
 					goto keep;
+
 				/*
 				 * A synchronous write - probably a ramdisk.  Go
 				 * ahead and try to reclaim the page.
@@ -843,6 +886,7 @@ cull_mlocked:
 			try_to_free_swap(page);
 		unlock_page(page);
 		putback_lru_page(page);
+		disable_lumpy_reclaim_mode(sc);
 		continue;
 
 activate_locked:
@@ -855,6 +899,8 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
+		disable_lumpy_reclaim_mode(sc);
+keep_lumpy:
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
 	}
@@ -1255,7 +1301,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 		return false;
 
 	/* Only stall on lumpy reclaim */
-	if (!sc->lumpy_reclaim_mode)
+	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
 		return false;
 
 	/* If we have relaimed everything on the isolated list, no stall */
@@ -1300,15 +1346,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 			return SWAP_CLUSTER_MAX;
 	}
 
-
+	set_lumpy_reclaim_mode(priority, sc, false);
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
 
 	if (scanning_global_lru(sc)) {
 		nr_taken = isolate_pages_global(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode ?
-				ISOLATE_BOTH : ISOLATE_INACTIVE,
+			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+					ISOLATE_INACTIVE: ISOLATE_BOTH,
 			zone, 0, file);
 		zone->pages_scanned += nr_scanned;
 		if (current_is_kswapd())
@@ -1320,8 +1366,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	} else {
 		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
 			&page_list, &nr_scanned, sc->order,
-			sc->lumpy_reclaim_mode ?
-				ISOLATE_BOTH : ISOLATE_INACTIVE,
+			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
+					ISOLATE_INACTIVE: ISOLATE_BOTH,
 			zone, sc->mem_cgroup,
 			0, file);
 		/*
@@ -1339,7 +1385,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	spin_unlock_irq(&zone->lru_lock);
 
-	nr_reclaimed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+	nr_reclaimed = shrink_page_list(&page_list, sc);
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
@@ -1350,7 +1396,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 		nr_active = clear_active_flags(&page_list, NULL);
 		count_vm_events(PGDEACTIVATE, nr_active);
 
-		nr_reclaimed += shrink_page_list(&page_list, sc, PAGEOUT_IO_SYNC);
+		set_lumpy_reclaim_mode(priority, sc, true);
+		nr_reclaimed += shrink_page_list(&page_list, sc);
 	}
 
 	local_irq_disable();
@@ -1721,21 +1768,6 @@ out:
 	}
 }
 
-static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc)
-{
-	/*
-	 * If we need a large contiguous chunk of memory, or have
-	 * trouble getting a small set of contiguous pages, we
-	 * will reclaim both active and inactive pages.
-	 */
-	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
-		sc->lumpy_reclaim_mode = 1;
-	else if (sc->order && priority < DEF_PRIORITY - 2)
-		sc->lumpy_reclaim_mode = 1;
-	else
-		sc->lumpy_reclaim_mode = 0;
-}
-
 /*
  * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
  */
@@ -1750,8 +1782,6 @@ static void shrink_zone(int priority, struct zone *zone,
 
 	get_scan_count(zone, sc, nr, priority);
 
-	set_lumpy_reclaim_mode(priority, sc);
-
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 5/7] vmscan: kill dead code in shrink_inactive_list()
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:14   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:14 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

When synchronous lumpy reclaim occurs, page_list is guaranteed not to
contain active pages, because page activation in shrink_page_list() now
always disables lumpy reclaim.

This patch therefore removes the effectively dead code.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f7aabd2..f21dbeb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1334,7 +1334,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 	unsigned long nr_scanned;
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_taken;
-	unsigned long nr_active;
 	unsigned long nr_anon;
 	unsigned long nr_file;
 
@@ -1389,13 +1388,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
 
 	/* Check if we should syncronously wait for writeback */
 	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
-		/*
-		 * The attempt at page out may have made some
-		 * of the pages active, mark them inactive again.
-		 */
-		nr_active = clear_active_flags(&page_list, NULL);
-		count_vm_events(PGDEACTIVATE, nr_active);
-
 		set_lumpy_reclaim_mode(priority, sc, true);
 		nr_reclaimed += shrink_page_list(&page_list, sc);
 	}
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 6/7] vmscan: remove PF_SWAPWRITE from __zone_reclaim()
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:15   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:15 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

When zone_reclaim() was first merged, it always reclaimed at order 0 and
always set the PF_SWAPWRITE flag (probably both were unintentional).

The former was fixed by commit bd2f6199cf (vmscan: respect higher order in
zone_reclaim()). However, that exposed the latter mistake, because
!order-0 reclaim often isolates dirty pages and causes large latency.

Now is a good opportunity to fix the latter too.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f21dbeb..e043e97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2663,7 +2663,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	 * and we also need to be able to write out pages for RECLAIM_WRITE
 	 * and RECLAIM_SWAP.
 	 */
-	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
+	p->flags |= PF_MEMALLOC;
 	lockdep_set_current_reclaim_state(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
@@ -2716,7 +2716,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	}
 
 	p->reclaim_state = NULL;
-	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
+	current->flags &= ~PF_MEMALLOC;
 	lockdep_clear_current_reclaim_state();
 	return sc.nr_reclaimed >= nr_pages;
 }
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 7/7] vmscan: isolate_lru_pages(): stop the neighbor search if a neighbor can't be isolated
  2010-08-05  6:11 ` KOSAKI Motohiro
@ 2010-08-05  6:16   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-05  6:16 UTC (permalink / raw)
  To: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel
  Cc: kosaki.motohiro

isolate_lru_pages() doesn't only isolate pages from the LRU tail; it also
isolates the PFN neighbors of each evicted page.

Currently, the neighbor search doesn't stop even when a neighbor can't be
isolated. That is silly: a successful higher-order allocation needs a fully
contiguous block, so a single page reclaim failure already means we can't
build enough contiguous memory.

Therefore, isolate_lru_pages() should stop searching the PFN neighbors and
move on to the next page on the LRU right away. This patch does that. It
also accounts every lumpy reclaim failure in nr_lumpy_failed.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e043e97..264addc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1047,14 +1047,18 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 				continue;
 
 			/* Avoid holes within the zone. */
-			if (unlikely(!pfn_valid_within(pfn)))
+			if (unlikely(!pfn_valid_within(pfn))) {
+				nr_lumpy_failed++;
 				break;
+			}
 
 			cursor_page = pfn_to_page(pfn);
 
 			/* Check that we have not crossed a zone boundary. */
-			if (unlikely(page_zone_id(cursor_page) != zone_id))
-				continue;
+			if (unlikely(page_zone_id(cursor_page) != zone_id)) {
+				nr_lumpy_failed++;
+				break;
+			}
 
 			/*
 			 * If we don't have enough swap space, reclaiming of
@@ -1062,8 +1066,10 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			 * pointless.
 			 */
 			if (nr_swap_pages <= 0 && PageAnon(cursor_page) &&
-					!PageSwapCache(cursor_page))
-				continue;
+			    !PageSwapCache(cursor_page)) {
+				nr_lumpy_failed++;
+				break;
+			}
 
 			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
 				list_move(&cursor_page->lru, dst);
@@ -1074,9 +1080,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 					nr_lumpy_dirty++;
 				scan++;
 			} else {
-				if (mode == ISOLATE_BOTH &&
-						page_count(cursor_page))
-					nr_lumpy_failed++;
+				/* the page is freed already. */
+				if (!page_count(cursor_page))
+					continue;
+				nr_lumpy_failed++;
+				break;
 			}
 		}
 	}
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/7] vmscan: synchronous lumpy reclaim don't call congestion_wait()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 13:55     ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-05 13:55 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 05, 2010 at 03:13:03PM +0900, KOSAKI Motohiro wrote:
> congestion_wait() mean "waiting quueue congestion is cleared".
> That said, if the system have plenty dirty pages and flusher thread push
> new request to IO queue conteniously, IO queue are not cleared
> congestion status for long time. thus, congestion_wait(HZ/10) become
> almostly equivalent schedule_timeout(HZ/10).
> 
> However, synchronous lumpy reclaim donesn't need this
> congestion_wait() at all. shrink_page_list(PAGEOUT_IO_SYNC) are
> using wait_on_page_writeback() and it provide sufficient waiting.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/7] vmscan: synchrounous lumpy reclaim use lock_page() instead trylock_page()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 14:17     ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-05 14:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 05, 2010 at 03:13:39PM +0900, KOSAKI Motohiro wrote:
> When synchrounous lumpy reclaim, there is no reason to give up to
> reclaim pages even if page is locked. We use lock_page() instead
> trylock_page() in this case.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/vmscan.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1cdc3db..833b6ad 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -665,7 +665,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> -		if (!trylock_page(page))
> +		if (sync_writeback == PAGEOUT_IO_SYNC)
> +			lock_page(page);
> +		else if (!trylock_page(page))
>  			goto keep;
>  
>  		VM_BUG_ON(PageActive(page));
> -- 
> 1.6.5.2
> 
> 
> 

Hmm. Can we be sure lumpy reclaim never selects a page that the caller
has already locked?  I mean the scenario below.

LRU head -> page A -> page B -> LRU tail

lock_page(page A)
some_function()
direct reclaim
select victim page B
enter lumpy mode 
select victim page A as well as page B
shrink_page_list
lock_page(page A)
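
In code, the worry would look something like this (fill_page_data() and
the order-2 allocation are made up only for illustration; nothing below
is from the patch):

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/gfp.h>

static int fill_page_data(struct page *page_a)
{
	struct page *scratch;

	lock_page(page_a);	/* caller already holds the lock on page A */

	/*
	 * A higher-order GFP_KERNEL allocation can enter direct reclaim
	 * and, under pressure, lumpy reclaim.  If page A is a PFN
	 * neighbor of the LRU tail victim it gets isolated as well, and
	 * the PAGEOUT_IO_SYNC pass would then do lock_page(page A)
	 * again, so this task sleeps on the lock it already holds.
	 */
	scratch = alloc_pages(GFP_KERNEL, 2);
	if (!scratch) {
		unlock_page(page_a);
		return -ENOMEM;
	}

	/* ... use the scratch pages ... */
	__free_pages(scratch, 2);
	unlock_page(page_a);
	return 0;
}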


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-08-05  6:14   ` KOSAKI Motohiro
@ 2010-08-05 14:59     ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-05 14:59 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 05, 2010 at 03:14:13PM +0900, KOSAKI Motohiro wrote:
> Now, mainly shrink_page_list() give up to reclaim the page by following
> 10 reasons.
> 
>   1. trylock_page() failure
>   2. page is unevictable
>   3. zone reclaim and page is mapped
>   4. PageWriteback() is true
>   5. page is swapbacked and swap is full
>   6. add_to_swap() failure
>   7. page is dirty and gfpmask don't have GFP_IO, GFP_FS
>   8. page is pinned
>   9. IO queue is congested
>  10. pageout() start IO, but not finished
> 
> When lumpy reclaim, all of failure cause synchronous lumpy reclaim but
> It's slightly stupid. At least, case (2), (3), (5), (6), (7) and (8)
> don't have any worth to retry. Then, This patch implement to disable
> lumpy reclaim while reclaim processing.

Agree. 

> 
> Case (9) is more interesting. current behavior is,
>   1. start shrink_page_list(async)
>   2. found queue_congested()
>   3. skip pageout write
>   4. even tough start shrink_page_list(sync)
>   5. wait a lot of pages
>   6. again, found queue_congested()
>   7. give up pageout write again
> 
> So, it's meaningless time wasting. however just skipping page reclaim
> seems no good idea for high order lumpy reclaim. because, example, x86
> hugepage need 512 contenious memory. They often have much dirty pages
> than queue congestion threshold (~=128).
> 
> After this patch, pageout() retrieve queueue congestion following
> 
>  - If order > PAGE_ALLOC_COSTLY_ORDER
> 	Ignore queue congestion always.
>  - If order <= PAGE_ALLOC_COSTLY_ORDER
> 	skip write page and disable lumpy reclaim.
> 

If Mel's patch that skips pageout() in direct reclaim works well, do we
still need this?  File pages are no longer written out from the direct
reclaim path.  AFAIK swap page writeout does not consider queue
congestion, as we discussed in Wu's thread a few days ago, so that part
is not a problem.

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/vmscan.c |  122 ++++++++++++++++++++++++++++++++++++----------------------
>  1 files changed, 76 insertions(+), 46 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 833b6ad..f7aabd2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -51,6 +51,12 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vmscan.h>
>  
> +enum lumpy_mode {
> +	LUMPY_MODE_NONE,
> +	LUMPY_MODE_ASYNC,
> +	LUMPY_MODE_SYNC,
> +};
> +
>  struct scan_control {
>  	/* Incremented by the number of inactive pages that were scanned */
>  	unsigned long nr_scanned;
> @@ -82,7 +88,7 @@ struct scan_control {
>  	 * Intend to reclaim enough contenious memory rather than to reclaim
>  	 * enough amount memory. I.e, it's the mode for high order allocation.
>  	 */
> -	bool lumpy_reclaim_mode;
> +	enum lumpy_mode lumpy_reclaim_mode;
>  
>  	/* Which cgroup do we reclaim from */
>  	struct mem_cgroup *mem_cgroup;
> @@ -265,6 +271,36 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>  	return ret;
>  }
>  
> +static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> +				   bool sync)
> +{
> +	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +
> +	/*
> +	 * Some reclaim have alredy been failed. No worth to try synchronous
> +	 * lumpy reclaim.
> +	 */
> +	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +		return;
> +
> +	/*
> +	 * If we need a large contiguous chunk of memory, or have
> +	 * trouble getting a small set of contiguous pages, we
> +	 * will reclaim both active and inactive pages.
> +	 */
> +	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +		sc->lumpy_reclaim_mode = mode;
> +	else if (sc->order && priority < DEF_PRIORITY - 2)
> +		sc->lumpy_reclaim_mode = mode;
> +	else
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}
> +
> +static void disable_lumpy_reclaim_mode(struct scan_control *sc)
> +{
> +	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}
> +
>  static inline int is_page_cache_freeable(struct page *page)
>  {
>  	/*
> @@ -275,7 +311,8 @@ static inline int is_page_cache_freeable(struct page *page)
>  	return page_count(page) - page_has_private(page) == 2;
>  }
>  
> -static int may_write_to_queue(struct backing_dev_info *bdi)
> +static int may_write_to_queue(struct backing_dev_info *bdi,
> +			      struct scan_control *sc)
>  {
>  	if (current->flags & PF_SWAPWRITE)
>  		return 1;
> @@ -283,6 +320,10 @@ static int may_write_to_queue(struct backing_dev_info *bdi)
>  		return 1;
>  	if (bdi == current->backing_dev_info)
>  		return 1;
> +
> +	/* lumpy reclaim for hugepage often need a lot of write */
> +	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +		return 1;
>  	return 0;
>  }
>  
> @@ -307,12 +348,6 @@ static void handle_write_error(struct address_space *mapping,
>  	unlock_page(page);
>  }
>  
> -/* Request for sync pageout. */
> -enum pageout_io {
> -	PAGEOUT_IO_ASYNC,
> -	PAGEOUT_IO_SYNC,
> -};
> -
>  /* possible outcome of pageout() */
>  typedef enum {
>  	/* failed to write page out, page is locked */
> @@ -330,7 +365,7 @@ typedef enum {
>   * Calls ->writepage().
>   */
>  static pageout_t pageout(struct page *page, struct address_space *mapping,
> -						enum pageout_io sync_writeback)
> +			 struct scan_control *sc)
>  {
>  	/*
>  	 * If the page is dirty, only perform writeback if that write
> @@ -366,8 +401,10 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  	}
>  	if (mapping->a_ops->writepage == NULL)
>  		return PAGE_ACTIVATE;
> -	if (!may_write_to_queue(mapping->backing_dev_info))
> +	if (!may_write_to_queue(mapping->backing_dev_info, sc)) {
> +		disable_lumpy_reclaim_mode(sc);
>  		return PAGE_KEEP;
> +	}
>  
>  	if (clear_page_dirty_for_io(page)) {
>  		int res;
> @@ -394,7 +431,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  		 * direct reclaiming a large contiguous area and the
>  		 * first attempt to free a range of pages fails.
>  		 */
> -		if (PageWriteback(page) && sync_writeback == PAGEOUT_IO_SYNC)
> +		if (PageWriteback(page) &&
> +		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
>  			wait_on_page_writeback(page);
>  
>  		if (!PageWriteback(page)) {
> @@ -402,7 +440,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  			ClearPageReclaim(page);
>  		}
>  		trace_mm_vmscan_writepage(page,
> -			sync_writeback == PAGEOUT_IO_SYNC);
> +				sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC);
>  		inc_zone_page_state(page, NR_VMSCAN_WRITE);
>  		return PAGE_SUCCESS;
>  	}
> @@ -580,7 +618,7 @@ static enum page_references page_check_references(struct page *page,
>  	referenced_page = TestClearPageReferenced(page);
>  
>  	/* Lumpy reclaim - ignore references */
> -	if (sc->lumpy_reclaim_mode)
> +	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
>  		return PAGEREF_RECLAIM;
>  
>  	/*
> @@ -644,8 +682,7 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
>   * shrink_page_list() returns the number of reclaimed pages
>   */
>  static unsigned long shrink_page_list(struct list_head *page_list,
> -					struct scan_control *sc,
> -					enum pageout_io sync_writeback)
> +				      struct scan_control *sc)
>  {
>  	LIST_HEAD(ret_pages);
>  	LIST_HEAD(free_pages);
> @@ -665,7 +702,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> -		if (sync_writeback == PAGEOUT_IO_SYNC)
> +		if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
>  			lock_page(page);
>  		else if (!trylock_page(page))
>  			goto keep;
> @@ -696,10 +733,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			 * for any page for which writeback has already
>  			 * started.
>  			 */
> -			if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
> +			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
> +			    may_enter_fs)
>  				wait_on_page_writeback(page);
> -			else
> -				goto keep_locked;
> +			else {
> +				unlock_page(page);
> +				goto keep_lumpy;
> +			}
>  		}
>  
>  		references = page_check_references(page, sc);
> @@ -753,14 +793,17 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  				goto keep_locked;
>  
>  			/* Page is dirty, try to write it out here */
> -			switch (pageout(page, mapping, sync_writeback)) {
> +			switch (pageout(page, mapping, sc)) {
>  			case PAGE_KEEP:
>  				goto keep_locked;
>  			case PAGE_ACTIVATE:
>  				goto activate_locked;
>  			case PAGE_SUCCESS:
> -				if (PageWriteback(page) || PageDirty(page))
> +				if (PageWriteback(page))
> +					goto keep_lumpy;
> +				if (PageDirty(page))
>  					goto keep;
> +
>  				/*
>  				 * A synchronous write - probably a ramdisk.  Go
>  				 * ahead and try to reclaim the page.
> @@ -843,6 +886,7 @@ cull_mlocked:
>  			try_to_free_swap(page);
>  		unlock_page(page);
>  		putback_lru_page(page);
> +		disable_lumpy_reclaim_mode(sc);
>  		continue;
>  
>  activate_locked:
> @@ -855,6 +899,8 @@ activate_locked:
>  keep_locked:
>  		unlock_page(page);
>  keep:
> +		disable_lumpy_reclaim_mode(sc);
> +keep_lumpy:
>  		list_add(&page->lru, &ret_pages);
>  		VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
>  	}
> @@ -1255,7 +1301,7 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>  		return false;
>  
>  	/* Only stall on lumpy reclaim */
> -	if (!sc->lumpy_reclaim_mode)
> +	if (sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
>  		return false;
>  
>  	/* If we have relaimed everything on the isolated list, no stall */
> @@ -1300,15 +1346,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  			return SWAP_CLUSTER_MAX;
>  	}
>  
> -
> +	set_lumpy_reclaim_mode(priority, sc, false);
>  	lru_add_drain();
>  	spin_lock_irq(&zone->lru_lock);
>  
>  	if (scanning_global_lru(sc)) {
>  		nr_taken = isolate_pages_global(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode ?
> -				ISOLATE_BOTH : ISOLATE_INACTIVE,
> +			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> +					ISOLATE_INACTIVE: ISOLATE_BOTH,
>  			zone, 0, file);
>  		zone->pages_scanned += nr_scanned;
>  		if (current_is_kswapd())
> @@ -1320,8 +1366,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  	} else {
>  		nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
>  			&page_list, &nr_scanned, sc->order,
> -			sc->lumpy_reclaim_mode ?
> -				ISOLATE_BOTH : ISOLATE_INACTIVE,
> +			sc->lumpy_reclaim_mode == LUMPY_MODE_NONE ?
> +					ISOLATE_INACTIVE: ISOLATE_BOTH,
>  			zone, sc->mem_cgroup,
>  			0, file);
>  		/*
> @@ -1339,7 +1385,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  
>  	spin_unlock_irq(&zone->lru_lock);
>  
> -	nr_reclaimed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
> +	nr_reclaimed = shrink_page_list(&page_list, sc);
>  
>  	/* Check if we should syncronously wait for writeback */
>  	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> @@ -1350,7 +1396,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  		nr_active = clear_active_flags(&page_list, NULL);
>  		count_vm_events(PGDEACTIVATE, nr_active);
>  
> -		nr_reclaimed += shrink_page_list(&page_list, sc, PAGEOUT_IO_SYNC);
> +		set_lumpy_reclaim_mode(priority, sc, true);
> +		nr_reclaimed += shrink_page_list(&page_list, sc);
>  	}
>  
>  	local_irq_disable();
> @@ -1721,21 +1768,6 @@ out:
>  	}
>  }
>  
> -static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc)
> -{
> -	/*
> -	 * If we need a large contiguous chunk of memory, or have
> -	 * trouble getting a small set of contiguous pages, we
> -	 * will reclaim both active and inactive pages.
> -	 */
> -	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> -		sc->lumpy_reclaim_mode = 1;
> -	else if (sc->order && priority < DEF_PRIORITY - 2)
> -		sc->lumpy_reclaim_mode = 1;
> -	else
> -		sc->lumpy_reclaim_mode = 0;
> -}
> -
>  /*
>   * This is a basic per-zone page freer.  Used by both kswapd and direct reclaim.
>   */
> @@ -1750,8 +1782,6 @@ static void shrink_zone(int priority, struct zone *zone,
>  
>  	get_scan_count(zone, sc, nr, priority);
>  
> -	set_lumpy_reclaim_mode(priority, sc);
> -
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
>  		for_each_evictable_lru(l) {
> -- 
> 1.6.5.2
> 
> 
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/7] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
  2010-08-05  6:12   ` KOSAKI Motohiro
@ 2010-08-05 15:02     ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-08-05 15:02 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Wu Fengguang,
	Minchan Kim, Rik van Riel

On Thu, Aug 05, 2010 at 03:12:22PM +0900, KOSAKI Motohiro wrote:
> From: Wu Fengguang <fengguang.wu@intel.com>
> 
> Fix "system goes unresponsive under memory pressure and lots of
> dirty/writeback pages" bug.
> 
> 	http://lkml.org/lkml/2010/4/4/86
> 
> In the above thread, Andreas Mohr described that
> 
> 	Invoking any command locked up for minutes (note that I'm
> 	talking about attempted additional I/O to the _other_,
> 	_unaffected_ main system HDD - such as loading some shell
> 	binaries -, NOT the external SSD18M!!).
> 
> This happens when the two conditions are both meet:
> - under memory pressure
> - writing heavily to a slow device
> 
> <SNIP>

Other than an unnecessary whitespace removal at the end of the patch,
I see no problem with letting this patch stand on its own, as we are
reasonably sure it fixes a real problem by itself.  Patches 2-7 might
further improve the situation but can be considered as a new series.

This patch (as well as most of the series) will reject against current mmotm
because of other reclaim-related patches already in there. The resolutions
are not too hard but bear it in mind.

> <SNIP>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/7] vmscan: synchronous lumpy reclaim don't call congestion_wait()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 15:05     ` Rik van Riel
  -1 siblings, 0 replies; 66+ messages in thread
From: Rik van Riel @ 2010-08-05 15:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim

On 08/05/2010 02:13 AM, KOSAKI Motohiro wrote:
> congestion_wait() mean "waiting quueue congestion is cleared".
> That said, if the system have plenty dirty pages and flusher thread push
> new request to IO queue conteniously, IO queue are not cleared
> congestion status for long time. thus, congestion_wait(HZ/10) become
> almostly equivalent schedule_timeout(HZ/10).
>
> However, synchronous lumpy reclaim donesn't need this
> congestion_wait() at all. shrink_page_list(PAGEOUT_IO_SYNC) are
> using wait_on_page_writeback() and it provide sufficient waiting.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/7] vmscan: synchronous lumpy reclaim don't call congestion_wait()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 15:06     ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-08-05 15:06 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Wu Fengguang,
	Minchan Kim, Rik van Riel

On Thu, Aug 05, 2010 at 03:13:03PM +0900, KOSAKI Motohiro wrote:
> congestion_wait() mean "waiting quueue congestion is cleared".
> That said, if the system have plenty dirty pages and flusher thread push
> new request to IO queue conteniously, IO queue are not cleared
> congestion status for long time. thus, congestion_wait(HZ/10) become
> almostly equivalent schedule_timeout(HZ/10).
> 
> However, synchronous lumpy reclaim donesn't need this
> congestion_wait() at all. shrink_page_list(PAGEOUT_IO_SYNC) are
> using wait_on_page_writeback() and it provide sufficient waiting.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Needs rebasing for mmotm but otherwise;

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  mm/vmscan.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf51d62..1cdc3db 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1341,7 +1341,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>  
>  	/* Check if we should syncronously wait for writeback */
>  	if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
> -		congestion_wait(BLK_RW_ASYNC, HZ/10);
>  		/*
>  		 * The attempt at page out may have made some
>  		 * of the pages active, mark them inactive again.
> -- 
> 1.6.5.2
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 5/7] vmscan: kill dead code in shrink_inactive_list()
  2010-08-05  6:14   ` KOSAKI Motohiro
@ 2010-08-05 15:08     ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-05 15:08 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 05, 2010 at 03:14:47PM +0900, KOSAKI Motohiro wrote:
> When synchrounous lumy reclaim occur, page_list have gurantee to
> don't have active page because now page activation in shrink_page_list()
> always disable lumpy reclaim.
> 
> Then, This patch remove virtual dead code.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/7] vmscan: synchrounous lumpy reclaim use lock_page() instead trylock_page()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 15:12     ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-08-05 15:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Wu Fengguang,
	Minchan Kim, Rik van Riel

On Thu, Aug 05, 2010 at 03:13:39PM +0900, KOSAKI Motohiro wrote:
> When synchrounous lumpy reclaim, there is no reason to give up to
> reclaim pages even if page is locked. We use lock_page() instead
> trylock_page() in this case.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

The intention of the code looks fine so;

Acked-by: Mel Gorman <mel@csn.ul.ie>

Something like the following might just be easier on the eye but it's a
question of personal taste.

/* Returns true if the page is locked */
static bool lru_lock_page(struct page *page, enum pageout_io sync_writeback)
{
	if (likely(sync_writeback == PAGEOUT_IO_ASYNC))
		return trylock_page(page);

	lock_page(page);
	return true;
}

then replace trylock_page() with lru_lock_page().  The naming is
vaguely similar to other helpers like lru_to_page().
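
The call site would then be just (untested, only to show the shape):

		page = lru_to_page(page_list);
		list_del(&page->lru);

		if (!lru_lock_page(page, sync_writeback))
			goto keep;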

> ---
>  mm/vmscan.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1cdc3db..833b6ad 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -665,7 +665,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> -		if (!trylock_page(page))
> +		if (sync_writeback == PAGEOUT_IO_SYNC)
> +			lock_page(page);
> +		else if (!trylock_page(page))
>  			goto keep;
>  
>  		VM_BUG_ON(PageActive(page));
> -- 
> 1.6.5.2
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 5/7] vmscan: kill dead code in shrink_inactive_list()
  2010-08-05  6:14   ` KOSAKI Motohiro
@ 2010-08-05 15:14     ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-08-05 15:14 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Wu Fengguang,
	Minchan Kim, Rik van Riel

On Thu, Aug 05, 2010 at 03:14:47PM +0900, KOSAKI Motohiro wrote:
> When synchrounous lumy reclaim occur, page_list have gurantee to
> don't have active page because now page activation in shrink_page_list()
> always disable lumpy reclaim.
> 
> Then, This patch remove virtual dead code.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Seems fine.

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/7] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
  2010-08-05  6:12   ` KOSAKI Motohiro
@ 2010-08-05 15:19     ` Rik van Riel
  -1 siblings, 0 replies; 66+ messages in thread
From: Rik van Riel @ 2010-08-05 15:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim

On 08/05/2010 02:12 AM, KOSAKI Motohiro wrote:
> From: Wu Fengguang<fengguang.wu@intel.com>
>
> Fix "system goes unresponsive under memory pressure and lots of
> dirty/writeback pages" bug.
>
> 	http://lkml.org/lkml/2010/4/4/86
>
> In the above thread, Andreas Mohr described that
>
> 	Invoking any command locked up for minutes (note that I'm
> 	talking about attempted additional I/O to the _other_,
> 	_unaffected_ main system HDD - such as loading some shell
> 	binaries -, NOT the external SSD18M!!).
>
> This happens when the two conditions are both meet:
> - under memory pressure
> - writing heavily to a slow device

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 7/7] vmscan: isolated_lru_pages() stop neighbor search if neighbor can't be isolated
  2010-08-05  6:16   ` KOSAKI Motohiro
@ 2010-08-05 15:25     ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-08-05 15:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Wu Fengguang,
	Minchan Kim, Rik van Riel

On Thu, Aug 05, 2010 at 03:16:06PM +0900, KOSAKI Motohiro wrote:
> isolate_lru_pages() doesn't only isolate LRU tail pages, but also
> isolate neighbor pages of the eviction page.
> 
> Now, the neighbor search don't stop even if neighbors can't be isolated.
> It is silly. successful higher order allocation need full contenious
> memory, even though only one page reclaim failure mean to fail making
> enough contenious memory.
> 
> Then, isolate_lru_pages() should stop to search PFN neighbor pages and
> try to search next page on LRU soon. This patch does it. Also all of
> lumpy reclaim failure account nr_lumpy_failed.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Seems reasonable.

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/7] vmscan: synchrounous lumpy reclaim use lock_page() instead trylock_page()
  2010-08-05  6:13   ` KOSAKI Motohiro
@ 2010-08-05 15:26     ` Rik van Riel
  -1 siblings, 0 replies; 66+ messages in thread
From: Rik van Riel @ 2010-08-05 15:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim

On 08/05/2010 02:13 AM, KOSAKI Motohiro wrote:
> During synchronous lumpy reclaim, there is no reason to give up on
> reclaiming a page just because it is locked. Use lock_page() instead of
> trylock_page() in this case.
>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>

Acked-by: Rik van Riel <riel@redhat.com>
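
For reference, a minimal sketch of the locking rule this patch introduces
(the actual change is the small hunk in mm/vmscan.c quoted later in the
thread; the comments here are only illustrative):

	/*
	 * Synchronous lumpy reclaim is already prepared to sleep, so it
	 * can afford to wait for the page lock; ordinary asynchronous
	 * reclaim must not stall and keeps using the trylock.
	 */
	if (sync_writeback == PAGEOUT_IO_SYNC)
		lock_page(page);		/* may sleep until the holder unlocks */
	else if (!trylock_page(page))
		goto keep;			/* skip the page rather than stall */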

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 7/7] vmscan: isolated_lru_pages() stop neighbor search if neighbor can't be isolated
  2010-08-05  6:16   ` KOSAKI Motohiro
@ 2010-08-05 15:40     ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-05 15:40 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 05, 2010 at 03:16:06PM +0900, KOSAKI Motohiro wrote:
> isolate_lru_pages() doesn't only isolate LRU tail pages; it also
> isolates the PFN neighbors of the page being evicted.
> 
> Currently, the neighbor search doesn't stop even when a neighbor can't
> be isolated. That is pointless: a successful higher-order allocation
> needs a fully contiguous range, so a single page reclaim failure already
> means we cannot build enough contiguous memory.
> 
> Therefore, isolate_lru_pages() should stop searching the PFN neighbors
> and move on to the next page on the LRU right away. This patch does that.
> It also accounts every lumpy reclaim failure in nr_lumpy_failed.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I agree with this patch.
But I have one question.

> ---
>  mm/vmscan.c |   24 ++++++++++++++++--------
>  1 files changed, 16 insertions(+), 8 deletions(-)
> 
<snip>

  
>  			if (__isolate_lru_page(cursor_page, mode, file) == 0) {
>  				list_move(&cursor_page->lru, dst);
> @@ -1074,9 +1080,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>  					nr_lumpy_dirty++;
>  				scan++;
>  			} else {
> -				if (mode == ISOLATE_BOTH &&
> -						page_count(cursor_page))
> -					nr_lumpy_failed++;

sc->order = 1;
shrink_inactive_list();
isolate_pages_global() with ISOLATE_INACTIVE (i.e. not lumpy reclaim mode);
lumpy reclaim of the inactive list inside isolate_lru_pages();
(I am not sure we can really call this lumpy reclaim, but at least I think
it is a part of lumpy reclaim.)
I mean isolate_lru_pages() can reclaim in physical PFN order rather than LRU
order on the inactive list, since it only considers sc->order. Is that
intentional?

I guess it is intentional, since we used to check for ISOLATE_BOTH when
increasing nr_lumpy_failed. If so, shouldn't we still check for ISOLATE_BOTH?


> +				/* the page is freed already. */
> +				if (!page_count(cursor_page))
> +					continue;
> +				nr_lumpy_failed++;
> +				break;
>  			}
>  		}
>  	}
> -- 
> 1.6.5.2
> 
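
For illustration, a minimal sketch of the loop shape this hunk produces
(simplified from isolate_lru_pages(); names mostly follow the quoted hunk,
while pfn_valid() and the scan statistics are elided):

	/* Scan the PFN neighbors of the LRU page that was just isolated. */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *cursor_page = pfn_to_page(pfn);

		if (__isolate_lru_page(cursor_page, mode, file) == 0) {
			list_move(&cursor_page->lru, dst);
			nr_taken++;
			continue;
		}

		/* The page was freed under us: not a lumpy failure. */
		if (!page_count(cursor_page))
			continue;

		/*
		 * One unisolatable neighbor already breaks the contiguous
		 * block, so give up on this block instead of scanning on.
		 */
		nr_lumpy_failed++;
		break;
	}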


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/7] vmscan: synchrounous lumpy reclaim use lock_page()  instead trylock_page()
  2010-08-05 14:17     ` Minchan Kim
@ 2010-08-06  0:52       ` Minchan Kim
  -1 siblings, 0 replies; 66+ messages in thread
From: Minchan Kim @ 2010-08-06  0:52 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Rik van Riel

On Thu, Aug 5, 2010 at 11:17 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Thu, Aug 05, 2010 at 03:13:39PM +0900, KOSAKI Motohiro wrote:
>> During synchronous lumpy reclaim, there is no reason to give up on
>> reclaiming a page just because it is locked. Use lock_page() instead of
>> trylock_page() in this case.
>>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  mm/vmscan.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 1cdc3db..833b6ad 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -665,7 +665,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>               page = lru_to_page(page_list);
>>               list_del(&page->lru);
>>
>> -             if (!trylock_page(page))
>> +             if (sync_writeback == PAGEOUT_IO_SYNC)
>> +                     lock_page(page);
>> +             else if (!trylock_page(page))
>>                       goto keep;
>>
>>               VM_BUG_ON(PageActive(page));
>> --
>> 1.6.5.2
>>
>>
>>
>
> Hmm. Can we be sure that lumpy reclaim never selects a page that is
> already locked? I mean the scenario below.
>
> LRU head -> page A -> page B -> LRU tail
>
> lock_page(page A)
> some_function()
> direct reclaim
> select victim page B
> enter lumpy mode
> select victim page A as well as page B
> shrink_page_list
> lock_page(page A)
>
>
> --
> Kind regards,
> Minchan Kim
>

Ignore the above comment.
lock_page() doesn't have a deadlock problem here. My bad.

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/7] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls
  2010-08-05 15:02     ` Mel Gorman
@ 2010-08-08  6:42       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-08-08  6:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

> This patch (as well as most of the series) will reject against current mmotm
> because of other reclaim-related patches already in there. The resolutions
> are not too hard but bear it in mind.

I was working against the latest published mmotm, but yes, the current private
akpm mmotm includes your reclaim-related changes.

That said, I will need to rework these patches later.




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-08-05  6:14   ` KOSAKI Motohiro
@ 2010-10-27 16:41     ` Andrea Arcangeli
  -1 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-27 16:41 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, linux-mm, Andrew Morton, Johannes Weiner, Mel Gorman,
	Wu Fengguang, Minchan Kim, Rik van Riel

Hi,

On Thu, Aug 05, 2010 at 03:14:13PM +0900, KOSAKI Motohiro wrote:
> @@ -265,6 +271,36 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
>  	return ret;
>  }
>  
> +static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> +				   bool sync)
> +{
> +	enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +
> +	/*
> +	 * Some reclaim has already failed. It is not worth trying
> +	 * synchronous lumpy reclaim.
> +	 */
> +	if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +		return;
> +
> +	/*
> +	 * If we need a large contiguous chunk of memory, or have
> +	 * trouble getting a small set of contiguous pages, we
> +	 * will reclaim both active and inactive pages.
> +	 */
> +	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +		sc->lumpy_reclaim_mode = mode;
> +	else if (sc->order && priority < DEF_PRIORITY - 2)
> +		sc->lumpy_reclaim_mode = mode;
> +	else
> +		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}
> +
> +static void disable_lumpy_reclaim_mode(struct scan_control *sc)
> +{
> +	sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}
> +
>  static inline int is_page_cache_freeable(struct page *page)
>  {
>  	/*
> @@ -275,7 +311,8 @@ static inline int is_page_cache_freeable(struct page *page)
>  	return page_count(page) - page_has_private(page) == 2;
>  }
>  
> -static int may_write_to_queue(struct backing_dev_info *bdi)
> +static int may_write_to_queue(struct backing_dev_info *bdi,
> +			      struct scan_control *sc)
>  {
>  	if (current->flags & PF_SWAPWRITE)
>  		return 1;
> @@ -283,6 +320,10 @@ static int may_write_to_queue(struct backing_dev_info *bdi)
>  		return 1;
>  	if (bdi == current->backing_dev_info)
>  		return 1;
> +
> +	/* lumpy reclaim for hugepages often needs a lot of writes */
> +	if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +		return 1;
>  	return 0;
>  }
>  
> @@ -307,12 +348,6 @@ static void handle_write_error(struct address_space *mapping,
>  	unlock_page(page);
>  }
>  
> -/* Request for sync pageout. */
> -enum pageout_io {
> -	PAGEOUT_IO_ASYNC,
> -	PAGEOUT_IO_SYNC,
> -};
> -
>  /* possible outcome of pageout() */
>  typedef enum {
>  	/* failed to write page out, page is locked */
> @@ -330,7 +365,7 @@ typedef enum {
>   * Calls ->writepage().
>   */
>  static pageout_t pageout(struct page *page, struct address_space *mapping,
> -						enum pageout_io sync_writeback)
> +			 struct scan_control *sc)
>  {
>  	/*
>  	 * If the page is dirty, only perform writeback if that write
> @@ -366,8 +401,10 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  	}
>  	if (mapping->a_ops->writepage == NULL)
>  		return PAGE_ACTIVATE;
> -	if (!may_write_to_queue(mapping->backing_dev_info))
> +	if (!may_write_to_queue(mapping->backing_dev_info, sc)) {
> +		disable_lumpy_reclaim_mode(sc);
>  		return PAGE_KEEP;
> +	}
>  
>  	if (clear_page_dirty_for_io(page)) {
>  		int res;
> @@ -394,7 +431,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  		 * direct reclaiming a large contiguous area and the
>  		 * first attempt to free a range of pages fails.
>  		 */
> -		if (PageWriteback(page) && sync_writeback == PAGEOUT_IO_SYNC)
> +		if (PageWriteback(page) &&
> +		    sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
>  			wait_on_page_writeback(page);
>  
>  		if (!PageWriteback(page)) {
> @@ -402,7 +440,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
>  			ClearPageReclaim(page);
>  		}
>  		trace_mm_vmscan_writepage(page,
> -			sync_writeback == PAGEOUT_IO_SYNC);
> +				sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC);
>  		inc_zone_page_state(page, NR_VMSCAN_WRITE);
>  		return PAGE_SUCCESS;
>  	}
> @@ -580,7 +618,7 @@ static enum page_references page_check_references(struct page *page,
>  	referenced_page = TestClearPageReferenced(page);
>  
>  	/* Lumpy reclaim - ignore references */
> -	if (sc->lumpy_reclaim_mode)
> +	if (sc->lumpy_reclaim_mode != LUMPY_MODE_NONE)
>  		return PAGEREF_RECLAIM;
>  
>  	/*
> @@ -644,8 +682,7 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
>   * shrink_page_list() returns the number of reclaimed pages
>   */
>  static unsigned long shrink_page_list(struct list_head *page_list,
> -					struct scan_control *sc,
> -					enum pageout_io sync_writeback)
> +				      struct scan_control *sc)
>  {
>  	LIST_HEAD(ret_pages);
>  	LIST_HEAD(free_pages);
> @@ -665,7 +702,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		page = lru_to_page(page_list);
>  		list_del(&page->lru);
>  
> -		if (sync_writeback == PAGEOUT_IO_SYNC)
> +		if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC)
>  			lock_page(page);
>  		else if (!trylock_page(page))
>  			goto keep;
> @@ -696,10 +733,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  			 * for any page for which writeback has already
>  			 * started.
>  			 */
> -			if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
> +			if (sc->lumpy_reclaim_mode == LUMPY_MODE_SYNC &&
> +			    may_enter_fs)
>  				wait_on_page_writeback(page);
> -			else
> -				goto keep_locked;
> +			else {
> +				unlock_page(page);
> +				goto keep_lumpy;
> +			}
>  		}
>  
>  		references = page_check_references(page, sc);

[...]

This rejects against the THP code. Lumpy is unusable with hugepages: it grinds
the system to a halt, and there's no reason to let it survive. Lumpy
is like compaction done with a hammer while blindfolded.

I don't know why the community insists on improving lumpy when it has to
be removed completely, especially now that we have memory compaction.

I'll keep deleting it in my tree...

I hope the lumpy work stops here and that it goes away when THP is
merged.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-27 16:41     ` Andrea Arcangeli
@ 2010-10-27 17:16       ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-10-27 17:16 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

On Wed, Oct 27, 2010 at 06:41:38PM +0200, Andrea Arcangeli wrote:
> Hi,
> 
> > <SNIP>
> 
> [...]
> 
> this rejects on THP code, lumpy is unusable with hugepages, it grinds
> the system to an halt, and there's no reason to let it survive. Lumpy
> is like compaction done with an hammer while blindfolded.
> 

The series drastically limits the level of hammering lumpy does to the
system. I'm currently keeping it alive because lumpy reclaim has received a lot
more testing than compaction has. While I ultimately see it going away, I am
resisting it being deleted until compaction has been around for a few releases.

> I don't know why community insists on improving lumpy when it has to
> be removed completely, especially now that we have memory compaction.
> 

Simply because it has been tested, and even with compaction there were cases
envisioned where it would be used - low memory, or when compaction is not
configured in, for example. The ideal is that compaction is used until lumpy
is necessary, although this applies more to the static resizing of the huge
page pool than to THP, which I'd expect to back off without using lumpy
reclaim, i.e. fail the allocation rather than use lumpy reclaim.

> I'll keep deleting on my tree...
> 
> I hope lumpy work stops here and that it goes away whenever THP is
> merged.
> 

Uhhh, I have one more modification in mind when lumpy is involved and
it's to relax the zone watermark slightly to only obey up to
PAGE_ALLOC_COSTLY_ORDER. At the moment, it is freeing more pages than
are necessary to satisfy an allocation request and hits the system
harder than it should. Similar logic should apply to compaction.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-27 17:16       ` Mel Gorman
@ 2010-10-27 18:03         ` Andrea Arcangeli
  -1 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-27 18:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

Hi Mel,

On Wed, Oct 27, 2010 at 06:16:43PM +0100, Mel Gorman wrote:
> The series drastically limits the level of hammering lumpy does to the
> system. I'm currently keeping it alive because lumpy reclaim has received a lot
> more testing than compaction has. While I ultimately see it going away, I am
> resisting it being deleted until compaction has been around for a few releases.

I admit I haven't tested this modification yet to verify how
"drastic" the drastic change really is. But I doubt "less unusable lumpy"
will translate to "as usable as without lumpy". And I doubt lumpy will
ever lead to something "usable" when order-9 allocations are the norm
and more frequent than order-0 allocations.

> Simply because it has been tested and even with compaction there were cases
> envisoned where it would be used - low memory or when compaction is not
> configured in for example. The ideal is that compaction is used until lumpy

Compaction should always be configured in. All archs support
migration. The only reason to disable compaction is for debugging
purposes, and that option should go in the kernel hacking section. Or,
alternatively, if it's not important that order >0 allocations succeed (some
embedded systems may be in that lucky situation and can save some bytecode).

Keeping lumpy in and activated for all high-order allocations like
this can only _hide_ bugs and inefficiencies in compaction, in my view.
So in addition to damaging the runtime, it fragments the userbase and hurts
debuggability, and I see zero good coming out of lumpy for normal
allocations.

> is necessary although this applies more to the static resizing of the huge
> page pool than THP which I'd expect to backoff without using lumpy reclaim
> i.e. fail the allocation rather than using lumpy reclaim.

I agree lumpy is more drastic and aggressive than plain reclaim, and it may
be quicker at generating hugepages by throwing its blind hammer, in turn
destroying everything else running and hanging the system for a long
while. I wouldn't be so against lumpy if it were only activated by a
special __GFP_LUMPY flag that only hugetlbfs pool resizing uses.
hugetlbfs is the very special case, not all the other normal
allocations.
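
To make the suggestion concrete, here is a purely hypothetical sketch of such
a gate. __GFP_LUMPY does not exist in any tree; the flag value and the hook
into set_lumpy_reclaim_mode() are only illustrative:

/* Hypothetical: a GFP flag that only hugetlbfs pool resizing would set. */
#define __GFP_LUMPY	((__force gfp_t)0x2000000u)	/* value illustrative only */

static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
				   bool sync)
{
	/*
	 * Never enter lumpy reclaim unless the caller explicitly asked for
	 * it; everybody else must rely on compaction or simply fail.
	 */
	if (!(sc->gfp_mask & __GFP_LUMPY)) {
		sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
		return;
	}

	/* ... the existing order/priority heuristics from the quoted patch ... */
}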

> Uhhh, I have one more modification in mind when lumpy is involved and
> it's to relax the zone watermark slightly to only obey up to
> PAGE_ALLOC_COSTLY_ORDER. At the moment, it is freeing more pages than
> are necessary to satisfy an allocation request and hits the system
> harder than it should. Similar logic should apply to compaction.

On a side note, I want to remove PAGE_ALLOC_COSTLY_ORDER too; it is a
flawed concept in the first place. A VM that behaves radically
differently (radically as in grinding the system to a halt, becoming unusable
and creating swap storms) when the allocation order rises
from 3 to 4 is hackish and fundamentally incompatible with logic that
uses frequent order-9 allocations and makes them the default.

Basically, anybody asking for an order-9 allocation during normal runtime (not
via some magic sysfs control) has to be OK with it failing and rely only on
compaction, or it's some corner case and should be treated as such,
instead of mandating the default VM behavior for order >=4 allocations
for everything else.

PAGE_ALLOC_COSTLY_ORDER was in practice a non-stack-local,
per-process equivalent of what I recommended as the way to trigger
lumpy (i.e. __GFP_LUMPY), but it's not a good enough approximation
anymore. So the "activation" of the
blindfolded-hammer-algorithm-creating-swap-storms has to be a
function of the caller's stack, not a function of the allocation
order. If that change is made, I won't be forced to drop lumpy
anymore! But even then I find it hard to justify keeping lumpy alive
unless it is proven to be more efficient than compaction. But at least I
could avoid touching the lumpy code.

My tree uses compaction in a fine-grained way inside kswapd too, and tons of
systems are running without lumpy, with floods of order-9 allocations
served by compaction alone (in direct reclaim and kswapd) without the
slightest problem. Furthermore, I extended compaction to all
allocations, not just those above PAGE_ALLOC_COSTLY_ORDER (maybe I already
removed all the PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
not to use compaction for every allocation, including orders 1, 2 and 3, and
things work fine this way.

For now, to fix up the reject I think I'll go ahead and remove these new
lumpy changes, which also gives me the most tested configuration,
the one I'm sure works fine, without having to test how "less unusable"
lumpy has become. If I'm later asked to retain lumpy in order to
merge THP, I'll simply add __GFP_LUMPY and restrict lumpy to
the sysfs tweaking corner case.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-27 16:41     ` Andrea Arcangeli
@ 2010-10-28  2:31       ` Ed Tomlinson
  -1 siblings, 0 replies; 66+ messages in thread
From: Ed Tomlinson @ 2010-10-28  2:31 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Mel Gorman, Wu Fengguang, Minchan Kim, Rik van Riel

On Wednesday 27 October 2010 12:41:38 Andrea Arcangeli wrote:
> I hope lumpy work stops here and that it goes away whenever THP is
> merged.

Andrea,

I've been running THP here since May (#25).  It does its job as it should.

When do you see it as ready for inclusion?

Thanks
Ed


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-27 18:03         ` Andrea Arcangeli
@ 2010-10-28  8:00           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-10-28  8:00 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kosaki.motohiro, Mel Gorman, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Wu Fengguang, Minchan Kim, Rik van Riel

Hi

> My tree uses compaction in a fine way inside kswapd too and tons of
> systems are running without lumpy and floods of order 9 allocations
> with only compaction (in direct reclaim and kswapd) without the
> slighest problem. Furthermore I extended compaction for all
> allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> not to use compaction for every allocation including 1,2,3, and things
> works fine this way.

Interesting. I read this as meaning you have compaction improvements. If so,
can you please post them? Generally, 1) improve the new feature, then 2) remove
the unused one is the safe order. The reverse order, on the other hand, seems
to carry a regression risk.




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-27 18:03         ` Andrea Arcangeli
@ 2010-10-28 10:20           ` Mel Gorman
  -1 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-10-28 10:20 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

On Wed, Oct 27, 2010 at 08:03:33PM +0200, Andrea Arcangeli wrote:
> Hi Mel,
> 
> On Wed, Oct 27, 2010 at 06:16:43PM +0100, Mel Gorman wrote:
> > The series drastically limits the level of hammering lumpy does to the
> > system. I'm currently keeping it alive because lumpy reclaim has received a lot
> > more testing than compaction has. While I ultimately see it going away, I am
> > resisting it being deleted until compaction has been around for a few releases.
> 
> I admit I didn't yet test this modification yet to verify how
> "drastical" is the drastical change. But "less unusable lumpy" I doubt
> will translate to "as usable as without lumpy". And I doubt lumpy will
> ever lead to something "usable" when order 9 allocations are the norm
> and more frequent than order 0 allocations.
> 

Except in the case where this is what you really want - you want those order-9
pages even if it means a high allocation cost. This applies when
resizing the static huge page pool to start a very long-lived job, for
example.

However, I do agree that lumpy really was bashing the system way too hard,
which led to that series of patches. It could still be a lot lighter, but
it's much better than it was.


> > Simply because it has been tested and even with compaction there were cases
> > envisoned where it would be used - low memory or when compaction is not
> > configured in for example. The ideal is that compaction is used until lumpy
> 
> Compaction should always be configured in.

That is not the same as guaranteed and it's disabled by default. While I'd
expect distros to enable it, they might not.

> All archs supports
> migration. Only reason to disable compaction is for debugging
> purposes, and should go in kernel hacking section. Or alternatively if
> it's not important that order >0 allocation succeeds (some embedded
> may be in that lucky situation and they can save some bytecode).
> 
> Keeping lumpy in and activated for all high order allocations like

This is not true. We always compact first if it is possible, and only if
that fails do we fall back to direct reclaim, which leads to lumpy reclaim.

> this, can only _hide_ bugs and inefficiencies in compaction in my view
> so in addition to damaging the runtime, it fragment userbase and
> debuggability and I see zero good out of lumpy for all normal
> allocations.
> 

As both code paths are hit, I do not see how it fragments the userbase.
As for debuggability, to see whether lumpy reclaim is being relied on, one can
monitor lumpy reclaim using the mm_vmscan_lru_isolate tracepoint and
check the values of the "lumpy" fields or the order field there.

That said, if it were possible, I'd also be watching to see how often
lumpy reclaim or compaction is being used for small orders, because if
it is a common occurrence, something else is possibly wrong (e.g.
MIGRATE_RESERVE broken again).

> > is necessary although this applies more to the static resizing of the huge
> > page pool than THP which I'd expect to backoff without using lumpy reclaim
> > i.e. fail the allocation rather than using lumpy reclaim.
> 
> I agree lumpy is more drastic and aggressive than reclaim and it may
> be quicker to generate hugepages by throwing its blind hammer, in turn
> destroying everything else running and hanging the system for a long
> while.

The point of the series was to be more precise about the hammering and
reduce the allocation latency while also reducing the number of pages
reclaimed and the disruption to the system.

> I wouldn't be so against lumpy if it was only activated by a
> special __GFP_LUMPY flag that only hugetlbfs pool resizing uses.
> hugetlbfs is the very special case, not all other normal
> allocations.
> 

Then by all means try taking it in this direction. I would prefer it over
deletion. Just be careful about what has to happen when compaction is not
available or when it fails because there was not enough memory free to both
allocate the page and satisfy watermarks. Returning "fail" only suits THP. It
doesn't suit static huge page pool resizing and it doesn't suit users of
dynamic huge page pool resizing either.

> > Uhhh, I have one more modification in mind when lumpy is involved and
> > it's to relax the zone watermark slightly to only obey up to
> > PAGE_ALLOC_COSTLY_ORDER. At the moment, it is freeing more pages than
> > are necessary to satisfy an allocation request and hits the system
> > harder than it should. Similar logic should apply to compaction.
> 
> On a side note I want to remove the PAGE_ALLOC_COSTLY_ORDER too, that
> is a flawed concept in the first place.

I didn't say remove. In zone_watermark_ok(), we do not consider any pages of
the lower orders to be free. Depending on the required watermark level, the
system can end up freeing multiple order-9 pages for 1 request unnecessarily.
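
As a rough illustration of the idea (a hypothetical helper, not proposed
code; zone_watermark_ok() keeps its current semantics and the cap is only
applied by the reclaim-side caller):

/*
 * Hypothetical: when deciding whether reclaim has freed "enough" for a
 * high-order request, only require the watermark to hold up to
 * PAGE_ALLOC_COSTLY_ORDER, so reclaim can stop instead of freeing several
 * extra order-9 blocks for a single allocation.
 */
static bool reclaim_watermark_ok(struct zone *zone, int order,
				 unsigned long mark, int classzone_idx)
{
	int check_order = min_t(int, order, PAGE_ALLOC_COSTLY_ORDER);

	return zone_watermark_ok(zone, check_order, mark, classzone_idx, 0);
}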

> A VM that behaves radically
> (radically as in grinding system to an halt and being unusable and
> creating swap storms) different when the order of allocation raises
> from 3 to 4 is hackish and fundamentally incompatible with logics that
> uses frequent order 9 allocations and makes them the default.
> 

Are order-4 allocations common? If so, that in itself needs to be examined. If
they are kernel allocations, both compaction and lumpy reclaim are going
to hit a wall eventually as large numbers of long-lived high-order kernel
allocations impair anti-fragmentation.

> Basically anybody asking an order 9 during the normal runtime (not
> some magic sysfs control) has to be ok if it fails and only relay on
> compaction, or it's in some corner case and as such shall be threated
> instead of mandating the default VM behavior for >=4 order allocation
> for everything else.
> 

And as I've said before, there are users that are happy to wait while those
order-9 allocations happen, because they occur early in the lifetime of a very
long-lived process and rebooting the machine is not an option.

> The PAGE_ALLOC_COSTLY_ORDER was in practice a not stack-local
> per-process equivalent of what I recommended as the way to trigger
> lumpy (i.e. __GFP_LUMPY), but it's not a good enough approximation
> anymore. So the "activation" for
> blindfolded-hammer-algorithm-creating-swap-storms has to be in
> function of the caller stack, and not in function of the allocation
> order. If that change is done, I won't be forced to drop lumpy
> anymore! But even then I find it hard to justify to keep lumpy alive
> unless it is proven to be more efficient than compaction. But I could
> avoid touching the lumpy code at least.
> 

To make compaction a full replacement for lumpy, reclaim would have to
know how to reclaim order-9 worth of pages and then compact properly.
It's not set up for this, and a naive algorithm would spend a lot of time
in the compaction scanning code (which is pretty inefficient). A possible
alternative would be to lumpy-compact, i.e. select a page from the LRU and
move all the pages around it elsewhere. Again, this is not what we are
currently doing, but it's a direction that could be taken.
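
A very rough sketch of what such a "lumpy compaction" pass might look like
(a hypothetical function, not existing code; the migration of the isolated
list and most error handling are elided):

/*
 * Hypothetical: instead of reclaiming the PFN neighbors of an LRU page,
 * isolate them for migration so the order-sized block around the target
 * can be freed without throwing the neighbors' contents away.
 */
static int isolate_block_for_migration(struct page *target, unsigned int order,
				       struct list_head *migratepages)
{
	unsigned long start_pfn = page_to_pfn(target) & ~((1UL << order) - 1);
	unsigned long pfn;

	for (pfn = start_pfn; pfn < start_pfn + (1UL << order); pfn++) {
		struct page *page;

		if (!pfn_valid(pfn))
			return -EINVAL;
		page = pfn_to_page(pfn);
		if (page == target)
			continue;
		/* Only LRU pages can be migrated; give up on anything else. */
		if (!PageLRU(page) || isolate_lru_page(page))
			return -EAGAIN;
		list_add(&page->lru, migratepages);
	}

	/* The caller would now migrate_pages() this list and free the block. */
	return 0;
}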

> My tree uses compaction in a fine way inside kswapd too and tons of
> systems are running without lumpy and floods of order 9 allocations
> with only compaction (in direct reclaim and kswapd) without the
> slighest problem.
>
> Furthermore I extended compaction for all
> allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> not to use compaction for every allocation including 1,2,3, and things
> works fine this way.
> 

I see no problem with using compaction for the lower orders when it is
available. It was during review that it got disabled, because there were
concerns about how stable migration was while a number of bugs were still
being ironed out.

> For now, to fixup the reject I think I'll go ahead remove these new
> lumpy changes, which also guarantees me the most tested configuration
> that I'm sure works fine without having to test how "less unusable"
> lumpy has become. If later I'll be asked to retain lumpy in order to
> merge THP I'll simply add the __GFP_LUMPY and I'll restrict lumpy in
> the sysfs tweaking corner case.
> 

Just bear in mind what happens when compaction is not available.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
@ 2010-10-28 10:20           ` Mel Gorman
  0 siblings, 0 replies; 66+ messages in thread
From: Mel Gorman @ 2010-10-28 10:20 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

On Wed, Oct 27, 2010 at 08:03:33PM +0200, Andrea Arcangeli wrote:
> Hi Mel,
> 
> On Wed, Oct 27, 2010 at 06:16:43PM +0100, Mel Gorman wrote:
> > The series drastically limits the level of hammering lumpy does to the
> > system. I'm currently keeping it alive because lumpy reclaim has received a lot
> > more testing than compaction has. While I ultimately see it going away, I am
> > resisting it being deleted until compaction has been around for a few releases.
> 
> I admit I didn't yet test this modification yet to verify how
> "drastical" is the drastical change. But "less unusable lumpy" I doubt
> will translate to "as usable as without lumpy". And I doubt lumpy will
> ever lead to something "usable" when order 9 allocations are the norm
> and more frequent than order 0 allocations.
> 

Except in the case that this is what you really want - you want those order-9
pages even if it means there is a high allocating cost. This applies when
resizing the static huge page pool to start a very long-lived job for
example.

However, I do agree that lumpy really was bashing the system way too hard
which led to that series of patches. It still could be a lot lighter but
it's much better than it was.


> > Simply because it has been tested and even with compaction there were cases
> > envisoned where it would be used - low memory or when compaction is not
> > configured in for example. The ideal is that compaction is used until lumpy
> 
> Compaction should always be configured in.

That is not the same as guaranteed and it's disabled by default. While I'd
expect distros to enable it, they might not.

> All archs supports
> migration. Only reason to disable compaction is for debugging
> purposes, and should go in kernel hacking section. Or alternatively if
> it's not important that order >0 allocation succeeds (some embedded
> may be in that lucky situation and they can save some bytecode).
> 
> Keeping lumpy in and activated for all high order allocations like

This is not true. We always compact first if it is possible and only if
that fails do we fall back to direct reclaim leading to lumpy reclaim.

> this, can only _hide_ bugs and inefficiencies in compaction in my view
> so in addition to damaging the runtime, it fragment userbase and
> debuggability and I see zero good out of lumpy for all normal
> allocations.
> 

As both code paths are hit, I do not see how it fragments the userbase.
As for debuggability to see if lumpy reclaim is being relied on, one can
monitor if lumpy reclaim using the mm_vmscan_lru_isolate tracepoint and
checking the values for the "lumpy" fields or the order field there.

That said, if it were possible, I'd also be watching to see how often
lumpy reclaim or compaction is being used for small orders, because if
that is a common occurrence, something else is possibly wrong (e.g.
MIGRATE_RESERVE broken again).

> > is necessary although this applies more to the static resizing of the huge
> > page pool than THP which I'd expect to backoff without using lumpy reclaim
> > i.e. fail the allocation rather than using lumpy reclaim.
> 
> I agree lumpy is more drastic and aggressive than reclaim and it may
> be quicker to generate hugepages by throwing its blind hammer, in turn
> destroying everything else running and hanging the system for a long
> while.

The point of the series was to be more precise about the hammering and
reduce the allocation latency while also reducing the number of pages
reclaimed and the disruption to the system.

> I wouldn't be so against lumpy if it was only activated by a
> special __GFP_LUMPY flag that only hugetlbfs pool resizing uses.
> hugetlbfs is the very special case, not all other normal
> allocations.
> 

Then by all means try taking it in this direction. I would prefer it over
deletion. Just be careful about what has to happen when compaction is not
available or when it fails because there was not enough memory free to both
allocate the page and satisfy watermarks. Returning "fail" only suits THP. It
doesn't suit static huge page pool resizing and it doesn't suit users of
dynamic huge page pool resizing either.

> > Uhhh, I have one more modification in mind when lumpy is involved and
> > it's to relax the zone watermark slightly to only obey up to
> > PAGE_ALLOC_COSTLY_ORDER. At the moment, it is freeing more pages than
> > are necessary to satisfy an allocation request and hits the system
> > harder than it should. Similar logic should apply to compaction.
> 
> On a side note I want to remove the PAGE_ALLOC_COSTLY_ORDER too, as that
> is a flawed concept in the first place.

I didn't say remove. In zone_watermark_ok(), we do not consider any pages of
the lower orders to be free. Depending on the required watermark level, the
system can end up freeing multiple order-9 pages for 1 request unnecessarily.
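
As a rough illustration of what that means, here is a simplified model of
this era's zone_watermark_ok() logic. It ignores lowmem_reserve and the
allocation flags, so it is not a verbatim copy, but it shows why only
blocks of the requested order or higher count as "free":

#include <stdbool.h>

#define MAX_ORDER 11

struct zone_model {
	unsigned long nr_free[MAX_ORDER];	/* free blocks per order */
};

static bool watermark_ok_model(const struct zone_model *z,
			       unsigned int order, long mark)
{
	long free_pages = 0;
	long min = mark;
	unsigned int o;

	for (o = 0; o < MAX_ORDER; o++)
		free_pages += z->nr_free[o] << o;

	if (free_pages <= min)
		return false;

	for (o = 0; o < order; o++) {
		/* Pages in free lists below the requested order do not
		 * count towards the watermark... */
		free_pages -= z->nr_free[o] << o;
		/* ...although the bar is halved for each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

Depending on the watermark level, satisfying that loop is how the system
can end up having to free several order-9 blocks for a single request.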

> A VM that behaves radically
> (radically as in grinding the system to a halt, being unusable and
> creating swap storms) differently when the allocation order rises
> from 3 to 4 is hackish and fundamentally incompatible with logic that
> uses frequent order 9 allocations and makes them the default.
> 

Are order-4 allocations common? If so, that in itself needs to be examined. If
they are kernel allocations, both compaction and lumpy reclaim are going
to hit a wall eventually as large numbers of long-lived high-order kernel
allocations impair anti-fragmentation.

> Basically anybody asking for an order 9 allocation during normal runtime (not
> some magic sysfs control) has to be OK with it failing and rely only on
> compaction, or it's in some corner case and as such shall be treated as one,
> instead of mandating the default VM behavior for >=4 order allocations
> for everything else.
> 

And as I've said before, there are users that are happy to wait while those
order-9 allocations happen, because they occur early in the lifetime of a very
long-lived process and rebooting the machine is not an option.

> The PAGE_ALLOC_COSTLY_ORDER was in practice a non-stack-local,
> per-process equivalent of what I recommended as the way to trigger
> lumpy (i.e. __GFP_LUMPY), but it's not a good enough approximation
> anymore. So the "activation" of the
> blindfolded-hammer-algorithm-creating-swap-storms has to be a
> function of the caller stack, and not a function of the allocation
> order. If that change is done, I won't be forced to drop lumpy
> anymore! But even then I find it hard to justify keeping lumpy alive
> unless it is proven to be more efficient than compaction. But I could
> at least avoid touching the lumpy code.
> 

To make compaction a full replacement for lumpy, reclaim would have to
know how to reclaim order-9 worth of pages and then compact properly.
It's not setup for this and a naive algorithm would spend a lot of time
in the compaction scanning code (which is pretty inefficient). A possible
alternative would be to lumpy-compact i.e. select a page from the LRU and
move all pages around it elsewhere. Again, this is not what we are currently
doing but it's a direction that could be taken.
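
Just to pin the idea down, a thought-experiment sketch of that
"lumpy-compact" direction (not code from any tree; migrate_one() is a
hypothetical callback that would wrap page migration):

static unsigned long block_start_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & ~((1UL << order) - 1);
}

/* Take one page picked from the LRU, derive the naturally aligned
 * order-N block containing it and try to migrate every page in that
 * block elsewhere, so the whole block becomes free without reclaiming
 * its contents. */
static int lumpy_compact_model(unsigned long target_pfn, unsigned int order,
			       int (*migrate_one)(unsigned long pfn))
{
	unsigned long start = block_start_pfn(target_pfn, order);
	unsigned long end = start + (1UL << order);
	unsigned long pfn;
	int failed = 0;

	for (pfn = start; pfn < end; pfn++)
		if (migrate_one(pfn))
			failed++;

	/* The block is only usable if every page could be moved. */
	return failed ? -1 : 0;
}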

> My tree uses compaction in a fine way inside kswapd too and tons of
> systems are running without lumpy and floods of order 9 allocations
> with only compaction (in direct reclaim and kswapd) without the
> slightest problem.
>
> Furthermore I extended compaction for all
> allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> not to use compaction for every allocation including 1,2,3, and things
> work fine this way.
> 

I see no problem with using compaction for the lower orders when it is
available. It was during review that it got disabled because there were
concerns about how stable migration was, as a number of bugs were still being
ironed out.

> For now, to fix up the rejects I think I'll go ahead and remove these new
> lumpy changes, which also guarantees me the most tested configuration,
> one that I'm sure works fine, without having to test how "less unusable"
> lumpy has become. If I'm later asked to retain lumpy in order to
> merge THP, I'll simply add __GFP_LUMPY and restrict lumpy to the
> sysfs-tweaking corner case.
> 

Just bear in mind what happens when compaction is not available.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-28  8:00           ` KOSAKI Motohiro
@ 2010-10-28 15:12             ` Andrea Arcangeli
  -1 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-28 15:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

On Thu, Oct 28, 2010 at 05:00:57PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > My tree uses compaction in a fine way inside kswapd too and tons of
> > systems are running without lumpy and floods of order 9 allocations
> > with only compaction (in direct reclaim and kswapd) without the
> > slightest problem. Furthermore I extended compaction for all
> > allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> > removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> > not to use compaction for every allocation including 1,2,3, and things
> > work fine this way.
> 
> Interesting. I read this as meaning you have compaction improvements. If so,
> can you please post them? Generally, 1) improve the feature, then 2) remove
> the unused one, is the safe order. On the other hand, the reverse order
> seems to carry a regression risk.

THP is way higher priority than the compaction improvements, so the
compaction improvements are not at the top of the queue:

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=d8f02410d718725a7daaf192af33abc41dcfae16;hp=39c4a61fedc5f5bf0c95a60483ac0acea1a9a757

At the top of the queue there is the lumpy_reclaim removal as that's
higher priority than THP.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
@ 2010-10-28 15:12             ` Andrea Arcangeli
  0 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-28 15:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Wu Fengguang, Minchan Kim, Rik van Riel

On Thu, Oct 28, 2010 at 05:00:57PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > My tree uses compaction in a fine way inside kswapd too and tons of
> > systems are running without lumpy and floods of order 9 allocations
> > with only compaction (in direct reclaim and kswapd) without the
> > slightest problem. Furthermore I extended compaction for all
> > allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> > removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> > not to use compaction for every allocation including 1,2,3, and things
> > work fine this way.
> 
> Interesting. I read this as meaning you have compaction improvements. If so,
> can you please post them? Generally, 1) improve the feature, then 2) remove
> the unused one, is the safe order. On the other hand, the reverse order
> seems to carry a regression risk.

THP is way higher priority than the compaction improvements, so the
compaction improvements are not at the top of the queue:

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog

http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=d8f02410d718725a7daaf192af33abc41dcfae16;hp=39c4a61fedc5f5bf0c95a60483ac0acea1a9a757

At the top of the queue there is the lumpy_reclaim removal as that's
higher priority than THP.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-28  2:31       ` Ed Tomlinson
@ 2010-10-28 15:22         ` Andrea Arcangeli
  -1 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-28 15:22 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Mel Gorman, Wu Fengguang, Minchan Kim, Rik van Riel

Hi,

On Wed, Oct 27, 2010 at 10:31:07PM -0400, Ed Tomlinson wrote:
> On Wednesday 27 October 2010 12:41:38 Andrea Arcangeli wrote:
> > I hope lumpy work stops here and that it goes away whenever THP is
> > merged.
> 
> Andrea,
> 
> I've been running THP here since May (#25).  Here it does its job as it should.

Thanks for the report :).

> When do you see it as ready for inclusion?

It is already ready for inclusion. I posted it to Linus on 24 Oct but
got no answer. I guess he's busy merging the other stuff (including the
lumpy improvements, which now require me to remove more code; that's no
big deal, but I'll have to audit everything, separate the good from the
bad, and identify any real fixes among the lumpy stuff that I'll drop.
I'll also drop the vmstat.h lumpy stats, so some userland rebuilds may
be needed).

As long as the only case in which sc->lumpy_reclaim_mode is guaranteed to be
LUMPY_MODE_NONE is sc->order == 0, I'm not going to let it live in my tree.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
@ 2010-10-28 15:22         ` Andrea Arcangeli
  0 siblings, 0 replies; 66+ messages in thread
From: Andrea Arcangeli @ 2010-10-28 15:22 UTC (permalink / raw)
  To: Ed Tomlinson
  Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Johannes Weiner,
	Mel Gorman, Wu Fengguang, Minchan Kim, Rik van Riel

Hi,

On Wed, Oct 27, 2010 at 10:31:07PM -0400, Ed Tomlinson wrote:
> On Wednesday 27 October 2010 12:41:38 Andrea Arcangeli wrote:
> > I hope lumpy work stops here and that it goes away whenever THP is
> > merged.
> 
> Andrea,
> 
> I've been running THP here since May (#25).  Here it does its job as it should.

Thanks for the report :).

> When do you see it as ready for inclusion?

It is already ready for inclusion. I posted it to Linus on 24 Oct but
got no answer. I guess he's busy merging the other stuff (including the
lumpy improvements, which now require me to remove more code; that's no
big deal, but I'll have to audit everything, separate the good from the
bad, and identify any real fixes among the lumpy stuff that I'll drop.
I'll also drop the vmstat.h lumpy stats, so some userland rebuilds may
be needed).

As long as the only case in which sc->lumpy_reclaim_mode is guaranteed to be
LUMPY_MODE_NONE is sc->order == 0, I'm not going to let it live in my tree.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-28 15:12             ` Andrea Arcangeli
@ 2010-10-29  2:23               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-10-29  2:23 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kosaki.motohiro, Mel Gorman, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Wu Fengguang, Minchan Kim, Rik van Riel

> On Thu, Oct 28, 2010 at 05:00:57PM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > My tree uses compaction in a fine way inside kswapd too and tons of
> > > systems are running without lumpy and floods of order 9 allocations
> > > with only compaction (in direct reclaim and kswapd) without the
> > > slightest problem. Furthermore I extended compaction for all
> > > allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> > > removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> > > not to use compaction for every allocation including 1,2,3, and things
> > > work fine this way.
> > 
> > Interesting. I read this as meaning you have compaction improvements. If so,
> > can you please post them? Generally, 1) improve the feature, then 2) remove
> > the unused one, is the safe order. On the other hand, the reverse order
> > seems to carry a regression risk.
> 
> THP is way higher priority than the compaction improvements, so the
> compaction improvements are not at the top of the queue:
> 
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog
> 
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=d8f02410d718725a7daaf192af33abc41dcfae16;hp=39c4a61fedc5f5bf0c95a60483ac0acea1a9a757
> 
> At the top of the queue there is the lumpy_reclaim removal as that's
> higher priority than THP.

Umm...

If the THP vs. lumpy conflict is the biggest issue, I'd prefer automatically
disabling lumpy when THP is enabled rather than removing it completely. That
is lower risk. And after compaction has been improved, I expect we will be
able to discuss removing it.

If my reading is correct, you have tested the "improved-compaction + no-lumpy
+ THP" combination, but nobody has tested "current-compaction + no-lumpy".
IOW, I'm only saying that I dislike a regression.






^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
@ 2010-10-29  2:23               ` KOSAKI Motohiro
  0 siblings, 0 replies; 66+ messages in thread
From: KOSAKI Motohiro @ 2010-10-29  2:23 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: kosaki.motohiro, Mel Gorman, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Wu Fengguang, Minchan Kim, Rik van Riel

> On Thu, Oct 28, 2010 at 05:00:57PM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > My tree uses compaction in a fine way inside kswapd too and tons of
> > > systems are running without lumpy and floods of order 9 allocations
> > > with only compaction (in direct reclaim and kswapd) without the
> > > slightest problem. Furthermore I extended compaction for all
> > > allocations not just that PAGE_ALLOC_COSTLY_ORDER (maybe I already
> > > removed all PAGE_ALLOC_COSTLY_ORDER checks?). There's no good reason
> > > not to use compaction for every allocation including 1,2,3, and things
> > > work fine this way.
> > 
> > Interesting. I read this as meaning you have compaction improvements. If so,
> > can you please post them? Generally, 1) improve the feature, then 2) remove
> > the unused one, is the safe order. On the other hand, the reverse order
> > seems to carry a regression risk.
> 
> THP is way higher priority than the compaction improvements, so the
> compaction improvements are not at the top of the queue:
> 
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=shortlog
> 
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commitdiff;h=d8f02410d718725a7daaf192af33abc41dcfae16;hp=39c4a61fedc5f5bf0c95a60483ac0acea1a9a757
> 
> At the top of the queue there is the lumpy_reclaim removal as that's
> higher priority than THP.

Umm...

If the THP vs. lumpy conflict is the biggest issue, I'd prefer automatically
disabling lumpy when THP is enabled rather than removing it completely. That
is lower risk. And after compaction has been improved, I expect we will be
able to discuss removing it.

If my reading is correct, you have tested the "improved-compaction + no-lumpy
+ THP" combination, but nobody has tested "current-compaction + no-lumpy".
IOW, I'm only saying that I dislike a regression.





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
  2010-10-28 10:20           ` Mel Gorman
@ 2010-11-02  2:04             ` Wu Fengguang
  -1 siblings, 0 replies; 66+ messages in thread
From: Wu Fengguang @ 2010-11-02  2:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Minchan Kim, Rik van Riel

> To make compaction a full replacement for lumpy, reclaim would have to
> know how to reclaim order-9 worth of pages and then compact properly.
> It's not setup for this and a naive algorithm would spend a lot of time
> in the compaction scanning code (which is pretty inefficient). A possible
> alternative would be to lumpy-compact i.e. select a page from the LRU and
> move all pages around it elsewhere. Again, this is not what we are currently
> doing but it's a direction that could be taken.

Agreed. The more lumpy reclaim we do, the more young pages get wrongly
evicted. THP could trigger lumpy reclaim heavily, which is why Andrea
needs to disable it. Lumpy migration looks much better.  Compaction
looks like a form of up-front, batched lumpy migration. We may also do
on-demand lumpy migration in the future.

> +static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> +                                bool sync)
> +{
> +     enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +
> +     /*
> +      * Some reclaim have alredy been failed. No worth to try synchronous
> +      * lumpy reclaim.
> +      */
> +     if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +             return;
> +
> +     /*
> +      * If we need a large contiguous chunk of memory, or have
> +      * trouble getting a small set of contiguous pages, we
> +      * will reclaim both active and inactive pages.
> +      */
> +     if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +             sc->lumpy_reclaim_mode = mode;
> +     else if (sc->order && priority < DEF_PRIORITY - 2)
> +             sc->lumpy_reclaim_mode = mode;
> +     else
> +             sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}

Andrea, I don't see a conflict in doing lumpy reclaim improvements
in parallel with compaction and THP. If lumpy reclaim hurts THP, can't it
be trivially disabled in your tree for huge-page-order allocations?

+       if (sc->order > 9)
+               sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition
@ 2010-11-02  2:04             ` Wu Fengguang
  0 siblings, 0 replies; 66+ messages in thread
From: Wu Fengguang @ 2010-11-02  2:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrea Arcangeli, KOSAKI Motohiro, LKML, linux-mm, Andrew Morton,
	Johannes Weiner, Minchan Kim, Rik van Riel

> To make compaction a full replacement for lumpy, reclaim would have to
> know how to reclaim order-9 worth of pages and then compact properly.
> It's not setup for this and a naive algorithm would spend a lot of time
> in the compaction scanning code (which is pretty inefficient). A possible
> alternative would be to lumpy-compact i.e. select a page from the LRU and
> move all pages around it elsewhere. Again, this is not what we are currently
> doing but it's a direction that could be taken.

Agreed. The more lumpy reclaim we do, the more young pages get wrongly
evicted. THP could trigger lumpy reclaim heavily, which is why Andrea
needs to disable it. Lumpy migration looks much better.  Compaction
looks like a form of up-front, batched lumpy migration. We may also do
on-demand lumpy migration in the future.

> +static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc,
> +                                bool sync)
> +{
> +     enum lumpy_mode mode = sync ? LUMPY_MODE_SYNC : LUMPY_MODE_ASYNC;
> +
> +     /*
> +      * Some reclaim have alredy been failed. No worth to try synchronous
> +      * lumpy reclaim.
> +      */
> +     if (sync && sc->lumpy_reclaim_mode == LUMPY_MODE_NONE)
> +             return;
> +
> +     /*
> +      * If we need a large contiguous chunk of memory, or have
> +      * trouble getting a small set of contiguous pages, we
> +      * will reclaim both active and inactive pages.
> +      */
> +     if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
> +             sc->lumpy_reclaim_mode = mode;
> +     else if (sc->order && priority < DEF_PRIORITY - 2)
> +             sc->lumpy_reclaim_mode = mode;
> +     else
> +             sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;
> +}

Andrea, I don't see a conflict in doing lumpy reclaim improvements
in parallel with compaction and THP. If lumpy reclaim hurts THP, can't it
be trivially disabled in your tree for huge-page-order allocations?

+       if (sc->order > 9)
+               sc->lumpy_reclaim_mode = LUMPY_MODE_NONE;

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2010-11-02  2:05 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-05  6:11 [RFC][PATCH 0/7] low latency synchrounous lumpy reclaim KOSAKI Motohiro
2010-08-05  6:11 ` KOSAKI Motohiro
2010-08-05  6:12 ` [PATCH 1/7] vmscan: raise the bar to PAGEOUT_IO_SYNC stalls KOSAKI Motohiro
2010-08-05  6:12   ` KOSAKI Motohiro
2010-08-05 15:02   ` Mel Gorman
2010-08-05 15:02     ` Mel Gorman
2010-08-08  6:42     ` KOSAKI Motohiro
2010-08-08  6:42       ` KOSAKI Motohiro
2010-08-05 15:19   ` Rik van Riel
2010-08-05 15:19     ` Rik van Riel
2010-08-05  6:13 ` [PATCH 2/7] vmscan: synchronous lumpy reclaim don't call congestion_wait() KOSAKI Motohiro
2010-08-05  6:13   ` KOSAKI Motohiro
2010-08-05 13:55   ` Minchan Kim
2010-08-05 13:55     ` Minchan Kim
2010-08-05 15:05   ` Rik van Riel
2010-08-05 15:05     ` Rik van Riel
2010-08-05 15:06   ` Mel Gorman
2010-08-05 15:06     ` Mel Gorman
2010-08-05  6:13 ` [PATCH 3/7] vmscan: synchrounous lumpy reclaim use lock_page() instead trylock_page() KOSAKI Motohiro
2010-08-05  6:13   ` KOSAKI Motohiro
2010-08-05 14:17   ` Minchan Kim
2010-08-05 14:17     ` Minchan Kim
2010-08-06  0:52     ` Minchan Kim
2010-08-06  0:52       ` Minchan Kim
2010-08-05 15:12   ` Mel Gorman
2010-08-05 15:12     ` Mel Gorman
2010-08-05 15:26   ` Rik van Riel
2010-08-05 15:26     ` Rik van Riel
2010-08-05  6:14 ` [PATCH 4/7] vmscan: narrowing synchrounous lumply reclaim condition KOSAKI Motohiro
2010-08-05  6:14   ` KOSAKI Motohiro
2010-08-05 14:59   ` Minchan Kim
2010-08-05 14:59     ` Minchan Kim
2010-10-27 16:41   ` Andrea Arcangeli
2010-10-27 16:41     ` Andrea Arcangeli
2010-10-27 17:16     ` Mel Gorman
2010-10-27 17:16       ` Mel Gorman
2010-10-27 18:03       ` Andrea Arcangeli
2010-10-27 18:03         ` Andrea Arcangeli
2010-10-28  8:00         ` KOSAKI Motohiro
2010-10-28  8:00           ` KOSAKI Motohiro
2010-10-28 15:12           ` Andrea Arcangeli
2010-10-28 15:12             ` Andrea Arcangeli
2010-10-29  2:23             ` KOSAKI Motohiro
2010-10-29  2:23               ` KOSAKI Motohiro
2010-10-28 10:20         ` Mel Gorman
2010-10-28 10:20           ` Mel Gorman
2010-11-02  2:04           ` Wu Fengguang
2010-11-02  2:04             ` Wu Fengguang
2010-10-28  2:31     ` Ed Tomlinson
2010-10-28  2:31       ` Ed Tomlinson
2010-10-28 15:22       ` Andrea Arcangeli
2010-10-28 15:22         ` Andrea Arcangeli
2010-08-05  6:14 ` [PATCH 5/7] vmscan: kill dead code in shrink_inactive_list() KOSAKI Motohiro
2010-08-05  6:14   ` KOSAKI Motohiro
2010-08-05 15:08   ` Minchan Kim
2010-08-05 15:08     ` Minchan Kim
2010-08-05 15:14   ` Mel Gorman
2010-08-05 15:14     ` Mel Gorman
2010-08-05  6:15 ` [PATCH 6/7] vmscan: remove PF_SWAPWRITE from __zone_reclaim() KOSAKI Motohiro
2010-08-05  6:15   ` KOSAKI Motohiro
2010-08-05  6:16 ` [PATCH 7/7] vmscan: isolated_lru_pages() stop neighbor search if neighbor can't be isolated KOSAKI Motohiro
2010-08-05  6:16   ` KOSAKI Motohiro
2010-08-05 15:25   ` Mel Gorman
2010-08-05 15:25     ` Mel Gorman
2010-08-05 15:40   ` Minchan Kim
2010-08-05 15:40     ` Minchan Kim
