* [PATCH v2 00/18] make direct compaction more deterministic
@ 2016-05-31 13:08 ` Vlastimil Babka
  0 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

This is mostly a followup to Michal's oom detection rework, which highlighted
the need for direct compaction to provide better feedback in the
reclaim/compaction loop, so that it can reliably recognize when compaction
cannot make further progress and the allocation should invoke the OOM killer
or fail. We discussed this at LSF/MM [1], where I proposed expanding the
async/sync migration mode used in compaction to more general "priorities".
This patchset adds one new priority that simply overrides all the heuristics
and makes compaction fully scan all zones. I don't currently think that we
need more fine-grained priorities, but we'll see. Other than that, there are
some smaller fixes and cleanups, mainly related to the THP-specific hacks.
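
To illustrate the direction, here is a minimal sketch of the priority idea;
the names and the exact mapping are illustrative and may differ from what the
patches below end up using (MIGRATE_ASYNC and MIGRATE_SYNC_LIGHT are the
existing modes from include/linux/migrate_mode.h):

        /* Illustrative sketch only, not code from this series */
        enum compact_priority {
                /* new: ignore deferral and cached scanner positions */
                COMPACT_PRIO_SYNC_FULL,
                COMPACT_PRIO_SYNC_LIGHT,
                COMPACT_PRIO_ASYNC,
        };

        /* lower priorities keep mapping to the existing migrate modes */
        static enum migrate_mode compact_mode(enum compact_priority prio)
        {
                return prio == COMPACT_PRIO_ASYNC ? MIGRATE_ASYNC
                                                  : MIGRATE_SYNC_LIGHT;
        }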

Changes since v1 RFC:

* Incorporate feedback from Michal, Joonsoo, Tetsuo
* Expanded cleanup of watermark checks controlling reclaim/compaction

I've tested this with stress-highalloc in GFP_KERNEL order-4 and
GFP_HIGHUSER_MOVABLE order-9 scenarios. There's nothing to report but noise,
except for reductions in direct reclaim (first column is the unpatched
baseline, second column is with this series applied):

order-9:

Direct pages scanned                238949       41502
Kswapd pages scanned               2069710     2229295
Kswapd pages reclaimed             1981047     2139089
Direct pages reclaimed              236534       41502

order-4:

Direct pages scanned                204214      110733
Kswapd pages scanned               2125221     2179180
Kswapd pages reclaimed             2027102     2098257
Direct pages reclaimed              194942      110695

Also Patch 1 describes reductions in page migration failures.

The series is based on 4.7-rc1.

[1] https://lwn.net/Articles/684611/

Hugh Dickins (1):
  mm, compaction: don't isolate PageWriteback pages in
    MIGRATE_SYNC_LIGHT mode

Vlastimil Babka (17):
  mm, page_alloc: set alloc_flags only once in slowpath
  mm, page_alloc: don't retry initial attempt in slowpath
  mm, page_alloc: restructure direct compaction handling in slowpath
  mm, page_alloc: make THP-specific decisions more generic
  mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  mm, compaction: introduce direct compaction priority
  mm, compaction: simplify contended compaction handling
  mm, compaction: make whole_zone flag ignore cached scanner positions
  mm, compaction: cleanup unused functions
  mm, compaction: add the ultimate direct compaction priority
  mm, compaction: more reliably increase direct compaction priority
  mm, compaction: use correct watermark when checking allocation success
  mm, compaction: create compact_gap wrapper
  mm, compaction: use proper alloc_flags in __compaction_suitable()
  mm, compaction: require only min watermarks for non-costly orders
  mm, vmscan: make compaction_ready() more accurate and readable
  mm, vmscan: use proper classzone_idx in should_continue_reclaim()

 include/linux/compaction.h        | 101 +++++-----------
 include/linux/gfp.h               |  14 ++-
 include/trace/events/compaction.h |  12 +-
 include/trace/events/mmflags.h    |   1 +
 mm/compaction.c                   | 186 ++++++++++-------------------
 mm/huge_memory.c                  |  27 +++--
 mm/internal.h                     |   7 +-
 mm/migrate.c                      |   2 +-
 mm/page_alloc.c                   | 241 ++++++++++++++++++--------------------
 mm/vmscan.c                       |  80 +++++--------
 tools/perf/builtin-kmem.c         |   1 +
 11 files changed, 271 insertions(+), 401 deletions(-)

-- 
2.8.3

* [PATCH v2 01/18] mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Hugh Dickins, Vlastimil Babka

From: Hugh Dickins <hughd@google.com>

At present MIGRATE_SYNC_LIGHT is allowing __isolate_lru_page() to
isolate a PageWriteback page, which __unmap_and_move() then rejects
with -EBUSY: of course the writeback might complete in between, but
that's not what we usually expect, so it's probably better not to isolate it.

When tested with stress-highalloc from mmtests, this reduced the number of
page migration failures by 60-70%.
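
For context, the check that rejects writeback pages lives in
__isolate_lru_page() and only applies when the isolate mode includes
ISOLATE_ASYNC_MIGRATE, roughly like this (paraphrased from mm/vmscan.c of
this era, not part of this patch):

        /* paraphrased sketch of the relevant part of __isolate_lru_page() */
        if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
                /* all the caller can do on PageWriteback is block */
                if (PageWriteback(page))
                        return ret;     /* i.e. do not isolate */
                ...
        }

so widening the ISOLATE_ASYNC_MIGRATE condition below to everything except
MIGRATE_SYNC makes MIGRATE_SYNC_LIGHT skip such pages as well.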

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366ad673..e611f3f90f5f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1146,7 +1146,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	struct page *page;
 	const isolate_mode_t isolate_mode =
 		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
-		(cc->mode == MIGRATE_ASYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+		(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
-- 
2.8.3

* [PATCH v2 02/18] mm, page_alloc: set alloc_flags only once in slowpath
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

In __alloc_pages_slowpath(), alloc_flags doesn't change after it's initialized,
so move the initialization above the retry: label. Also make the comment above
the initialization more descriptive.

The only exception to alloc_flags being constant is ALLOC_NO_WATERMARKS,
which may change due to TIF_MEMDIE being set on the allocating thread. We can
fix this, and make the code simpler and a bit more efficient at the same time,
by moving the part that determines ALLOC_NO_WATERMARKS from
gfp_to_alloc_flags() to gfp_pfmemalloc_allowed(). This means we don't have to
mask out ALLOC_NO_WATERMARKS in numerous places in __alloc_pages_slowpath()
anymore. The only remaining test for the flag can instead call
gfp_pfmemalloc_allowed().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 49 ++++++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8f3bfc435ee..da3a62a94b4a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3203,8 +3203,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	 */
 	count_vm_event(COMPACTSTALL);
 
-	page = get_page_from_freelist(gfp_mask, order,
-					alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
+	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	if (page) {
 		struct zone *zone = page_zone(page);
@@ -3372,8 +3371,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 		return NULL;
 
 retry:
-	page = get_page_from_freelist(gfp_mask, order,
-					alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
+	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -3431,16 +3429,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	} else if (unlikely(rt_task(current)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
-	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-		if (gfp_mask & __GFP_MEMALLOC)
-			alloc_flags |= ALLOC_NO_WATERMARKS;
-		else if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
-			alloc_flags |= ALLOC_NO_WATERMARKS;
-		else if (!in_interrupt() &&
-				((current->flags & PF_MEMALLOC) ||
-				 unlikely(test_thread_flag(TIF_MEMDIE))))
-			alloc_flags |= ALLOC_NO_WATERMARKS;
-	}
 #ifdef CONFIG_CMA
 	if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
@@ -3450,7 +3438,19 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 {
-	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
+	if (unlikely(gfp_mask & __GFP_NOMEMALLOC))
+		return false;
+
+	if (gfp_mask & __GFP_MEMALLOC)
+		return true;
+	if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
+		return true;
+	if (!in_interrupt() &&
+			((current->flags & PF_MEMALLOC) ||
+			 unlikely(test_thread_flag(TIF_MEMDIE))))
+		return true;
+
+	return false;
 }
 
 static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
@@ -3585,25 +3585,24 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
 		gfp_mask &= ~__GFP_ATOMIC;
 
-retry:
-	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
-		wake_all_kswapds(order, ac);
-
 	/*
-	 * OK, we're below the kswapd watermark and have kicked background
-	 * reclaim. Now things get more complex, so set up alloc_flags according
-	 * to how we want to proceed.
+	 * The fast path uses conservative alloc_flags to succeed only until
+	 * kswapd needs to be woken up, and to avoid the cost of setting up
+	 * alloc_flags precisely. So we do that now.
 	 */
 	alloc_flags = gfp_to_alloc_flags(gfp_mask);
 
+retry:
+	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
+		wake_all_kswapds(order, ac);
+
 	/* This is the last chance, in general, before the goto nopage. */
-	page = get_page_from_freelist(gfp_mask, order,
-				alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
+	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 	if (page)
 		goto got_pg;
 
 	/* Allocate without watermarks if the context allows */
-	if (alloc_flags & ALLOC_NO_WATERMARKS) {
+	if (gfp_pfmemalloc_allowed(gfp_mask)) {
 		/*
 		 * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
 		 * the allocation is high priority and these type of
-- 
2.8.3

* [PATCH v2 03/18] mm, page_alloc: don't retry initial attempt in slowpath
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

After __alloc_pages_slowpath() sets up the new alloc_flags and wakes up
kswapd, it first tries get_page_from_freelist() with the new alloc_flags, as
it may succeed e.g. due to using the min watermark instead of the low
watermark. This attempt does not have to be retried on each loop iteration,
since direct reclaim, direct compaction and the OOM path call
get_page_from_freelist() themselves. There is a corner case where direct
reclaim does not attempt allocation when there is no reclaim progress, but
that is trivial to adjust.

This patch therefore moves the initial attempt above the retry label. The
ALLOC_NO_WATERMARKS attempt is kept under the retry label as it's special and
should be retried after each loop. Kswapd wakeups are also done on each retry,
to guard against races that could otherwise let kswapd go to sleep while a
process (that may not be able to reclaim by itself) is still looping.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da3a62a94b4a..9f83259a18a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3367,10 +3367,9 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	bool drained = false;
 
 	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
-	if (unlikely(!(*did_some_progress)))
-		return NULL;
 
 retry:
+	/* We attempt even when no progress, as kswapd might have done some */
 	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	/*
@@ -3378,7 +3377,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	 * pages are pinned on the per-cpu lists or in high alloc reserves.
 	 * Shrink them them and try again
 	 */
-	if (!page && !drained) {
+	if (!page && *did_some_progress && !drained) {
 		unreserve_highatomic_pageblock(ac);
 		drain_all_pages(NULL);
 		drained = true;
@@ -3592,15 +3591,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	alloc_flags = gfp_to_alloc_flags(gfp_mask);
 
-retry:
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
 
-	/* This is the last chance, in general, before the goto nopage. */
+	/*
+	 * The adjusted alloc_flags might result in immediate success, so try
+	 * that first
+	 */
 	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 	if (page)
 		goto got_pg;
 
+retry:
+	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
+	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
+		wake_all_kswapds(order, ac);
+
 	/* Allocate without watermarks if the context allows */
 	if (gfp_pfmemalloc_allowed(gfp_mask)) {
 		/*
-- 
2.8.3

* [PATCH v2 04/18] mm, page_alloc: restructure direct compaction handling in slowpath
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The retry loop in __alloc_pages_slowpath is supposed to keep trying reclaim
and compaction (and OOM), until the allocation either succeeds or fails for
good. Success here is more probable when reclaim precedes compaction, as
certain watermarks have to be met for compaction to even try, and more free
pages increase the probability of compaction success. On the other hand,
starting with light async compaction (if the watermarks allow it) can be more
efficient, especially for smaller orders, if there's enough free memory which
is just fragmented.

Thus, the current code starts with compaction before reclaim, and to make sure
that the last reclaim is always followed by a final compaction, there's another
direct compaction call at the end of the loop. This makes the code hard to
follow and adds some duplicated handling of migration_mode decisions. It's also
somewhat inefficient that even if reclaim or compaction decides not to retry,
the final compaction is still attempted. Some gfp flag combinations also
shortcut these retry decisions with "goto noretry;", making the code even
harder to follow.

This patch attempts to restructure the code with only minimal functional
changes. The call to the first compaction and THP-specific checks are now
placed above the retry loop, and the "noretry" direct compaction is removed.

The initial compaction is additionally restricted only to costly orders, as we
can expect smaller orders to be held back by watermarks, and only larger orders
to suffer primarily from fragmentation. This better matches the checks in
reclaim's shrink_zones().

There are two other smaller functional changes. One is that the upgrade from
async migration to light sync migration will always occur after the initial
compaction. This is how it was until the recent patch "mm, oom: protect
!costly allocations some more", which introduced upgrading the mode based on
the COMPACT_COMPLETE result but kept the final compaction always upgraded,
which made it even more special. It's better to return to the simpler handling
for now, as migration modes will be further modified later in the series.

The second change is that once both reclaim and compaction declare it's not
worth retrying the reclaim/compact loop, there is no final compaction attempt.
As argued above, this is intentional. If that final compaction were to succeed,
it would be due to a wrong retry decision, or simply a race with somebody else
freeing memory for us.

The main outcome of this patch should be simpler code. Logically, the initial
compaction without reclaim is the exception to the reclaim/compaction scheme,
but prior to the patch, it was the last loop iteration that was exceptional.
Now the code matches the logic better. The change also enables the following
patches.
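
As a rough map of the result, here is a heavily condensed sketch of
__alloc_pages_slowpath() after this patch (pseudocode: arguments, several
checks and the OOM handling are elided; the diff below is authoritative):

        alloc_flags = gfp_to_alloc_flags(gfp_mask);
        wake_all_kswapds(order, ac);
        page = get_page_from_freelist(...);     /* with adjusted alloc_flags */
        if (page)
                goto got_pg;

        if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER) {
                /* one-off async compaction, only for costly orders */
                page = __alloc_pages_direct_compact(..., MIGRATE_ASYNC, ...);
                if (page)
                        goto got_pg;
                /* THP-specific bailouts on deferred/contended compaction */
        }

        retry:
        wake_all_kswapds(order, ac);
        /* ALLOC_NO_WATERMARKS attempt, OOM-victim checks, ... */
        page = __alloc_pages_direct_reclaim(...);
        if (page)
                goto got_pg;
        page = __alloc_pages_direct_compact(..., migration_mode, ...);
        if (page)
                goto got_pg;
        /* retry decisions: loop back to retry, try OOM, or fall through */
        nopage:
        warn_alloc_failed(...);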

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 107 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 55 insertions(+), 52 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f83259a18a8..9be151b784f9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3560,7 +3560,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
-	enum migrate_mode migration_mode = MIGRATE_ASYNC;
+	enum migrate_mode migration_mode = MIGRATE_SYNC_LIGHT;
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
@@ -3602,6 +3602,50 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/*
+	 * For costly allocations, try direct compaction first, as it's likely
+	 * that we have enough base pages and don't need to reclaim.
+	 */
+	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER) {
+		page = __alloc_pages_direct_compact(gfp_mask, order,
+						alloc_flags, ac,
+						MIGRATE_ASYNC,
+						&compact_result);
+		if (page)
+			goto got_pg;
+
+		/* Checks for THP-specific high-order allocations */
+		if (is_thp_gfp_mask(gfp_mask)) {
+			/*
+			 * If compaction is deferred for high-order allocations,
+			 * it is because sync compaction recently failed. If
+			 * this is the case and the caller requested a THP
+			 * allocation, we do not want to heavily disrupt the
+			 * system, so we fail the allocation instead of entering
+			 * direct reclaim.
+			 */
+			if (compact_result == COMPACT_DEFERRED)
+				goto nopage;
+
+			/*
+			 * Compaction is contended so rather back off than cause
+			 * excessive stalls.
+			 */
+			if (compact_result == COMPACT_CONTENDED)
+				goto nopage;
+
+			/*
+			 * It can become very expensive to allocate transparent
+			 * hugepages at fault, so use asynchronous memory
+			 * compaction for THP unless it is khugepaged trying to
+			 * collapse. All other requests should tolerate at
+			 * least light sync migration.
+			 */
+			if (!(current->flags & PF_KTHREAD))
+				migration_mode = MIGRATE_ASYNC;
+		}
+	}
+
 retry:
 	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
@@ -3650,55 +3694,33 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
-	/*
-	 * Try direct compaction. The first pass is asynchronous. Subsequent
-	 * attempts after direct reclaim are synchronous
-	 */
+
+	/* Try direct reclaim and then allocating */
+	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
+							&did_some_progress);
+	if (page)
+		goto got_pg;
+
+	/* Try direct compaction and then allocating */
 	page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
 					migration_mode,
 					&compact_result);
 	if (page)
 		goto got_pg;
 
-	/* Checks for THP-specific high-order allocations */
-	if (is_thp_gfp_mask(gfp_mask)) {
-		/*
-		 * If compaction is deferred for high-order allocations, it is
-		 * because sync compaction recently failed. If this is the case
-		 * and the caller requested a THP allocation, we do not want
-		 * to heavily disrupt the system, so we fail the allocation
-		 * instead of entering direct reclaim.
-		 */
-		if (compact_result == COMPACT_DEFERRED)
-			goto nopage;
-
-		/*
-		 * Compaction is contended so rather back off than cause
-		 * excessive stalls.
-		 */
-		if(compact_result == COMPACT_CONTENDED)
-			goto nopage;
-	}
-
 	if (order && compaction_made_progress(compact_result))
 		compaction_retries++;
 
-	/* Try direct reclaim and then allocating */
-	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
-							&did_some_progress);
-	if (page)
-		goto got_pg;
-
 	/* Do not loop if specifically requested */
 	if (gfp_mask & __GFP_NORETRY)
-		goto noretry;
+		goto nopage;
 
 	/*
 	 * Do not retry costly high order allocations unless they are
 	 * __GFP_REPEAT
 	 */
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
-		goto noretry;
+		goto nopage;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3737,25 +3759,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto retry;
 	}
 
-noretry:
-	/*
-	 * High-order allocations do not necessarily loop after direct reclaim
-	 * and reclaim/compaction depends on compaction being called after
-	 * reclaim so call directly if necessary.
-	 * It can become very expensive to allocate transparent hugepages at
-	 * fault, so use asynchronous memory compaction for THP unless it is
-	 * khugepaged trying to collapse. All other requests should tolerate
-	 * at least light sync migration.
-	 */
-	if (is_thp_gfp_mask(gfp_mask) && !(current->flags & PF_KTHREAD))
-		migration_mode = MIGRATE_ASYNC;
-	else
-		migration_mode = MIGRATE_SYNC_LIGHT;
-	page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags,
-					    ac, migration_mode,
-					    &compact_result);
-	if (page)
-		goto got_pg;
 nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 got_pg:
-- 
2.8.3

* [PATCH v2 05/18] mm, page_alloc: make THP-specific decisions more generic
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

Since THP allocations during page faults can be costly, extra decisions are
employed for them to avoid excessive reclaim and compaction if the initial
compaction doesn't look promising. The detection has never been perfect, as
there is no gfp flag specific to THP allocations. At the moment it checks the
whole combination of flags that makes up GFP_TRANSHUGE, and hopes that no other
users of such a combination exist, or that they wouldn't mind being treated the
same way. Extra care is also taken to separate allocations from khugepaged,
where latency doesn't matter that much.

It is however possible to distinguish these allocations in a simpler and more
reliable way. The key observation is that after the initial compaction followed
by the first iteration of "standard" reclaim/compaction, both __GFP_NORETRY
allocations and costly allocations without __GFP_REPEAT are declared as
failures:

        /* Do not loop if specifically requested */
        if (gfp_mask & __GFP_NORETRY)
                goto nopage;

        /*
         * Do not retry costly high order allocations unless they are
         * __GFP_REPEAT
         */
        if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
                goto nopage;

This means we can further distinguish allocations that are of costly order
*and* additionally include the __GFP_NORETRY flag. As it happens, GFP_TRANSHUGE
allocations already fall into this category. This will also allow other costly
allocations with similar high-order benefit vs latency considerations to use
these semantics. Furthermore, we can distinguish THP allocations that should
try a bit harder (such as those from khugepaged) by removing __GFP_NORETRY, as
will be done in the next patch.
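
Put differently, the condition that now selects the lightweight handling is
roughly the following (a condensed restatement, not code taken from the
patch):

        /* costly order and the caller asked not to retry hard */
        if (order > PAGE_ALLOC_COSTLY_ORDER && (gfp_mask & __GFP_NORETRY)) {
                /*
                 * THP page faults land here: GFP_TRANSHUGE currently carries
                 * __GFP_NORETRY, and HPAGE_PMD_ORDER (9 with 4K pages) is
                 * well above PAGE_ALLOC_COSTLY_ORDER (3).
                 */
        }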

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9be151b784f9..529999c48333 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3169,7 +3169,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-
 /*
  * Maximum number of compaction retries wit a progress before OOM
  * killer is consider as the only way to move forward.
@@ -3452,11 +3451,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	return false;
 }
 
-static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
-{
-	return (gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)) == GFP_TRANSHUGE;
-}
-
 /*
  * Maximum number of reclaim retries without any progress before OOM killer
  * is consider as the only way to move forward.
@@ -3614,8 +3608,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
-		/* Checks for THP-specific high-order allocations */
-		if (is_thp_gfp_mask(gfp_mask)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes THP page fault allocations
+		 */
+		if (gfp_mask & __GFP_NORETRY) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3635,11 +3632,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				goto nopage;
 
 			/*
-			 * It can become very expensive to allocate transparent
-			 * hugepages at fault, so use asynchronous memory
-			 * compaction for THP unless it is khugepaged trying to
-			 * collapse. All other requests should tolerate at
-			 * least light sync migration.
+			 * Looks like reclaim/compaction is worth trying, but
+			 * sync compaction could be very expensive, so keep
+			 * using async compaction, unless it's khugepaged
+			 * trying to collapse.
 			 */
 			if (!(current->flags & PF_KTHREAD))
 				migration_mode = MIGRATE_ASYNC;
-- 
2.8.3

* [PATCH v2 06/18] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

After the previous patch, we can distinguish costly allocations that should be
really lightweight, such as THP page faults, with __GFP_NORETRY. This means we
don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can
also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to
try as hard as khugepaged, as the process has indicated that it benefits from
THPs and is willing to pay some initial latency costs.

We can also make the flags handling less cryptic by distinguishing
GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY
or __GFP_KSWAPD_RECLAIM is done where needed.

The patch effectively changes the current GFP_TRANSHUGE users as follows:

* get_huge_zero_page() - the zero page lifetime should be relatively long and
  it's shared by multiple users, so it's worth spending some effort on it.
  We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores
  direct reclaim to this allocation, which was unintentionally removed by
  commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add
  a stall-free defrag option")

* alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not
  an issue. So if khugepaged "defrag" is enabled (the default), do reclaim
  via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check
  from page alloc.
  As a side-effect, khugepaged will now no longer check if the initial
  compaction was deferred or contended. This is OK, as khugepaged sleep times
  between collapse attempts are long enough to prevent noticeable disruption,
  so we should allow it to spend some effort.

* migrate_misplaced_transhuge_page() - was already masking out __GFP_RECLAIM,
  so just convert to GFP_TRANSHUGE_LIGHT, which is equivalent.

* alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are
  now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY
  if direct reclaim/compaction is at all allowed (by default it's allowed only
  for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT).
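
To summarize the list above (a condensed restatement of the points above, not
a table from the patch):

        get_huge_zero_page():                GFP_TRANSHUGE (no __GFP_NORETRY;
                                             direct reclaim restored)
        alloc_hugepage_khugepaged_gfpmask(): GFP_TRANSHUGE when khugepaged
                                             "defrag" is enabled (the default)
        migrate_misplaced_transhuge_page():  GFP_TRANSHUGE_LIGHT (same bits as
                                             the old ~__GFP_RECLAIM masking)
        alloc_hugepage_direct_gfpmask():     GFP_TRANSHUGE for madvised vmas;
                                             other vmas keep __GFP_NORETRY when
                                             direct reclaim/compaction is
                                             allowed at all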

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/gfp.h            | 14 ++++++++------
 include/trace/events/mmflags.h |  1 +
 mm/huge_memory.c               | 27 +++++++++++++++------------
 mm/migrate.c                   |  2 +-
 mm/page_alloc.c                |  6 ++----
 tools/perf/builtin-kmem.c      |  1 +
 6 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..a6ebe0dccd67 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -238,9 +238,11 @@ struct vm_area_struct;
  *   are expected to be movable via page reclaim or page migration. Typically,
  *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
  *
- * GFP_TRANSHUGE is used for THP allocations. They are compound allocations
- *   that will fail quickly if memory is not available and will not wake
- *   kswapd on failure.
+ * GFP_TRANSHUGE and GFP_TRANSHUGE_LIGHT are used for THP allocations. They are
+ *   compound allocations that will generally fail quickly if memory is not
+ *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
+ *   version does not attempt reclaim/compaction at all and is by default used
+ *   in page fault path, while the non-light is used by khugepaged.
  */
 #define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -255,9 +257,9 @@ struct vm_area_struct;
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
-			 ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 43cedbf0c759..5a81ab48a2fb 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -11,6 +11,7 @@
 
 #define __def_gfpflag_names						\
 	{(unsigned long)GFP_TRANSHUGE,		"GFP_TRANSHUGE"},	\
+	{(unsigned long)GFP_TRANSHUGE_LIGHT,	"GFP_TRANSHUGE_LIGHT"}, \
 	{(unsigned long)GFP_HIGHUSER_MOVABLE,	"GFP_HIGHUSER_MOVABLE"},\
 	{(unsigned long)GFP_HIGHUSER,		"GFP_HIGHUSER"},	\
 	{(unsigned long)GFP_USER,		"GFP_USER"},		\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ed58530f695..37db58802385 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -864,29 +864,32 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
 }
 
 /*
- * If THP is set to always then directly reclaim/compact as necessary
- * If set to defer then do no reclaim and defer to khugepaged
+ * If THP defrag is set to always then directly reclaim/compact as necessary
+ * If set to defer then do only background reclaim/compact and defer to khugepaged
  * If set to madvise and the VMA is flagged then directly reclaim/compact
+ * When direct reclaim/compact is allowed, don't retry except for flagged VMA's
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
-	gfp_t reclaim_flags = 0;
+	bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 
-	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
-	    (vma->vm_flags & VM_HUGEPAGE))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
-		reclaim_flags = __GFP_KSWAPD_RECLAIM;
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
+	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+				&transparent_hugepage_flags) && vma_madvised)
+		return GFP_TRANSHUGE;
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+						&transparent_hugepage_flags))
+		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+						&transparent_hugepage_flags))
+		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
 
-	return GFP_TRANSHUGE | reclaim_flags;
+	return GFP_TRANSHUGE_LIGHT;
 }
 
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
-	return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
+	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/migrate.c b/mm/migrate.c
index 9baf41c877ff..d09e985f644d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1772,7 +1772,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto out_dropref;
 
 	new_page = alloc_pages_node(node,
-		(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
 		goto out_fail;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 529999c48333..d7fc4c86e077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3634,11 +3634,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
-			 * using async compaction, unless it's khugepaged
-			 * trying to collapse.
+			 * using async compaction.
 			 */
-			if (!(current->flags & PF_KTHREAD))
-				migration_mode = MIGRATE_ASYNC;
+			migration_mode = MIGRATE_ASYNC;
 		}
 	}
 
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 58adfee230de..5f67a3bd98a5 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -608,6 +608,7 @@ static const struct {
 	const char *compact;
 } gfp_compact_table[] = {
 	{ "GFP_TRANSHUGE",		"THP" },
+	{ "GFP_TRANSHUGE_LIGHT",	"THL" },
 	{ "GFP_HIGHUSER_MOVABLE",	"HUM" },
 	{ "GFP_HIGHUSER",		"HU" },
 	{ "GFP_USER",			"U" },
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 07/18] mm, compaction: introduce direct compaction priority
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

In the context of direct compaction, for some types of allocations we would
like the compaction to either succeed or definitely fail while trying as hard
as possible. The current async/sync_light migration mode is insufficient, as there
are heuristics such as caching scanner positions, marking pageblocks as
unsuitable or deferring compaction for a zone. At least the final compaction
attempt should be able to override these heuristics.

To communicate how hard compaction should try, we replace migration mode with
a new enum compact_priority and change the relevant function signatures. In
compact_zone_order() where struct compact_control is constructed, the priority
is mapped to suitable control flags. This patch itself has no functional
change, as the current priority levels are mapped back to the same migration
modes as before. Expanding them will be done next.
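
A rough standalone distillation of that mapping, and of how
should_compact_retry() will bump the priority after this series (stand-in
enums; the real definitions are in the hunks below):

  #include <stdbool.h>
  #include <stdio.h>

  enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT };

  /* lower value means higher priority, like reclaim priority */
  enum compact_priority {
          COMPACT_PRIO_SYNC_LIGHT,
          MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
          DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
          COMPACT_PRIO_ASYNC,
          INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
  };

  /* what compact_zone_order() maps the priority back to */
  static enum migrate_mode prio_to_mode(enum compact_priority prio)
  {
          return prio == COMPACT_PRIO_ASYNC ? MIGRATE_ASYNC : MIGRATE_SYNC_LIGHT;
  }

  /* what should_compact_retry() does after a failed attempt */
  static bool escalate(enum compact_priority *prio)
  {
          if (*prio > MIN_COMPACT_PRIORITY) {
                  (*prio)--;              /* retry with a harder priority */
                  return true;
          }
          return false;                   /* already tried as hard as allowed */
  }

  int main(void)
  {
          enum compact_priority prio = INIT_COMPACT_PRIORITY;

          do {
                  printf("prio=%d mode=%s\n", prio,
                         prio_to_mode(prio) == MIGRATE_ASYNC ? "async" : "sync_light");
          } while (escalate(&prio));
          return 0;
  }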

Note that the !CONFIG_COMPACTION variant of try_to_compact_pages() is removed, as
the only caller exists under CONFIG_COMPACTION.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h        | 22 +++++++++++++---------
 include/trace/events/compaction.h | 12 ++++++------
 mm/compaction.c                   | 13 +++++++------
 mm/page_alloc.c                   | 28 ++++++++++++++--------------
 4 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a58c852a268f..ba67bc8edbb6 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,18 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+/*
+ * Determines how hard direct compaction should try to succeed.
+ * Lower value means higher priority, analogically to reclaim priority.
+ */
+enum compact_priority {
+	COMPACT_PRIO_SYNC_LIGHT,
+	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
+	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
+	COMPACT_PRIO_ASYNC,
+	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
+};
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* When adding new states, please adjust include/trace/events/compaction.h */
 enum compact_result {
@@ -66,7 +78,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 			unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum migrate_mode mode, int *contended);
+		enum compact_priority prio, int *contended);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
@@ -151,14 +163,6 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
-static inline enum compact_result try_to_compact_pages(gfp_t gfp_mask,
-			unsigned int order, int alloc_flags,
-			const struct alloc_context *ac,
-			enum migrate_mode mode, int *contended)
-{
-	return COMPACT_CONTINUE;
-}
-
 static inline void compact_pgdat(pg_data_t *pgdat, int order)
 {
 }
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 36e2d6fb1360..c2ba402ab256 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -226,26 +226,26 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages,
 	TP_PROTO(
 		int order,
 		gfp_t gfp_mask,
-		enum migrate_mode mode),
+		int prio),
 
-	TP_ARGS(order, gfp_mask, mode),
+	TP_ARGS(order, gfp_mask, prio),
 
 	TP_STRUCT__entry(
 		__field(int, order)
 		__field(gfp_t, gfp_mask)
-		__field(enum migrate_mode, mode)
+		__field(int, prio)
 	),
 
 	TP_fast_assign(
 		__entry->order = order;
 		__entry->gfp_mask = gfp_mask;
-		__entry->mode = mode;
+		__entry->prio = prio;
 	),
 
-	TP_printk("order=%d gfp_mask=0x%x mode=%d",
+	TP_printk("order=%d gfp_mask=0x%x priority=%d",
 		__entry->order,
 		__entry->gfp_mask,
-		(int)__entry->mode)
+		__entry->prio)
 );
 
 DECLARE_EVENT_CLASS(mm_compaction_suitable_template,
diff --git a/mm/compaction.c b/mm/compaction.c
index e611f3f90f5f..19a4f4fd6632 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1572,7 +1572,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 }
 
 static enum compact_result compact_zone_order(struct zone *zone, int order,
-		gfp_t gfp_mask, enum migrate_mode mode, int *contended,
+		gfp_t gfp_mask, enum compact_priority prio, int *contended,
 		unsigned int alloc_flags, int classzone_idx)
 {
 	enum compact_result ret;
@@ -1582,7 +1582,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.order = order,
 		.gfp_mask = gfp_mask,
 		.zone = zone,
-		.mode = mode,
+		.mode = (prio == COMPACT_PRIO_ASYNC) ?
+					MIGRATE_ASYNC :	MIGRATE_SYNC_LIGHT,
 		.alloc_flags = alloc_flags,
 		.classzone_idx = classzone_idx,
 		.direct_compaction = true,
@@ -1615,7 +1616,7 @@ int sysctl_extfrag_threshold = 500;
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum migrate_mode mode, int *contended)
+		enum compact_priority prio, int *contended)
 {
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
@@ -1630,7 +1631,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 	if (!order || !may_enter_fs || !may_perform_io)
 		return COMPACT_SKIPPED;
 
-	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode);
+	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
 
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
@@ -1643,7 +1644,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
-		status = compact_zone_order(zone, order, gfp_mask, mode,
+		status = compact_zone_order(zone, order, gfp_mask, prio,
 				&zone_contended, alloc_flags,
 				ac_classzone_idx(ac));
 		rc = max(status, rc);
@@ -1677,7 +1678,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			goto break_loop;
 		}
 
-		if (mode != MIGRATE_ASYNC && (status == COMPACT_COMPLETE ||
+		if (prio != COMPACT_PRIO_ASYNC && (status == COMPACT_COMPLETE ||
 					status == COMPACT_PARTIAL_SKIPPED)) {
 			/*
 			 * We think that allocation won't succeed in this zone
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7fc4c86e077..4466543a57ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3180,7 +3180,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum migrate_mode mode, enum compact_result *compact_result)
+		enum compact_priority prio, enum compact_result *compact_result)
 {
 	struct page *page;
 	int contended_compaction;
@@ -3190,7 +3190,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 	current->flags |= PF_MEMALLOC;
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-						mode, &contended_compaction);
+						prio, &contended_compaction);
 	current->flags &= ~PF_MEMALLOC;
 
 	if (*compact_result <= COMPACT_INACTIVE)
@@ -3244,7 +3244,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 static inline bool
 should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
-		     enum compact_result compact_result, enum migrate_mode *migrate_mode,
+		     enum compact_result compact_result,
+		     enum compact_priority *compact_priority,
 		     int compaction_retries)
 {
 	int max_retries = MAX_COMPACT_RETRIES;
@@ -3255,11 +3256,11 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	/*
 	 * compaction considers all the zone as desperately out of memory
 	 * so it doesn't really make much sense to retry except when the
-	 * failure could be caused by weak migration mode.
+	 * failure could be caused by insufficient priority
 	 */
 	if (compaction_failed(compact_result)) {
-		if (*migrate_mode == MIGRATE_ASYNC) {
-			*migrate_mode = MIGRATE_SYNC_LIGHT;
+		if (*compact_priority > MIN_COMPACT_PRIORITY) {
+			(*compact_priority)--;
 			return true;
 		}
 		return false;
@@ -3293,7 +3294,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum migrate_mode mode, enum compact_result *compact_result)
+		enum compact_priority prio, enum compact_result *compact_result)
 {
 	*compact_result = COMPACT_SKIPPED;
 	return NULL;
@@ -3302,7 +3303,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 static inline bool
 should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_flags,
 		     enum compact_result compact_result,
-		     enum migrate_mode *migrate_mode,
+		     enum compact_priority *compact_priority,
 		     int compaction_retries)
 {
 	struct zone *zone;
@@ -3554,7 +3555,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	unsigned int alloc_flags;
 	unsigned long did_some_progress;
-	enum migrate_mode migration_mode = MIGRATE_SYNC_LIGHT;
+	enum compact_priority compact_priority = DEF_COMPACT_PRIORITY;
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
@@ -3603,7 +3604,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (can_direct_reclaim && order > PAGE_ALLOC_COSTLY_ORDER) {
 		page = __alloc_pages_direct_compact(gfp_mask, order,
 						alloc_flags, ac,
-						MIGRATE_ASYNC,
+						INIT_COMPACT_PRIORITY,
 						&compact_result);
 		if (page)
 			goto got_pg;
@@ -3636,7 +3637,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			 * sync compaction could be very expensive, so keep
 			 * using async compaction.
 			 */
-			migration_mode = MIGRATE_ASYNC;
+			compact_priority = INIT_COMPACT_PRIORITY;
 		}
 	}
 
@@ -3697,8 +3698,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/* Try direct compaction and then allocating */
 	page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
-					migration_mode,
-					&compact_result);
+					compact_priority, &compact_result);
 	if (page)
 		goto got_pg;
 
@@ -3738,7 +3738,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (did_some_progress > 0 &&
 			should_compact_retry(ac, order, alloc_flags,
-				compact_result, &migration_mode,
+				compact_result, &compact_priority,
 				compaction_retries))
 		goto retry;
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 08/18] mm, compaction: simplify contended compaction handling
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

Async compaction detects contention either due to failing trylock on zone->lock
or lru_lock, or by need_resched(). Since commit 1f9efdef4f3f ("mm, compaction:
khugepaged should not give up due to need_resched()"), the code has become quite
complicated in order to distinguish the two all the way up to the
__alloc_pages_slowpath() level, so that different decisions could be taken for
khugepaged allocations.

After the recent changes, khugepaged allocations no longer check for contended
compaction, so we again don't need to distinguish lock and sched contention,
and the current convoluted code can be simplified a lot.
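
A minimal sketch of what the simplification amounts to (the locking and
scheduling checks are stubbed out and hard-wired here; the real changes are in
the mm/compaction.c hunk below): both sources of contention now just set a
single bool in compact_control.

  #include <stdbool.h>
  #include <stdio.h>

  /* stand-ins for kernel primitives, hard-wired for illustration */
  static bool trylock_zone_or_lru(void)  { return false; }  /* pretend contended */
  static bool need_resched_stub(void)    { return true; }
  static bool fatal_signal_stub(void)    { return false; }

  struct cc_model {
          bool async;             /* MIGRATE_ASYNC compaction? */
          bool contended;         /* single flag: lock *or* sched contention */
  };

  /* async compaction backs off on either source of contention */
  static bool should_abort(struct cc_model *cc)
  {
          if (cc->async && !trylock_zone_or_lru()) {
                  cc->contended = true;
                  return true;
          }
          if (fatal_signal_stub() || (cc->async && need_resched_stub())) {
                  cc->contended = true;
                  return true;
          }
          return false;
  }

  int main(void)
  {
          struct cc_model cc = { .async = true, .contended = false };

          printf("abort=%d contended=%d\n", should_abort(&cc), cc.contended);
          return 0;
  }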

However, I believe it's also possible to simplify even more and completely
remove the check for contended compaction after the initial async compaction
for costly orders, which was originally aimed at THP page fault allocations.
There are several reasons why this can be done now:

- with the new defaults, THP page faults no longer do reclaim/compaction at
  all, unless the system admin has overridden the default, or the application
  has indicated via madvise that it can benefit from THPs. In both cases, the
  potential extra latency is expected and worth the benefits.
- even if reclaim/compaction proceeds after this patch where it previously
  wouldn't, the second compaction attempt is still async and will detect the
  contention and back off if the contention persists
- there are still heuristics like deferred compaction and pageblock skip bits
  in place that prevent excessive THP page fault latencies

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h | 13 ++-------
 mm/compaction.c            | 72 +++++++++-------------------------------------
 mm/internal.h              |  5 +---
 mm/page_alloc.c            | 28 +-----------------
 4 files changed, 17 insertions(+), 101 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index ba67bc8edbb6..b3bb66e7ce55 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -55,14 +55,6 @@ enum compact_result {
 	COMPACT_PARTIAL,
 };
 
-/* Used to signal whether compaction detected need_sched() or lock contention */
-/* No contention detected */
-#define COMPACT_CONTENDED_NONE	0
-/* Either need_sched() was true or fatal signal pending */
-#define COMPACT_CONTENDED_SCHED	1
-/* Zone lock or lru_lock was contended in async compaction */
-#define COMPACT_CONTENDED_LOCK	2
-
 struct alloc_context; /* in mm/internal.h */
 
 #ifdef CONFIG_COMPACTION
@@ -76,9 +68,8 @@ extern int sysctl_compact_unevictable_allowed;
 
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
-			unsigned int order,
-		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio, int *contended);
+		unsigned int order, unsigned int alloc_flags,
+		const struct alloc_context *ac, enum compact_priority prio);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
diff --git a/mm/compaction.c b/mm/compaction.c
index 19a4f4fd6632..826b6d95a05b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,7 +279,7 @@ static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
 {
 	if (cc->mode == MIGRATE_ASYNC) {
 		if (!spin_trylock_irqsave(lock, *flags)) {
-			cc->contended = COMPACT_CONTENDED_LOCK;
+			cc->contended = true;
 			return false;
 		}
 	} else {
@@ -313,13 +313,13 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
 	}
 
 	if (fatal_signal_pending(current)) {
-		cc->contended = COMPACT_CONTENDED_SCHED;
+		cc->contended = true;
 		return true;
 	}
 
 	if (need_resched()) {
 		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = COMPACT_CONTENDED_SCHED;
+			cc->contended = true;
 			return true;
 		}
 		cond_resched();
@@ -342,7 +342,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
 	/* async compaction aborts if contended */
 	if (need_resched()) {
 		if (cc->mode == MIGRATE_ASYNC) {
-			cc->contended = COMPACT_CONTENDED_SCHED;
+			cc->contended = true;
 			return true;
 		}
 
@@ -1565,14 +1565,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync, ret);
 
-	if (ret == COMPACT_CONTENDED)
-		ret = COMPACT_PARTIAL;
-
 	return ret;
 }
 
 static enum compact_result compact_zone_order(struct zone *zone, int order,
-		gfp_t gfp_mask, enum compact_priority prio, int *contended,
+		gfp_t gfp_mask, enum compact_priority prio,
 		unsigned int alloc_flags, int classzone_idx)
 {
 	enum compact_result ret;
@@ -1596,7 +1593,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
-	*contended = cc.contended;
 	return ret;
 }
 
@@ -1609,23 +1605,18 @@ int sysctl_extfrag_threshold = 500;
  * @alloc_flags: The allocation flags of the current allocation
  * @ac: The context of current allocation
  * @mode: The migration mode for async, sync light, or sync migration
- * @contended: Return value that determines if compaction was aborted due to
- *	       need_resched() or lock contention
  *
  * This is the main entry point for direct page compaction.
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio, int *contended)
+		enum compact_priority prio)
 {
 	int may_enter_fs = gfp_mask & __GFP_FS;
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
 	struct zone *zone;
 	enum compact_result rc = COMPACT_SKIPPED;
-	int all_zones_contended = COMPACT_CONTENDED_LOCK; /* init for &= op */
-
-	*contended = COMPACT_CONTENDED_NONE;
 
 	/* Check if the GFP flags allow compaction */
 	if (!order || !may_enter_fs || !may_perform_io)
@@ -1637,7 +1628,6 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
 		enum compact_result status;
-		int zone_contended;
 
 		if (compaction_deferred(zone, order)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
@@ -1645,14 +1635,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		}
 
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-				&zone_contended, alloc_flags,
-				ac_classzone_idx(ac));
+					alloc_flags, ac_classzone_idx(ac));
 		rc = max(status, rc);
-		/*
-		 * It takes at least one zone that wasn't lock contended
-		 * to clear all_zones_contended.
-		 */
-		all_zones_contended &= zone_contended;
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
@@ -1664,59 +1648,29 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			 * succeeds in this zone.
 			 */
 			compaction_defer_reset(zone, order, false);
-			/*
-			 * It is possible that async compaction aborted due to
-			 * need_resched() and the watermarks were ok thanks to
-			 * somebody else freeing memory. The allocation can
-			 * however still fail so we better signal the
-			 * need_resched() contention anyway (this will not
-			 * prevent the allocation attempt).
-			 */
-			if (zone_contended == COMPACT_CONTENDED_SCHED)
-				*contended = COMPACT_CONTENDED_SCHED;
 
-			goto break_loop;
+			break;
 		}
 
 		if (prio != COMPACT_PRIO_ASYNC && (status == COMPACT_COMPLETE ||
-					status == COMPACT_PARTIAL_SKIPPED)) {
+					status == COMPACT_PARTIAL_SKIPPED))
 			/*
 			 * We think that allocation won't succeed in this zone
 			 * so we defer compaction there. If it ends up
 			 * succeeding after all, it will be reset.
 			 */
 			defer_compaction(zone, order);
-		}
 
 		/*
 		 * We might have stopped compacting due to need_resched() in
 		 * async compaction, or due to a fatal signal detected. In that
-		 * case do not try further zones and signal need_resched()
-		 * contention.
-		 */
-		if ((zone_contended == COMPACT_CONTENDED_SCHED)
-					|| fatal_signal_pending(current)) {
-			*contended = COMPACT_CONTENDED_SCHED;
-			goto break_loop;
-		}
-
-		continue;
-break_loop:
-		/*
-		 * We might not have tried all the zones, so  be conservative
-		 * and assume they are not all lock contended.
+		 * case do not try further zones
 		 */
-		all_zones_contended = 0;
-		break;
+		if ((prio == COMPACT_PRIO_ASYNC && need_resched())
+					|| fatal_signal_pending(current))
+			break;
 	}
 
-	/*
-	 * If at least one zone wasn't deferred or skipped, we report if all
-	 * zones that were tried were lock contended.
-	 */
-	if (rc > COMPACT_INACTIVE && all_zones_contended)
-		*contended = COMPACT_CONTENDED_LOCK;
-
 	return rc;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index a37e5b6f9d25..c7d6a395385b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -180,10 +180,7 @@ struct compact_control {
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
 	const int classzone_idx;	/* zone index of a direct compactor */
 	struct zone *zone;
-	int contended;			/* Signal need_sched() or lock
-					 * contention detected during
-					 * compaction
-					 */
+	bool contended;			/* Signal lock or sched contention */
 };
 
 unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4466543a57ab..27923af8e534 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3183,14 +3183,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		enum compact_priority prio, enum compact_result *compact_result)
 {
 	struct page *page;
-	int contended_compaction;
 
 	if (!order)
 		return NULL;
 
 	current->flags |= PF_MEMALLOC;
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-						prio, &contended_compaction);
+									prio);
 	current->flags &= ~PF_MEMALLOC;
 
 	if (*compact_result <= COMPACT_INACTIVE)
@@ -3219,24 +3218,6 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	 */
 	count_vm_event(COMPACTFAIL);
 
-	/*
-	 * In all zones where compaction was attempted (and not
-	 * deferred or skipped), lock contention has been detected.
-	 * For THP allocation we do not want to disrupt the others
-	 * so we fallback to base pages instead.
-	 */
-	if (contended_compaction == COMPACT_CONTENDED_LOCK)
-		*compact_result = COMPACT_CONTENDED;
-
-	/*
-	 * If compaction was aborted due to need_resched(), we do not
-	 * want to further increase allocation latency, unless it is
-	 * khugepaged trying to collapse.
-	 */
-	if (contended_compaction == COMPACT_CONTENDED_SCHED
-		&& !(current->flags & PF_KTHREAD))
-		*compact_result = COMPACT_CONTENDED;
-
 	cond_resched();
 
 	return NULL;
@@ -3626,13 +3607,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				goto nopage;
 
 			/*
-			 * Compaction is contended so rather back off than cause
-			 * excessive stalls.
-			 */
-			if (compact_result == COMPACT_CONTENDED)
-				goto nopage;
-
-			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
 			 * using async compaction.
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

 				goto nopage;
 
 			/*
-			 * Compaction is contended so rather back off than cause
-			 * excessive stalls.
-			 */
-			if (compact_result == COMPACT_CONTENDED)
-				goto nopage;
-
-			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
 			 * using async compaction.
-- 
2.8.3


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 09/18] mm, compaction: make whole_zone flag ignore cached scanner positions
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

A recent patch has added a whole_zone flag that compaction sets when scanning
starts from the zone boundary, in order to report that the zone has been fully
scanned in one attempt. For allocations that want to try really hard or cannot
fail, we will want to introduce a mode where scanning the whole zone is
guaranteed regardless of the cached positions.

This patch reuses the whole_zone flag so that if it is already set to true
when compaction is invoked, the cached scanner positions are ignored.
Employing this flag during the reclaim/compaction loop will be done in the
next patch. This patch already converts compaction invoked from userspace via
procfs to use the flag. Before this patch, the cached positions were first
reset to zone boundaries and then read back from struct zone, so there was a
window where a parallel compaction could replace the reset values, making the
manual compaction less effective. Using the flag instead of performing the
reset is more robust.
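
To illustrate (a rough sketch, not part of the patch), the old reset-based
flow left a window when two compactors raced on the same zone:

	/* CPU A: manual compaction via /proc/sys/vm/compact_memory */
	__reset_isolation_suitable(zone);	/* cached pfns reset to zone bounds */

				/* CPU B: a parallel compaction finishes and
				 * stores its own scanner positions into
				 * zone->compact_cached_*_pfn
				 */

	compact_zone(zone, cc);	/* reads back CPU B's positions, so the manual
				 * compaction no longer covers the whole zone
				 */

With cc->whole_zone set, compact_zone() ignores the cached values entirely,
so there is no such window.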

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/compaction.c | 15 +++++----------
 mm/internal.h   |  2 +-
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 826b6d95a05b..78c99300b911 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1443,11 +1443,13 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	 */
 	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
-	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+	if (cc->whole_zone || cc->free_pfn < start_pfn ||
+						cc->free_pfn >= end_pfn) {
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
 		zone->compact_cached_free_pfn = cc->free_pfn;
 	}
-	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
+	if (cc->whole_zone || cc->migrate_pfn < start_pfn ||
+						cc->migrate_pfn >= end_pfn) {
 		cc->migrate_pfn = start_pfn;
 		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
@@ -1693,14 +1695,6 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		INIT_LIST_HEAD(&cc->freepages);
 		INIT_LIST_HEAD(&cc->migratepages);
 
-		/*
-		 * When called via /proc/sys/vm/compact_memory
-		 * this makes sure we compact the whole zone regardless of
-		 * cached scanner positions.
-		 */
-		if (is_via_compact_memory(cc->order))
-			__reset_isolation_suitable(zone);
-
 		if (is_via_compact_memory(cc->order) ||
 				!compaction_deferred(zone, cc->order))
 			compact_zone(zone, cc);
@@ -1736,6 +1730,7 @@ static void compact_node(int nid)
 		.order = -1,
 		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
+		.whole_zone = true,
 	};
 
 	__compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index c7d6a395385b..a4d3ce761839 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -174,7 +174,7 @@ struct compact_control {
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
-	bool whole_zone;		/* Whole zone has been scanned */
+	bool whole_zone;		/* Whole zone should/has been scanned */
 	int order;			/* order a direct compactor needs */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 10/18] mm, compaction: cleanup unused functions
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

Since kswapd compaction moved to kcompactd, compact_pgdat() is not called
anymore, so we remove it. The only caller of __compact_pgdat() is
compact_node(), so we merge them and remove code that was only reachable from
kswapd.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h |  5 ----
 mm/compaction.c            | 60 +++++++++++++---------------------------------
 2 files changed, 17 insertions(+), 48 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index b3bb66e7ce55..22a5fb9c509c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -70,7 +70,6 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio);
-extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int classzone_idx);
@@ -154,10 +153,6 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
-static inline void compact_pgdat(pg_data_t *pgdat, int order)
-{
-}
-
 static inline void reset_isolation_suitable(pg_data_t *pgdat)
 {
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 78c99300b911..af50f20de369 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1678,10 +1678,18 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 
 
 /* Compact all zones within a node */
-static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
+static void compact_node(int nid)
 {
+	pg_data_t *pgdat = NODE_DATA(nid);
 	int zoneid;
 	struct zone *zone;
+	struct compact_control cc = {
+		.order = -1,
+		.mode = MIGRATE_SYNC,
+		.ignore_skip_hint = true,
+		.whole_zone = true,
+	};
+
 
 	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
 
@@ -1689,53 +1697,19 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		if (!populated_zone(zone))
 			continue;
 
-		cc->nr_freepages = 0;
-		cc->nr_migratepages = 0;
-		cc->zone = zone;
-		INIT_LIST_HEAD(&cc->freepages);
-		INIT_LIST_HEAD(&cc->migratepages);
-
-		if (is_via_compact_memory(cc->order) ||
-				!compaction_deferred(zone, cc->order))
-			compact_zone(zone, cc);
-
-		VM_BUG_ON(!list_empty(&cc->freepages));
-		VM_BUG_ON(!list_empty(&cc->migratepages));
+		cc.nr_freepages = 0;
+		cc.nr_migratepages = 0;
+		cc.zone = zone;
+		INIT_LIST_HEAD(&cc.freepages);
+		INIT_LIST_HEAD(&cc.migratepages);
 
-		if (is_via_compact_memory(cc->order))
-			continue;
+		compact_zone(zone, &cc);
 
-		if (zone_watermark_ok(zone, cc->order,
-				low_wmark_pages(zone), 0, 0))
-			compaction_defer_reset(zone, cc->order, false);
+		VM_BUG_ON(!list_empty(&cc.freepages));
+		VM_BUG_ON(!list_empty(&cc.migratepages));
 	}
 }
 
-void compact_pgdat(pg_data_t *pgdat, int order)
-{
-	struct compact_control cc = {
-		.order = order,
-		.mode = MIGRATE_ASYNC,
-	};
-
-	if (!order)
-		return;
-
-	__compact_pgdat(pgdat, &cc);
-}
-
-static void compact_node(int nid)
-{
-	struct compact_control cc = {
-		.order = -1,
-		.mode = MIGRATE_SYNC,
-		.ignore_skip_hint = true,
-		.whole_zone = true,
-	};
-
-	__compact_pgdat(NODE_DATA(nid), &cc);
-}
-
 /* Compact all nodes in the system */
 static void compact_nodes(void)
 {
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 11/18] mm, compaction: add the ultimate direct compaction priority
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

During the reclaim/compaction loop, it's desirable to get a final answer from
unsuccessful compaction so we can either fail the allocation or invoke the OOM
killer. However, heuristics such as deferred compaction or pageblock skip bits
can cause compaction to skip parts of zones or whole zones, and lead to
premature OOMs, allocation failures or excessive reclaim/compaction retries.

To remedy this, we introduce a new direct compaction priority called
COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:

- ignore deferred compaction status for a zone
- ignore pageblock skip hints
- ignore cached scanner positions and scan the whole zone

The new priority should eventually get picked up by should_compact_retry() and
this should improve success rates for costly allocations using __GFP_REPEAT,
such as hugetlbfs allocations, and reduce some corner-case OOMs for non-costly
allocations.
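
Condensed from the diff below (for illustration only), the new priority maps
onto struct compact_control and the deferral check as:

	cc.whole_zone       = (prio == COMPACT_PRIO_SYNC_FULL);
	cc.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL);

	/* in try_to_compact_pages(): only lower priorities honour deferral */
	if (prio > COMPACT_PRIO_SYNC_FULL && compaction_deferred(zone, order))
		continue;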

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h | 3 ++-
 mm/compaction.c            | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 22a5fb9c509c..29dc7c05bd3b 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -6,8 +6,9 @@
  * Lower value means higher priority, analogically to reclaim priority.
  */
 enum compact_priority {
+	COMPACT_PRIO_SYNC_FULL,
+	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
 	COMPACT_PRIO_SYNC_LIGHT,
-	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
 	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
 	COMPACT_PRIO_ASYNC,
 	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
diff --git a/mm/compaction.c b/mm/compaction.c
index af50f20de369..a399e7ca4630 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1586,6 +1586,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.alloc_flags = alloc_flags,
 		.classzone_idx = classzone_idx,
 		.direct_compaction = true,
+		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
+		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -1631,7 +1633,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 								ac->nodemask) {
 		enum compact_result status;
 
-		if (compaction_deferred(zone, order)) {
+		if (prio > COMPACT_PRIO_SYNC_FULL
+					&& compaction_deferred(zone, order)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
 			continue;
 		}
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 12/18] mm, compaction: more reliably increase direct compaction priority
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

During the reclaim/compaction loop, compaction priority can be increased by
the should_compact_retry() function, but the current code is not optimal.
Priority is only increased when compaction_failed() is true, which means that
compaction has scanned the whole zone. This may not happen even after multiple
attempts with a lower priority due to parallel activity, so we might
needlessly struggle at the lower priority.

We can remove these corner cases by increasing compaction priority regardless
of compaction_failed(). Examining the compaction result further can be
postponed until the highest priority has been reached. This is a simple
solution and we don't need to worry about reaching the highest priority "too
soon" here, because when should_compact_retry() is called it means that the
system is already struggling and the allocation is supposed to either try as
hard as possible, or it cannot fail at all. There's not much point staying at
lower priorities with heuristics that may result in only partial compaction.

The only exception here is the COMPACT_SKIPPED result, which means that
compaction could not run at all due to being below the order-0 watermarks. In
that case, don't increase the compaction priority, and instead check whether
compaction could proceed once everything reclaimable has been reclaimed.
Before this patch, this was tied to compaction_withdrawn(), but the other
results considered there are in fact only possible due to low compaction
priority, so we can ignore them thanks to this patch. Since there are no other
callers of compaction_withdrawn(), remove it.
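
Condensed from the diff below (for illustration only), the resulting decision
order in should_compact_retry() is:

	if (compact_result == COMPACT_SKIPPED)
		/* below order-0 watermarks; retry only if reclaim can help */
		return compaction_zonelist_suitable(ac, order, alloc_flags);

	if (*compact_priority > MIN_COMPACT_PRIORITY) {
		(*compact_priority)--;	/* try harder next time */
		return true;
	}

	if (compaction_failed(compact_result))
		return false;	/* whole zones scanned at highest priority */

	/* otherwise compaction made progress but the page was taken by
	 * somebody else; fall through to the bounded retry logic below */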

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h | 46 ----------------------------------------------
 mm/page_alloc.c            | 37 ++++++++++++++++++++++---------------
 2 files changed, 22 insertions(+), 61 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 29dc7c05bd3b..4bef69a83f8f 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -105,47 +105,6 @@ static inline bool compaction_failed(enum compact_result result)
 	return false;
 }
 
-/*
- * Compaction  has backed off for some reason. It might be throttling or
- * lock contention. Retrying is still worthwhile.
- */
-static inline bool compaction_withdrawn(enum compact_result result)
-{
-	/*
-	 * Compaction backed off due to watermark checks for order-0
-	 * so the regular reclaim has to try harder and reclaim something.
-	 */
-	if (result == COMPACT_SKIPPED)
-		return true;
-
-	/*
-	 * If compaction is deferred for high-order allocations, it is
-	 * because sync compaction recently failed. If this is the case
-	 * and the caller requested a THP allocation, we do not want
-	 * to heavily disrupt the system, so we fail the allocation
-	 * instead of entering direct reclaim.
-	 */
-	if (result == COMPACT_DEFERRED)
-		return true;
-
-	/*
-	 * If compaction in async mode encounters contention or blocks higher
-	 * priority task we back off early rather than cause stalls.
-	 */
-	if (result == COMPACT_CONTENDED)
-		return true;
-
-	/*
-	 * Page scanners have met but we haven't scanned full zones so this
-	 * is a back off in fact.
-	 */
-	if (result == COMPACT_PARTIAL_SKIPPED)
-		return true;
-
-	return false;
-}
-
-
 bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 					int alloc_flags);
 
@@ -183,11 +142,6 @@ static inline bool compaction_failed(enum compact_result result)
 	return false;
 }
 
-static inline bool compaction_withdrawn(enum compact_result result)
-{
-	return true;
-}
-
 static inline int kcompactd_run(int nid)
 {
 	return 0;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 27923af8e534..dee486936ccf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3235,28 +3235,35 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 		return false;
 
 	/*
-	 * compaction considers all the zone as desperately out of memory
-	 * so it doesn't really make much sense to retry except when the
-	 * failure could be caused by insufficient priority
+	 * Compaction backed off due to watermark checks for order-0
+	 * so the regular reclaim has to try harder and reclaim something
+	 * Retry only if it looks like reclaim might have a chance.
 	 */
-	if (compaction_failed(compact_result)) {
-		if (*compact_priority > MIN_COMPACT_PRIORITY) {
-			(*compact_priority)--;
-			return true;
-		}
-		return false;
+	if (compact_result == COMPACT_SKIPPED)
+		return compaction_zonelist_suitable(ac, order, alloc_flags);
+
+	/*
+	 * Compaction could have withdrawn early or skip some zones or
+	 * pageblocks. We were asked to retry, which means the allocation
+	 * should try really hard, so increase the priority if possible.
+	 */
+	if (*compact_priority > MIN_COMPACT_PRIORITY) {
+		(*compact_priority)--;
+		return true;
 	}
 
 	/*
-	 * make sure the compaction wasn't deferred or didn't bail out early
-	 * due to locks contention before we declare that we should give up.
-	 * But do not retry if the given zonelist is not suitable for
-	 * compaction.
+	 * Compaction considers all the zones as unfixably fragmented and we
+	 * are on the highest priority, which means it can't be due to
+	 * heuristics and it doesn't really make much sense to retry.
 	 */
-	if (compaction_withdrawn(compact_result))
-		return compaction_zonelist_suitable(ac, order, alloc_flags);
+	if (compaction_failed(compact_result))
+		return false;
 
 	/*
+	 * The remaining possibility is that compaction made progress and
+	 * created a high-order page, but it was allocated by somebody else.
+	 * To prevent thrashing, limit the number of retries in such case.
 	 * !costly requests are much more important than __GFP_REPEAT
 	 * costly ones because they are de facto nofail and invoke OOM
 	 * killer to move on while costly can fail and users are ready
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 13/18] mm, compaction: use correct watermark when checking allocation success
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The __compact_finished() function uses the low watermark in a check that has
to pass if direct compaction is to finish and the allocation is expected to
succeed. This is too pessimistic, as the allocation will typically use the min
watermark. It may happen that during compaction we drop below the low
watermark (due to parallel activity), but still form the target high-order
page. By checking against the low watermark, we might needlessly continue
compaction.

Similarly, __compaction_suitable() uses the low watermark in its check of
whether the allocation can already succeed without compaction. Again, this is
unnecessarily pessimistic.

After this patch, these checks will use the direct compactor's alloc_flags to
determine the watermark, which is effectively the min watermark.
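
As a small illustration (the full change is in the diff below), the checks
now derive the watermark from the compactor's own alloc_flags:

	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];

which for a typical direct compactor resolves to the min watermark instead of
low_wmark_pages(zone).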

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a399e7ca4630..4b21a26694a2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1262,7 +1262,7 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		return COMPACT_CONTINUE;
 
 	/* Compaction run is not finished if the watermark is not met */
-	watermark = low_wmark_pages(zone);
+	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
 
 	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
 							cc->alloc_flags))
@@ -1327,7 +1327,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	if (is_via_compact_memory(order))
 		return COMPACT_CONTINUE;
 
-	watermark = low_wmark_pages(zone);
+	watermark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 	/*
 	 * If watermarks for high-order allocation are already met, there
 	 * should be no need for compaction at all.
@@ -1341,7 +1341,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 * This is because during migration, copies of pages need to be
 	 * allocated and for a short time, the footprint is higher
 	 */
-	watermark += (2UL << order);
+	watermark = low_wmark_pages(zone) + (2UL << order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 				 alloc_flags, wmark_target))
 		return COMPACT_SKIPPED;
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 14/18] mm, compaction: create compact_gap wrapper
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

Compaction uses a watermark gap of (2UL << order) pages at various places and
it's not immediately obvious why. Abstract it through a compact_gap() wrapper
to create a single place with a thorough explanation.
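
For example (illustration only, assuming 4KB base pages), an order-9 THP
request needs

	compact_gap(9) == 2UL << 9 == 1024

free order-0 pages above the relevant watermark before compaction is
considered to have enough room to work with.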

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h | 16 ++++++++++++++++
 mm/compaction.c            |  7 +++----
 mm/vmscan.c                |  4 ++--
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 4bef69a83f8f..654cb74418c4 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,22 @@ enum compact_result {
 
 struct alloc_context; /* in mm/internal.h */
 
+/*
+ * Number of free order-0 pages that should be available above given watermark
+ * to make sure compaction has reasonable chance of not running out of free
+ * pages that it needs to isolate as migration target during its work.
+ */
+static inline unsigned long compact_gap(unsigned int order)
+{
+	/*
+	 * Although all the isolations for migration are temporary, compaction
+	 * may have up to 1 << order pages on its list and then try to split
+	 * an (order - 1) free page. At that point, a gap of 1 << order might
+	 * not be enough, so it's safer to require twice that amount.
+	 */
+	return 2UL << order;
+}
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 4b21a26694a2..bcab680ccb8a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1337,11 +1337,10 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 		return COMPACT_PARTIAL;
 
 	/*
-	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
-	 * This is because during migration, copies of pages need to be
-	 * allocated and for a short time, the footprint is higher
+	 * Watermarks for order-0 must be met for compaction to be able to
+	 * isolate free pages for migration targets.
 	 */
-	watermark = low_wmark_pages(zone) + (2UL << order);
+	watermark = low_wmark_pages(zone) + compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 				 alloc_flags, wmark_target))
 		return COMPACT_SKIPPED;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c4a2f4512fca..00034ec9229b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2345,7 +2345,7 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	 * If we have not reclaimed enough pages for compaction and the
 	 * inactive lists are large enough, continue reclaiming
 	 */
-	pages_for_compaction = (2UL << sc->order);
+	pages_for_compaction = compact_gap(sc->order);
 	inactive_lru_pages = zone_page_state(zone, NR_INACTIVE_FILE);
 	if (get_nr_swap_pages() > 0)
 		inactive_lru_pages += zone_page_state(zone, NR_INACTIVE_ANON);
@@ -2472,7 +2472,7 @@ static inline bool compaction_ready(struct zone *zone, int order, int classzone_
 	 */
 	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
 			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
-	watermark = high_wmark_pages(zone) + balance_gap + (2UL << order);
+	watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
 	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
 
 	/*
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 15/18] mm, compaction: use proper alloc_flags in __compaction_suitable()
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction. This check uses the direct compactor's alloc_flags, but that's
wrong, since these flags are not applicable to freepage isolation.

For example, alloc_flags may indicate access to memory reserves, making
compaction proceed, only to fail the watermark check during the isolation
itself.

A similar problem exists for ALLOC_CMA, which may be part of alloc_flags but
is not passed during freepage isolation. In this case, however, it makes sense
to use ALLOC_CMA in both __compaction_suitable() and __isolate_free_page(),
since there's actually nothing preventing the freepage scanner from isolating
pages in CMA pageblocks, under the assumption that a page which could be
migrated once by compaction can also be migrated again later by a CMA
allocation. Thus we should count pages in CMA pageblocks when considering
compaction suitability and when isolating freepages.

To sum up, this patch should remove some false positives from
__compaction_suitable(), and allow compaction to proceed when the free pages
required for compaction reside in CMA pageblocks.
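
Condensed from the diff below (for illustration only), the two checks that
now have to agree are:

	/* __compaction_suitable(): will the free scanner find targets at all? */
	watermark = low_wmark_pages(zone) + compact_gap(order);
	__zone_watermark_ok(zone, 0, watermark, classzone_idx, ALLOC_CMA,
			    wmark_target);

	/* __isolate_free_page(): check when a free page is actually isolated */
	watermark = low_wmark_pages(zone) + (1 << order);
	zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA);

Both now pass ALLOC_CMA, so free pages in CMA pageblocks are counted
consistently.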

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 12 ++++++++++--
 mm/page_alloc.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index bcab680ccb8a..4ffa0870192b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1338,11 +1338,19 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
-	 * isolate free pages for migration targets.
+	 * isolate free pages for migration targets. This means that the
+	 * watermark and alloc_flags have to match, or be more pessimistic than
+	 * the check in __isolate_free_page(). We don't use the direct
+	 * compactor's alloc_flags, as they are not relevant for freepage
+	 * isolation. We however do use the direct compactor's classzone_idx to
+	 * skip over zones where lowmem reserves would prevent allocation even
+	 * if compaction succeeds.
+	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
+	 * suitable migration targets
 	 */
 	watermark = low_wmark_pages(zone) + compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
-				 alloc_flags, wmark_target))
+						ALLOC_CMA, wmark_target))
 		return COMPACT_SKIPPED;
 
 	/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dee486936ccf..09dc9db8a7e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2490,7 +2490,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	if (!is_migrate_isolate(mt)) {
 		/* Obey watermarks as if the page was being allocated */
 		watermark = low_wmark_pages(zone) + (1 << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
 		__mod_zone_freepage_state(zone, -(1UL << order), mt);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 16/18] mm, compaction: require only min watermarks for non-costly orders
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction. Then __isolate_free_page() uses a low watermark check to decide
whether a particular free page can be isolated. In the latter case, using the
low watermark is needlessly pessimistic, as the free page isolations are only
temporary. For __compaction_suitable() the higher watermark makes sense for
high-order allocations, where more free pages increase the chance of success,
and such an allocation can typically fail and fall back to order-0 when the
system is struggling to reach that watermark. But for low-order allocations,
forming the page should not be that hard. So using the low watermark there
might just prevent compaction from even trying, and eventually lead to the OOM
killer even though we are above the min watermark.

So after this patch, we use the min watermark for non-costly orders in
__compaction_suitable(), and for all orders in __isolate_free_page().
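
(Illustration only, with hypothetical watermark values. It assumes
PAGE_ALLOC_COSTLY_ORDER is 3 and that compact_gap(order) expands to
2UL << order, as introduced earlier in this series. A small userspace sketch of
the threshold __compaction_suitable() would now use:)

#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER 3

/* assumed expansion of the compact_gap() wrapper from earlier in the series */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

int main(void)
{
	/* hypothetical per-zone watermarks, in pages */
	unsigned long min_wmark = 1000, low_wmark = 1250;
	unsigned int order;

	for (order = 0; order <= 9; order++) {
		unsigned long wmark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
					low_wmark : min_wmark;
		printf("order %u: compaction threshold %lu pages\n",
		       order, wmark + compact_gap(order));
	}
	return 0;
}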

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 6 +++++-
 mm/page_alloc.c | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4ffa0870192b..d854519a5302 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1345,10 +1345,14 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 * isolation. We however do use the direct compactor's classzone_idx to
 	 * skip over zones where lowmem reserves would prevent allocation even
 	 * if compaction succeeds.
+	 * For costly orders, we require low watermark instead of min for
+	 * compaction to proceed to increase its chances.
 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
 	 * suitable migration targets
 	 */
-	watermark = low_wmark_pages(zone) + compact_gap(order);
+	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
+				low_wmark_pages(zone) : min_wmark_pages(zone);
+	watermark += compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 						ALLOC_CMA, wmark_target))
 		return COMPACT_SKIPPED;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 09dc9db8a7e9..5b4c9e567fc1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2489,7 +2489,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 
 	if (!is_migrate_isolate(mt)) {
 		/* Obey watermarks as if the page was being allocated */
-		watermark = low_wmark_pages(zone) + (1 << order);
+		watermark = min_wmark_pages(zone) + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 17/18] mm, vmscan: make compaction_ready() more accurate and readable
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The compaction_ready() function is used during direct reclaim for costly-order
allocations to skip reclaim in zones where compaction should be attempted
instead. It combines the standard compaction_suitable() check with its own
watermark check based on the high watermark plus an extra gap, and the result
is confusing at best.

This patch attempts to better structure and document the checks involved.
First, compaction_suitable() can determine that the allocation should either
succeed already, or that compaction doesn't have enough free pages to proceed.
The third possibility is that compaction has enough free pages, but we still
decide to reclaim first - unless we are already above the high watermark plus
the gap. This does not mean that reclaim will actually reach this watermark in
a single attempt; it is rather an over-reclaim protection. So document the code
as such. The check for compaction_deferred() is removed completely, as it in
fact had no proper role here.

The result after this patch is mainly less confusing code. We also skip some
over-reclaim in cases where the allocation should already succeed.
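
(Schematic illustration only, not kernel code: a compilable model of the new
decision, where "ready" means skip reclaim and let compaction or the allocation
go ahead, and COMPACT_PARTIAL stands for "the allocation should already
succeed". The numbers below are hypothetical.)

#include <stdbool.h>
#include <stdio.h>

enum compact_result { COMPACT_SKIPPED, COMPACT_CONTINUE, COMPACT_PARTIAL };

/* model of compaction_ready(): true = skip reclaim, false = reclaim first */
static bool compaction_ready_model(enum compact_result suitable,
				   unsigned long free_pages,
				   unsigned long high_wmark_plus_gap)
{
	if (suitable == COMPACT_PARTIAL)
		return true;	/* allocation should already succeed */
	if (suitable == COMPACT_SKIPPED)
		return false;	/* not enough free pages even for compaction */
	/* compaction could run; reclaim first unless already above high+gap */
	return free_pages > high_wmark_plus_gap;
}

int main(void)
{
	/* hypothetical zone: compaction suitable, 9000 free vs high+gap 12000 */
	printf("ready: %d\n",
	       compaction_ready_model(COMPACT_CONTINUE, 9000, 12000));
	return 0;
}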

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 49 +++++++++++++++++++++++--------------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 00034ec9229b..640d2e615c36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2456,40 +2456,37 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 }
 
 /*
- * Returns true if compaction should go ahead for a high-order request, or
- * the high-order allocation would succeed without compaction.
+ * Returns true if compaction should go ahead for a costly-order request, or
+ * the allocation would already succeed without compaction. Return false if we
+ * should reclaim first.
  */
 static inline bool compaction_ready(struct zone *zone, int order, int classzone_idx)
 {
-	unsigned long balance_gap, watermark;
-	bool watermark_ok;
+	unsigned long watermark;
+	enum compact_result suitable;
 
-	/*
-	 * Compaction takes time to run and there are potentially other
-	 * callers using the pages just freed. Continue reclaiming until
-	 * there is a buffer of free pages available to give compaction
-	 * a reasonable chance of completing and allocating the page
-	 */
-	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
-			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
-	watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
-	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
-
-	/*
-	 * If compaction is deferred, reclaim up to a point where
-	 * compaction will have a chance of success when re-enabled
-	 */
-	if (compaction_deferred(zone, order))
-		return watermark_ok;
+	suitable = compaction_suitable(zone, order, 0, classzone_idx);
+	if (suitable == COMPACT_PARTIAL)
+		/* Allocation should succeed already. Don't reclaim. */
+		return true;
+	if (suitable == COMPACT_SKIPPED)
+		/* Compaction cannot yet proceed. Do reclaim. */
+		return false;
 
 	/*
-	 * If compaction is not ready to start and allocation is not likely
-	 * to succeed without it, then keep reclaiming.
+	 * Compaction is already possible, but it takes time to run and there
+	 * are potentially other callers using the pages just freed. So proceed
+	 * with reclaim to make a buffer of free pages available to give
+	 * compaction a reasonable chance of completing and allocating the page.
+	 * Note that we won't actually reclaim the whole buffer in one attempt
+	 * as the target watermark in should_continue_reclaim() is lower. But if
+	 * we are already above the high+gap watermark, don't reclaim at all.
 	 */
-	if (compaction_suitable(zone, order, 0, classzone_idx) == COMPACT_SKIPPED)
-		return false;
+	watermark = high_wmark_pages(zone) + compact_gap(order);
+	watermark += min(low_wmark_pages(zone), DIV_ROUND_UP(
+			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
 
-	return watermark_ok;
+	return zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
 }
 
 /*
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 18/18] mm, vmscan: use proper classzone_idx in should_continue_reclaim()
  2016-05-31 13:08 ` Vlastimil Babka
@ 2016-05-31 13:08   ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-05-31 13:08 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, Vlastimil Babka

The should_continue_reclaim() function decides during direct reclaim/compaction
whether shrink_zone() should continue reclaiming, or whether compaction is
ready to proceed in that zone. This relies mainly on the compaction_suitable()
check, but because a zero classzone_idx is passed, there can be false positives
and reclaim can terminate prematurely. Fix this by passing the proper
classzone_idx.

Additionally, the function checks whether (2UL << order) pages were reclaimed.
This however overlaps with the same gap used by compaction_suitable(), and
since sc->nr_reclaimed is accumulated over all reclaimed zones, it doesn't make
much sense for deciding about a given single zone anyway. So just drop this
check.
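
(Illustration only, with made-up numbers: a userspace sketch of why the
classzone_idx matters. It models the part of the watermark check that adds the
zone's lowmem_reserve for the requested classzone to the watermark; passing
classzone_idx 0 drops that reserve and makes the check too optimistic.)

#include <stdbool.h>
#include <stdio.h>

/* model: free pages must exceed watermark + lowmem_reserve[classzone_idx] */
static bool wmark_ok_model(unsigned long free, unsigned long watermark,
			   const unsigned long *lowmem_reserve, int classzone_idx)
{
	return free > watermark + lowmem_reserve[classzone_idx];
}

int main(void)
{
	/* hypothetical lower zone; its reserve protects it from allocations
	 * that could also have been served from a higher zone (index 2) */
	unsigned long lowmem_reserve[] = { 0, 0, 8000 };
	unsigned long free = 9000, watermark = 5000;

	printf("classzone_idx=0: ok=%d (false positive)\n",
	       wmark_ok_model(free, watermark, lowmem_reserve, 0));
	printf("classzone_idx=2: ok=%d\n",
	       wmark_ok_model(free, watermark, lowmem_reserve, 2));
	return 0;
}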

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 640d2e615c36..391e5d2c4e32 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2309,11 +2309,9 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 static inline bool should_continue_reclaim(struct zone *zone,
 					unsigned long nr_reclaimed,
 					unsigned long nr_scanned,
-					struct scan_control *sc)
+					struct scan_control *sc,
+					int classzone_idx)
 {
-	unsigned long pages_for_compaction;
-	unsigned long inactive_lru_pages;
-
 	/* If not in reclaim/compaction mode, stop */
 	if (!in_reclaim_compaction(sc))
 		return false;
@@ -2341,20 +2339,8 @@ static inline bool should_continue_reclaim(struct zone *zone,
 			return false;
 	}
 
-	/*
-	 * If we have not reclaimed enough pages for compaction and the
-	 * inactive lists are large enough, continue reclaiming
-	 */
-	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = zone_page_state(zone, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += zone_page_state(zone, NR_INACTIVE_ANON);
-	if (sc->nr_reclaimed < pages_for_compaction &&
-			inactive_lru_pages > pages_for_compaction)
-		return true;
-
 	/* If compaction would go ahead or the allocation would succeed, stop */
-	switch (compaction_suitable(zone, sc->order, 0, 0)) {
+	switch (compaction_suitable(zone, sc->order, 0, classzone_idx)) {
 	case COMPACT_PARTIAL:
 	case COMPACT_CONTINUE:
 		return false;
@@ -2364,11 +2350,12 @@ static inline bool should_continue_reclaim(struct zone *zone,
 }
 
 static bool shrink_zone(struct zone *zone, struct scan_control *sc,
-			bool is_classzone)
+			int classzone_idx)
 {
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_reclaimed, nr_scanned;
 	bool reclaimable = false;
+	bool is_classzone = (classzone_idx == zone_idx(zone));
 
 	do {
 		struct mem_cgroup *root = sc->target_mem_cgroup;
@@ -2450,7 +2437,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
 			reclaimable = true;
 
 	} while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
-					 sc->nr_scanned - nr_scanned, sc));
+			 sc->nr_scanned - nr_scanned, sc, classzone_idx));
 
 	return reclaimable;
 }
@@ -2580,7 +2567,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
-		shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
+		shrink_zone(zone, sc, classzone_idx);
 	}
 
 	/*
@@ -3076,7 +3063,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
 						balance_gap, classzone_idx))
 		return true;
 
-	shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
+	shrink_zone(zone, sc, classzone_idx);
 
 	clear_bit(ZONE_WRITEBACK, &zone->flags);
 
@@ -3678,7 +3665,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * priorities until we have enough memory freed.
 		 */
 		do {
-			shrink_zone(zone, &sc, true);
+			shrink_zone(zone, &sc, zone_idx(zone));
 		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
 	}
 
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 03/18] mm, page_alloc: don't retry initial attempt in slowpath
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 13:26     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 13:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:03, Vlastimil Babka wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index da3a62a94b4a..9f83259a18a8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3367,10 +3367,9 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>  	bool drained = false;
>  
>  	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
> -	if (unlikely(!(*did_some_progress)))
> -		return NULL;
>  
>  retry:
> +	/* We attempt even when no progress, as kswapd might have done some */
>  	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

Is this really likely to happen, though? Sure, we might have the last few
reclaimable pages on the LRU lists, but I am not sure this would make a
large difference then.

That being said, I do not think this is harmful but I find it a bit
weird to invoke a reclaim and then ignore the feedback... Will leave the
decision up to you but the original patch seemed neater.

>  
>  	/*
> @@ -3378,7 +3377,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>  	 * pages are pinned on the per-cpu lists or in high alloc reserves.
>  	 * Shrink them them and try again
>  	 */
> -	if (!page && !drained) {
> +	if (!page && *did_some_progress && !drained) {
>  		unreserve_highatomic_pageblock(ac);
>  		drain_all_pages(NULL);
>  		drained = true;

I do not remember this in the previous version. Why shouldn't we
unreserve highatomic reserves when there was no progress?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 06/18] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 13:33     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 13:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:06, Vlastimil Babka wrote:
> After the previous patch, we can distinguish costly allocations that should be
> really lightweight, such as THP page faults, with __GFP_NORETRY. This means we
> don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can
> also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to
> try as hard as khugepaged, as the process has indicated that it benefits from
> THPs and is willing to pay some initial latency costs.

Relying on PF_KTHREAD was an ugly hack so it is nice to see it go away.

> We can also make the flags handling less cryptic by distinguishing
> GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
> GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY
> or __GFP_KSWAPD_RECLAIM is done where needed.

I like it for some reason ;)
 
> The patch effectively changes the current GFP_TRANSHUGE users as follows:
> 
> * get_huge_zero_page() - the zero page lifetime should be relatively long and
>   it's shared by multiple users, so it's worth spending some effort on it.
>   We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores
>   direct reclaim to this allocation, which was unintentionally removed by
>   commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add
>   a stall-free defrag option")
> 
> * alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not
>   an issue. So if khugepaged "defrag" is enabled (the default), do reclaim
>   via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check
>   from page alloc.
>   As a side-effect, khugepaged will now no longer check if the initial
>   compaction was deferred or contended. This is OK, as khugepaged sleep times
>   between collapse attempts are long enough to prevent noticeable disruption,
>   so we should allow it to spend some effort.
> 
> * migrate_misplaced_transhuge_page() - already was masking out __GFP_RECLAIM,
>   so just convert to GFP_TRANSHUGE_LIGHT which is equivalent.
> 
> * alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are
>   now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY
>   if direct reclaim/compaction is at all allowed (by default it's allowed only
>   for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT).
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/gfp.h            | 14 ++++++++------
>  include/trace/events/mmflags.h |  1 +
>  mm/huge_memory.c               | 27 +++++++++++++++------------
>  mm/migrate.c                   |  2 +-
>  mm/page_alloc.c                |  6 ++----
>  tools/perf/builtin-kmem.c      |  1 +
>  6 files changed, 28 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 570383a41853..a6ebe0dccd67 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -238,9 +238,11 @@ struct vm_area_struct;
>   *   are expected to be movable via page reclaim or page migration. Typically,
>   *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
>   *
> - * GFP_TRANSHUGE is used for THP allocations. They are compound allocations
> - *   that will fail quickly if memory is not available and will not wake
> - *   kswapd on failure.
> + * GFP_TRANSHUGE and GFP_TRANSHUGE_LIGHT are used for THP allocations. They are
> + *   compound allocations that will generally fail quickly if memory is not
> + *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
> + *   version does not attempt reclaim/compaction at all and is by default used
> + *   in page fault path, while the non-light is used by khugepaged.
>   */
>  #define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
>  #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> @@ -255,9 +257,9 @@ struct vm_area_struct;
>  #define GFP_DMA32	__GFP_DMA32
>  #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
>  #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
> -#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> -			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
> -			 ~__GFP_RECLAIM)
> +#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
> +			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
> +#define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
>  
>  /* Convert GFP flags to their corresponding migrate type */
>  #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index 43cedbf0c759..5a81ab48a2fb 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -11,6 +11,7 @@
>  
>  #define __def_gfpflag_names						\
>  	{(unsigned long)GFP_TRANSHUGE,		"GFP_TRANSHUGE"},	\
> +	{(unsigned long)GFP_TRANSHUGE_LIGHT,	"GFP_TRANSHUGE_LIGHT"}, \
>  	{(unsigned long)GFP_HIGHUSER_MOVABLE,	"GFP_HIGHUSER_MOVABLE"},\
>  	{(unsigned long)GFP_HIGHUSER,		"GFP_HIGHUSER"},	\
>  	{(unsigned long)GFP_USER,		"GFP_USER"},		\
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9ed58530f695..37db58802385 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -864,29 +864,32 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  }
>  
>  /*
> - * If THP is set to always then directly reclaim/compact as necessary
> - * If set to defer then do no reclaim and defer to khugepaged
> + * If THP defrag is set to always then directly reclaim/compact as necessary
> + * If set to defer then do only background reclaim/compact and defer to khugepaged
>   * If set to madvise and the VMA is flagged then directly reclaim/compact
> + * When direct reclaim/compact is allowed, don't retry except for flagged VMA's
>   */
>  static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
>  {
> -	gfp_t reclaim_flags = 0;
> +	bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
>  
> -	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
> -	    (vma->vm_flags & VM_HUGEPAGE))
> -		reclaim_flags = __GFP_DIRECT_RECLAIM;
> -	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
> -		reclaim_flags = __GFP_KSWAPD_RECLAIM;
> -	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
> -		reclaim_flags = __GFP_DIRECT_RECLAIM;
> +	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
> +				&transparent_hugepage_flags) && vma_madvised)
> +		return GFP_TRANSHUGE;
> +	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
> +						&transparent_hugepage_flags))
> +		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
> +	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> +						&transparent_hugepage_flags))
> +		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
>  
> -	return GFP_TRANSHUGE | reclaim_flags;
> +	return GFP_TRANSHUGE_LIGHT;
>  }
>  
>  /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
>  static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
>  {
> -	return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
> +	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
>  }
>  
>  /* Caller must hold page table lock. */
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 9baf41c877ff..d09e985f644d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1772,7 +1772,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  		goto out_dropref;
>  
>  	new_page = alloc_pages_node(node,
> -		(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
> +		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>  		HPAGE_PMD_ORDER);
>  	if (!new_page)
>  		goto out_fail;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 529999c48333..d7fc4c86e077 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3634,11 +3634,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  			/*
>  			 * Looks like reclaim/compaction is worth trying, but
>  			 * sync compaction could be very expensive, so keep
> -			 * using async compaction, unless it's khugepaged
> -			 * trying to collapse.
> +			 * using async compaction.
>  			 */
> -			if (!(current->flags & PF_KTHREAD))
> -				migration_mode = MIGRATE_ASYNC;
> +			migration_mode = MIGRATE_ASYNC;
>  		}
>  	}
>  
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 58adfee230de..5f67a3bd98a5 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -608,6 +608,7 @@ static const struct {
>  	const char *compact;
>  } gfp_compact_table[] = {
>  	{ "GFP_TRANSHUGE",		"THP" },
> +	{ "GFP_TRANSHUGE_LIGHT",	"THL" },
>  	{ "GFP_HIGHUSER_MOVABLE",	"HUM" },
>  	{ "GFP_HIGHUSER",		"HU" },
>  	{ "GFP_USER",			"U" },
> -- 
> 2.8.3
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 10/18] mm, compaction: cleanup unused functions
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 13:45     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 13:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:10, Vlastimil Babka wrote:
> Since kswapd compaction moved to kcompactd, compact_pgdat() is not called
> anymore, so we remove it. The only caller of __compact_pgdat() is
> compact_node(), so we merge them and remove code that was only reachable from
> kswapd.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/compaction.h |  5 ----
>  mm/compaction.c            | 60 +++++++++++++---------------------------------
>  2 files changed, 17 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index b3bb66e7ce55..22a5fb9c509c 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -70,7 +70,6 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
>  extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
>  		unsigned int order, unsigned int alloc_flags,
>  		const struct alloc_context *ac, enum compact_priority prio);
> -extern void compact_pgdat(pg_data_t *pgdat, int order);
>  extern void reset_isolation_suitable(pg_data_t *pgdat);
>  extern enum compact_result compaction_suitable(struct zone *zone, int order,
>  		unsigned int alloc_flags, int classzone_idx);
> @@ -154,10 +153,6 @@ extern void kcompactd_stop(int nid);
>  extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
>  
>  #else
> -static inline void compact_pgdat(pg_data_t *pgdat, int order)
> -{
> -}
> -
>  static inline void reset_isolation_suitable(pg_data_t *pgdat)
>  {
>  }
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 78c99300b911..af50f20de369 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1678,10 +1678,18 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  
>  
>  /* Compact all zones within a node */
> -static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> +static void compact_node(int nid)
>  {
> +	pg_data_t *pgdat = NODE_DATA(nid);
>  	int zoneid;
>  	struct zone *zone;
> +	struct compact_control cc = {
> +		.order = -1,
> +		.mode = MIGRATE_SYNC,
> +		.ignore_skip_hint = true,
> +		.whole_zone = true,
> +	};
> +
>  
>  	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
>  
> @@ -1689,53 +1697,19 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
>  		if (!populated_zone(zone))
>  			continue;
>  
> -		cc->nr_freepages = 0;
> -		cc->nr_migratepages = 0;
> -		cc->zone = zone;
> -		INIT_LIST_HEAD(&cc->freepages);
> -		INIT_LIST_HEAD(&cc->migratepages);
> -
> -		if (is_via_compact_memory(cc->order) ||
> -				!compaction_deferred(zone, cc->order))
> -			compact_zone(zone, cc);
> -
> -		VM_BUG_ON(!list_empty(&cc->freepages));
> -		VM_BUG_ON(!list_empty(&cc->migratepages));
> +		cc.nr_freepages = 0;
> +		cc.nr_migratepages = 0;
> +		cc.zone = zone;
> +		INIT_LIST_HEAD(&cc.freepages);
> +		INIT_LIST_HEAD(&cc.migratepages);
>  
> -		if (is_via_compact_memory(cc->order))
> -			continue;
> +		compact_zone(zone, &cc);
>  
> -		if (zone_watermark_ok(zone, cc->order,
> -				low_wmark_pages(zone), 0, 0))
> -			compaction_defer_reset(zone, cc->order, false);
> +		VM_BUG_ON(!list_empty(&cc.freepages));
> +		VM_BUG_ON(!list_empty(&cc.migratepages));
>  	}
>  }
>  
> -void compact_pgdat(pg_data_t *pgdat, int order)
> -{
> -	struct compact_control cc = {
> -		.order = order,
> -		.mode = MIGRATE_ASYNC,
> -	};
> -
> -	if (!order)
> -		return;
> -
> -	__compact_pgdat(pgdat, &cc);
> -}
> -
> -static void compact_node(int nid)
> -{
> -	struct compact_control cc = {
> -		.order = -1,
> -		.mode = MIGRATE_SYNC,
> -		.ignore_skip_hint = true,
> -		.whole_zone = true,
> -	};
> -
> -	__compact_pgdat(NODE_DATA(nid), &cc);
> -}
> -
>  /* Compact all nodes in the system */
>  static void compact_nodes(void)
>  {
> -- 
> 2.8.3
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 12/18] mm, compaction: more reliably increase direct compaction priority
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 13:51     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 13:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:12, Vlastimil Babka wrote:
> During reclaim/compaction loop, compaction priority can be increased by the
> should_compact_retry() function, but the current code is not optimal. Priority
> is only increased when compaction_failed() is true, which means that compaction
> has scanned the whole zone. This may not happen even after multiple attempts
> with the lower priority due to parallel activity, so we might needlessly
> struggle on the lower priority.
> 
> We can remove these corner cases by increasing compaction priority regardless
> of compaction_failed(). Further examination of the compaction result can be
> postponed until the highest priority is reached. This is a simple solution
> and we don't need to worry about reaching the highest priority "too soon" here,
> because when should_compact_retry() is called it means that the system is
> already struggling and the allocation is supposed to either try as hard as
> possible, or it cannot fail at all. There's not much point staying at lower
> priorities with heuristics that may result in only partial compaction.
> 
> The only exception here is the COMPACT_SKIPPED result, which means that
> compaction could not run at all due to being below order-0 watermarks. In that
> case, don't increase compaction priority, and check if compaction could proceed
> when everything reclaimable was reclaimed. Before this patch, this was tied to
> compaction_withdrawn(), but the other results considered there are in fact only
> possible due to low compaction priority so we can ignore them thanks to the
> patch. Since there are no other callers of compaction_withdrawn(), remove it.

I agree with the change in general. I think that keeping compaction_withdrawn
even with a single check is better, because it abstracts that fact away from a
specific constant.
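
A minimal sketch of that variant, just for illustration and not part of the
posted series: if the helper is kept with only the remaining check, it
presumably reduces to something like

	static inline bool compaction_withdrawn(enum compact_result result)
	{
		/*
		 * Compaction backed off due to the order-0 watermark check,
		 * so regular reclaim has to try harder and reclaim something.
		 */
		return result == COMPACT_SKIPPED;
	}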

Now that I think about it some more, I guess you also want to update
compaction_retries inside should_compact_retry() as well, or at least
update it only once we have reached the lowest priority. What do you
think?
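
To make the second option concrete, a hypothetical sketch (assuming
should_compact_retry() took the counter by pointer, which the posted series
does not do) could be

	if (*compact_priority > MIN_COMPACT_PRIORITY) {
		(*compact_priority)--;
		return true;
	}

	/* Count only the attempts made at the highest compaction effort. */
	(*compaction_retries)++;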
 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Other than that this makes sense
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/compaction.h | 46 ----------------------------------------------
>  mm/page_alloc.c            | 37 ++++++++++++++++++++++---------------
>  2 files changed, 22 insertions(+), 61 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 29dc7c05bd3b..4bef69a83f8f 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -105,47 +105,6 @@ static inline bool compaction_failed(enum compact_result result)
>  	return false;
>  }
>  
> -/*
> - * Compaction  has backed off for some reason. It might be throttling or
> - * lock contention. Retrying is still worthwhile.
> - */
> -static inline bool compaction_withdrawn(enum compact_result result)
> -{
> -	/*
> -	 * Compaction backed off due to watermark checks for order-0
> -	 * so the regular reclaim has to try harder and reclaim something.
> -	 */
> -	if (result == COMPACT_SKIPPED)
> -		return true;
> -
> -	/*
> -	 * If compaction is deferred for high-order allocations, it is
> -	 * because sync compaction recently failed. If this is the case
> -	 * and the caller requested a THP allocation, we do not want
> -	 * to heavily disrupt the system, so we fail the allocation
> -	 * instead of entering direct reclaim.
> -	 */
> -	if (result == COMPACT_DEFERRED)
> -		return true;
> -
> -	/*
> -	 * If compaction in async mode encounters contention or blocks higher
> -	 * priority task we back off early rather than cause stalls.
> -	 */
> -	if (result == COMPACT_CONTENDED)
> -		return true;
> -
> -	/*
> -	 * Page scanners have met but we haven't scanned full zones so this
> -	 * is a back off in fact.
> -	 */
> -	if (result == COMPACT_PARTIAL_SKIPPED)
> -		return true;
> -
> -	return false;
> -}
> -
> -
>  bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>  					int alloc_flags);
>  
> @@ -183,11 +142,6 @@ static inline bool compaction_failed(enum compact_result result)
>  	return false;
>  }
>  
> -static inline bool compaction_withdrawn(enum compact_result result)
> -{
> -	return true;
> -}
> -
>  static inline int kcompactd_run(int nid)
>  {
>  	return 0;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 27923af8e534..dee486936ccf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3235,28 +3235,35 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  		return false;
>  
>  	/*
> -	 * compaction considers all the zone as desperately out of memory
> -	 * so it doesn't really make much sense to retry except when the
> -	 * failure could be caused by insufficient priority
> +	 * Compaction backed off due to watermark checks for order-0
> +	 * so the regular reclaim has to try harder and reclaim something
> +	 * Retry only if it looks like reclaim might have a chance.
>  	 */
> -	if (compaction_failed(compact_result)) {
> -		if (*compact_priority > MIN_COMPACT_PRIORITY) {
> -			(*compact_priority)--;
> -			return true;
> -		}
> -		return false;
> +	if (compact_result == COMPACT_SKIPPED)
> +		return compaction_zonelist_suitable(ac, order, alloc_flags);
> +
> +	/*
> +	 * Compaction could have withdrawn early or skip some zones or
> +	 * pageblocks. We were asked to retry, which means the allocation
> +	 * should try really hard, so increase the priority if possible.
> +	 */
> +	if (*compact_priority > MIN_COMPACT_PRIORITY) {
> +		(*compact_priority)--;
> +		return true;
>  	}
>  
>  	/*
> -	 * make sure the compaction wasn't deferred or didn't bail out early
> -	 * due to locks contention before we declare that we should give up.
> -	 * But do not retry if the given zonelist is not suitable for
> -	 * compaction.
> +	 * Compaction considers all the zones as unfixably fragmented and we
> +	 * are on the highest priority, which means it can't be due to
> +	 * heuristics and it doesn't really make much sense to retry.
>  	 */
> -	if (compaction_withdrawn(compact_result))
> -		return compaction_zonelist_suitable(ac, order, alloc_flags);
> +	if (compaction_failed(compact_result))
> +		return false;
>  
>  	/*
> +	 * The remaining possibility is that compaction made progress and
> +	 * created a high-order page, but it was allocated by somebody else.
> +	 * To prevent thrashing, limit the number of retries in such case.
>  	 * !costly requests are much more important than __GFP_REPEAT
>  	 * costly ones because they are de facto nofail and invoke OOM
>  	 * killer to move on while costly can fail and users are ready
> -- 
> 2.8.3
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 13/18] mm, compaction: use correct watermark when checking allocation success
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 13:59     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 13:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:13, Vlastimil Babka wrote:
> The __compact_finished() function uses low watermark in a check that has to
> pass if the direct compaction is to finish and allocation should succeed. This
> is too pessimistic, as the allocation will typically use min watermark. It may
> happen that during compaction, we drop below the low watermark (due to parallel
> activity), but still form the target high-order page. By checking against low
> watermark, we might needlessly continue compaction.
> 
> Similarly, __compaction_suitable() uses low watermark in a check whether
> allocation can succeed without compaction. Again, this is unnecessarily
> pessimistic.
> 
> After this patch, these checks will use the direct compactor's alloc_flags to
> determine the watermark, which is effectively the min watermark.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/compaction.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a399e7ca4630..4b21a26694a2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1262,7 +1262,7 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
>  		return COMPACT_CONTINUE;
>  
>  	/* Compaction run is not finished if the watermark is not met */
> -	watermark = low_wmark_pages(zone);
> +	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
>  
>  	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
>  							cc->alloc_flags))
> @@ -1327,7 +1327,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  	if (is_via_compact_memory(order))
>  		return COMPACT_CONTINUE;
>  
> -	watermark = low_wmark_pages(zone);
> +	watermark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
>  	/*
>  	 * If watermarks for high-order allocation are already met, there
>  	 * should be no need for compaction at all.
> @@ -1341,7 +1341,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  	 * This is because during migration, copies of pages need to be
>  	 * allocated and for a short time, the footprint is higher
>  	 */
> -	watermark += (2UL << order);
> +	watermark = low_wmark_pages(zone) + (2UL << order);
>  	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>  				 alloc_flags, wmark_target))
>  		return COMPACT_SKIPPED;
> -- 
> 2.8.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 14/18] mm, compaction: create compact_gap wrapper
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 14:02     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 14:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:14, Vlastimil Babka wrote:
> Compaction uses a watermark gap of (2UL << order) pages at various places and
> it's not immediately obvious why. Abstract it through a compact_gap() wrapper
> to create a single place with a thorough explanation.

Yes, the comment is helpful.
 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/compaction.h | 16 ++++++++++++++++
>  mm/compaction.c            |  7 +++----
>  mm/vmscan.c                |  4 ++--
>  3 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 4bef69a83f8f..654cb74418c4 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -58,6 +58,22 @@ enum compact_result {
>  
>  struct alloc_context; /* in mm/internal.h */
>  
> +/*
> + * Number of free order-0 pages that should be available above given watermark
> + * to make sure compaction has reasonable chance of not running out of free
> + * pages that it needs to isolate as migration target during its work.
> + */
> +static inline unsigned long compact_gap(unsigned int order)
> +{
> +	/*
> +	 * Although all the isolations for migration are temporary, compaction
> +	 * may have up to 1 << order pages on its list and then try to split
> +	 * an (order - 1) free page. At that point, a gap of 1 << order might
> +	 * not be enough, so it's safer to require twice that amount.
> +	 */
> +	return 2UL << order;
> +}
> +
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compact_memory;
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4b21a26694a2..bcab680ccb8a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1337,11 +1337,10 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  		return COMPACT_PARTIAL;
>  
>  	/*
> -	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
> -	 * This is because during migration, copies of pages need to be
> -	 * allocated and for a short time, the footprint is higher
> +	 * Watermarks for order-0 must be met for compaction to be able to
> +	 * isolate free pages for migration targets.
>  	 */
> -	watermark = low_wmark_pages(zone) + (2UL << order);
> +	watermark = low_wmark_pages(zone) + compact_gap(order);
>  	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>  				 alloc_flags, wmark_target))
>  		return COMPACT_SKIPPED;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c4a2f4512fca..00034ec9229b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2345,7 +2345,7 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  	 * If we have not reclaimed enough pages for compaction and the
>  	 * inactive lists are large enough, continue reclaiming
>  	 */
> -	pages_for_compaction = (2UL << sc->order);
> +	pages_for_compaction = compact_gap(sc->order);
>  	inactive_lru_pages = zone_page_state(zone, NR_INACTIVE_FILE);
>  	if (get_nr_swap_pages() > 0)
>  		inactive_lru_pages += zone_page_state(zone, NR_INACTIVE_ANON);
> @@ -2472,7 +2472,7 @@ static inline bool compaction_ready(struct zone *zone, int order, int classzone_
>  	 */
>  	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
>  			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
> -	watermark = high_wmark_pages(zone) + balance_gap + (2UL << order);
> +	watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
>  	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
>  
>  	/*
> -- 
> 2.8.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 16/18] mm, compaction: require only min watermarks for non-costly orders
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 14:08     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 14:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:16, Vlastimil Babka wrote:
> The __compaction_suitable() function checks the low watermark plus a
> compact_gap() gap to decide if there's enough free memory to perform
> compaction. Then __isolate_free_page() uses a low watermark check to decide if
> a particular free page can be isolated. In the latter case, using low watermark
> is needlessly pessimistic, as the free page isolations are only temporary. For
> __compaction_suitable() the higher watermark makes sense for high-order
> allocations where more freepages increase the chance of success, and we can
> typically fail with some order-0 fallback when the system is struggling to
> reach that watermark. But for low-order allocation, forming the page should not
> be that hard. So using low watermark here might just prevent compaction from
> even trying, and eventually lead to OOM killer even if we are above min
> watermarks.
> 
> So after this patch, we use min watermark for non-costly orders in
> __compaction_suitable(), and for all orders in __isolate_free_page().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/compaction.c | 6 +++++-
>  mm/page_alloc.c | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4ffa0870192b..d854519a5302 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1345,10 +1345,14 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  	 * isolation. We however do use the direct compactor's classzone_idx to
>  	 * skip over zones where lowmem reserves would prevent allocation even
>  	 * if compaction succeeds.
> +	 * For costly orders, we require low watermark instead of min for
> +	 * compaction to proceed to increase its chances.
>  	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>  	 * suitable migration targets
>  	 */
> -	watermark = low_wmark_pages(zone) + compact_gap(order);
> +	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
> +				low_wmark_pages(zone) : min_wmark_pages(zone);
> +	watermark += compact_gap(order);
>  	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>  						ALLOC_CMA, wmark_target))
>  		return COMPACT_SKIPPED;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 09dc9db8a7e9..5b4c9e567fc1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2489,7 +2489,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  
>  	if (!is_migrate_isolate(mt)) {
>  		/* Obey watermarks as if the page was being allocated */
> -		watermark = low_wmark_pages(zone) + (1 << order);
> +		watermark = min_wmark_pages(zone) + (1UL << order);
>  		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>  			return 0;
>  
> -- 
> 2.8.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 17/18] mm, vmscan: make compaction_ready() more accurate and readable
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 14:14     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 14:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:17, Vlastimil Babka wrote:
> The compaction_ready() function is used during direct reclaim for costly-order
> allocations to skip reclaim for zones where compaction should be attempted
> instead. It combines the standard compaction_suitable() check with its own
> watermark check based on the high watermark with an extra gap, and the result is
> confusing at best.
> 
> This patch attempts to better structure and document the checks involved.
> First, compaction_suitable() can determine that the allocation should either
> succeed already, or that compaction doesn't have enough free pages to proceed.
> The third possibility is that compaction has enough free pages, but we still
> decide to reclaim first - unless we are already above the high watermark with
> gap.  This does not mean that the reclaim will actually reach this watermark
> during a single attempt; this is rather an over-reclaim protection. So document
> the code as such. The check for compaction_deferred() is removed completely, as
> it in fact had no proper role here.
> 
> The result after this patch is mainly less confusing code. We also skip some
> over-reclaim in cases where the allocation should already succeed.

Yes, this is indeed more understandable.

> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmscan.c | 49 +++++++++++++++++++++++--------------------------
>  1 file changed, 23 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 00034ec9229b..640d2e615c36 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2456,40 +2456,37 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  }
>  
>  /*
> - * Returns true if compaction should go ahead for a high-order request, or
> - * the high-order allocation would succeed without compaction.
> + * Returns true if compaction should go ahead for a costly-order request, or
> + * the allocation would already succeed without compaction. Return false if we
> + * should reclaim first.
>   */
>  static inline bool compaction_ready(struct zone *zone, int order, int classzone_idx)
>  {
> -	unsigned long balance_gap, watermark;
> -	bool watermark_ok;
> +	unsigned long watermark;
> +	enum compact_result suitable;
>  
> -	/*
> -	 * Compaction takes time to run and there are potentially other
> -	 * callers using the pages just freed. Continue reclaiming until
> -	 * there is a buffer of free pages available to give compaction
> -	 * a reasonable chance of completing and allocating the page
> -	 */
> -	balance_gap = min(low_wmark_pages(zone), DIV_ROUND_UP(
> -			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
> -	watermark = high_wmark_pages(zone) + balance_gap + compact_gap(order);
> -	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
> -
> -	/*
> -	 * If compaction is deferred, reclaim up to a point where
> -	 * compaction will have a chance of success when re-enabled
> -	 */
> -	if (compaction_deferred(zone, order))
> -		return watermark_ok;
> +	suitable = compaction_suitable(zone, order, 0, classzone_idx);
> +	if (suitable == COMPACT_PARTIAL)
> +		/* Allocation should succeed already. Don't reclaim. */
> +		return true;
> +	if (suitable == COMPACT_SKIPPED)
> +		/* Compaction cannot yet proceed. Do reclaim. */
> +		return false;
>  
>  	/*
> -	 * If compaction is not ready to start and allocation is not likely
> -	 * to succeed without it, then keep reclaiming.
> +	 * Compaction is already possible, but it takes time to run and there
> +	 * are potentially other callers using the pages just freed. So proceed
> +	 * with reclaim to make a buffer of free pages available to give
> +	 * compaction a reasonable chance of completing and allocating the page.
> +	 * Note that we won't actually reclaim the whole buffer in one attempt
> +	 * as the target watermark in should_continue_reclaim() is lower. But if
> +	 * we are already above the high+gap watermark, don't reclaim at all.
>  	 */
> -	if (compaction_suitable(zone, order, 0, classzone_idx) == COMPACT_SKIPPED)
> -		return false;
> +	watermark = high_wmark_pages(zone) + compact_gap(order);
> +	watermark += min(low_wmark_pages(zone), DIV_ROUND_UP(
> +			zone->managed_pages, KSWAPD_ZONE_BALANCE_GAP_RATIO));
>  
> -	return watermark_ok;
> +	return zone_watermark_ok_safe(zone, 0, watermark, classzone_idx);
>  }
>  
>  /*
> -- 
> 2.8.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 18/18] mm, vmscan: use proper classzone_idx in should_continue_reclaim()
  2016-05-31 13:08   ` Vlastimil Babka
@ 2016-06-01 14:21     ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 14:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Tue 31-05-16 15:08:18, Vlastimil Babka wrote:
[...]
> @@ -2364,11 +2350,12 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  }
>  
>  static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> -			bool is_classzone)
> +			int classzone_idx)
>  {
>  	struct reclaim_state *reclaim_state = current->reclaim_state;
>  	unsigned long nr_reclaimed, nr_scanned;
>  	bool reclaimable = false;
> +	bool is_classzone = (classzone_idx == zone_idx(zone));
>  
>  	do {
>  		struct mem_cgroup *root = sc->target_mem_cgroup;
> @@ -2450,7 +2437,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  			reclaimable = true;
>  
>  	} while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
> -					 sc->nr_scanned - nr_scanned, sc));
> +			 sc->nr_scanned - nr_scanned, sc, classzone_idx));
>  
>  	return reclaimable;
>  }
> @@ -2580,7 +2567,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  			/* need some check for avoid more shrink_zone() */
>  		}
>  
> -		shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
> +		shrink_zone(zone, sc, classzone_idx);

this should be is_classzone, right?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 03/18] mm, page_alloc: don't retry initial attempt in slowpath
  2016-06-01 13:26     ` Michal Hocko
@ 2016-06-01 14:58       ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-06-01 14:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On 06/01/2016 03:26 PM, Michal Hocko wrote:
> On Tue 31-05-16 15:08:03, Vlastimil Babka wrote:
> [...]
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index da3a62a94b4a..9f83259a18a8 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3367,10 +3367,9 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>>  	bool drained = false;
>>  
>>  	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
>> -	if (unlikely(!(*did_some_progress)))
>> -		return NULL;
>>  
>>  retry:
>> +	/* We attempt even when no progress, as kswapd might have done some */
>>  	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> 
> Is this really likely to happen, though? Sure, we might have the last few
> reclaimable pages on the LRU lists, but I am not sure this would make a
> large difference then.
> 
> That being said, I do not think this is harmful, but I find it a bit
> weird to invoke a reclaim and then ignore the feedback... I will leave the
> decision up to you, but the original patch seemed neater.

OK, I'll think about it.

>>  
>>  	/*
>> @@ -3378,7 +3377,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>>  	 * pages are pinned on the per-cpu lists or in high alloc reserves.
>>  	 * Shrink them them and try again
>>  	 */
>> -	if (!page && !drained) {
>> +	if (!page && *did_some_progress && !drained) {
>>  		unreserve_highatomic_pageblock(ac);
>>  		drain_all_pages(NULL);
>>  		drained = true;
> 
> I do not remember this in the previous version.

Because it's a consequence of the new hunk above.

> Why shouldn't we
> unreserve highatomic reserves when there was no progress?

Previously the "return NULL" for no progress would also skip this, so I
wanted to change just the get_page_from_freelist() part. IIUC the
reasoning here is that if there was reclaim progress but we still didn't
succeed in getting the page, the freed pages may be stuck on the per-cpu
lists or in the highatomic reserve. If there was no progress, it's
unlikely that anything is stuck there.
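
For clarity, folding the two hunks together gives roughly the following
shape of __alloc_pages_direct_reclaim() after the patch (a reconstruction
for illustration, not a verbatim copy of the patched tree):

	static struct page *
	__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
			unsigned int alloc_flags, const struct alloc_context *ac,
			unsigned long *did_some_progress)
	{
		struct page *page = NULL;
		bool drained = false;

		*did_some_progress = __perform_reclaim(gfp_mask, order, ac);

	retry:
		/* We attempt even when no progress, as kswapd might have done some */
		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

		/*
		 * Draining the per-cpu lists and the highatomic reserve only
		 * helps if reclaim actually freed pages that could be stuck
		 * there; with no progress we return NULL right away.
		 */
		if (!page && *did_some_progress && !drained) {
			unreserve_highatomic_pageblock(ac);
			drain_all_pages(NULL);
			drained = true;
			goto retry;
		}

		return page;
	}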

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 18/18] mm, vmscan: use proper classzone_idx in should_continue_reclaim()
  2016-06-01 14:21     ` Michal Hocko
@ 2016-06-01 15:19       ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-06-01 15:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On 06/01/2016 04:21 PM, Michal Hocko wrote:
> On Tue 31-05-16 15:08:18, Vlastimil Babka wrote:
> [...]
>> @@ -2364,11 +2350,12 @@ static inline bool should_continue_reclaim(struct zone *zone,
>>  }
>>  
>>  static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>> -			bool is_classzone)
>> +			int classzone_idx)
>>  {
>>  	struct reclaim_state *reclaim_state = current->reclaim_state;
>>  	unsigned long nr_reclaimed, nr_scanned;
>>  	bool reclaimable = false;
>> +	bool is_classzone = (classzone_idx == zone_idx(zone));
>>  
>>  	do {
>>  		struct mem_cgroup *root = sc->target_mem_cgroup;
>> @@ -2450,7 +2437,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>>  			reclaimable = true;
>>  
>>  	} while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
>> -					 sc->nr_scanned - nr_scanned, sc));
>> +			 sc->nr_scanned - nr_scanned, sc, classzone_idx));
>>  
>>  	return reclaimable;
>>  }
>> @@ -2580,7 +2567,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>>  			/* need some check for avoid more shrink_zone() */
>>  		}
>>  
>> -		shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
>> +		shrink_zone(zone, sc, classzone_idx);
> 
> this should be is_classzone, right?

No, this is shrink_zones() context, not shrink_zone().
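
Since the two names are easy to mix up, a minimal sketch of the two call
sites as the patch leaves them (elided, illustrative snippets rather than
the exact patched code):

/* shrink_zones(): walks the zonelist and passes the index straight through. */
static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
{
	/* ... zonelist iteration and classzone_idx setup elided ... */
	shrink_zone(zone, sc, classzone_idx);	/* an int index, not a bool */
}

/* shrink_zone(): derives the old bool locally from that index. */
static bool shrink_zone(struct zone *zone, struct scan_control *sc,
			int classzone_idx)
{
	bool reclaimable = false;
	bool is_classzone = (classzone_idx == zone_idx(zone));

	/* ... per-zone reclaim loop elided ... */
	return reclaimable;
}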

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 18/18] mm, vmscan: use proper classzone_idx in should_continue_reclaim()
  2016-06-01 15:19       ` Vlastimil Babka
@ 2016-06-01 15:45         ` Michal Hocko
  -1 siblings, 0 replies; 64+ messages in thread
From: Michal Hocko @ 2016-06-01 15:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On Wed 01-06-16 17:19:26, Vlastimil Babka wrote:
> On 06/01/2016 04:21 PM, Michal Hocko wrote:
> > On Tue 31-05-16 15:08:18, Vlastimil Babka wrote:
> > [...]
> >> @@ -2364,11 +2350,12 @@ static inline bool should_continue_reclaim(struct zone *zone,
> >>  }
> >>  
> >>  static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> >> -			bool is_classzone)
> >> +			int classzone_idx)
> >>  {
> >>  	struct reclaim_state *reclaim_state = current->reclaim_state;
> >>  	unsigned long nr_reclaimed, nr_scanned;
> >>  	bool reclaimable = false;
> >> +	bool is_classzone = (classzone_idx == zone_idx(zone));
> >>  
> >>  	do {
> >>  		struct mem_cgroup *root = sc->target_mem_cgroup;
> >> @@ -2450,7 +2437,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> >>  			reclaimable = true;
> >>  
> >>  	} while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
> >> -					 sc->nr_scanned - nr_scanned, sc));
> >> +			 sc->nr_scanned - nr_scanned, sc, classzone_idx));
> >>  
> >>  	return reclaimable;
> >>  }
> >> @@ -2580,7 +2567,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >>  			/* need some check for avoid more shrink_zone() */
> >>  		}
> >>  
> >> -		shrink_zone(zone, sc, zone_idx(zone) == classzone_idx);
> >> +		shrink_zone(zone, sc, classzone_idx);
> > 
> > this should be is_classzone, right?
> 
> No, this is shrink_zones() context, not shrink_zone().

Ohh, right. They read too similarly. I didn't spot anything else.

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 12/18] mm, compaction: more reliably increase direct compaction priority
  2016-06-01 13:51     ` Michal Hocko
@ 2016-06-23 14:41       ` Vlastimil Babka
  -1 siblings, 0 replies; 64+ messages in thread
From: Vlastimil Babka @ 2016-06-23 14:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, Joonsoo Kim,
	David Rientjes, Rik van Riel

On 06/01/2016 03:51 PM, Michal Hocko wrote:
> On Tue 31-05-16 15:08:12, Vlastimil Babka wrote:
>> During reclaim/compaction loop, compaction priority can be increased by the
>> should_compact_retry() function, but the current code is not optimal. Priority
>> is only increased when compaction_failed() is true, which means that compaction
>> has scanned the whole zone. This may not happen even after multiple attempts
>> with the lower priority due to parallel activity, so we might needlessly
>> struggle on the lower priority.
>>
>> We can remove these corner cases by increasing compaction priority regardless
>> of compaction_failed(). Examining further the compaction result can be
>> postponed only after reaching the highest priority. This is a simple solution
>> and we don't need to worry about reaching the highest priority "too soon" here,
>> because when should_compact_retry() is called it means that the system is
>> already struggling and the allocation is supposed to either try as hard as
>> possible, or it cannot fail at all. There's not much point staying at lower
>> priorities with heuristics that may result in only partial compaction.
>>
>> The only exception here is the COMPACT_SKIPPED result, which means that
>> compaction could not run at all due to being below order-0 watermarks. In that
>> case, don't increase compaction priority, and check if compaction could proceed
>> when everything reclaimable was reclaimed. Before this patch, this was tied to
>> compaction_withdrawn(), but the other results considered there are in fact only
>> possible due to low compaction priority so we can ignore them thanks to the
>> patch. Since there are no other callers of compaction_withdrawn(), remove it.
>
> I agree with the change in general. I think that keeping compaction_withdrawn
> even with a single check is better because it abstracts the fact from a
> specific constant.

OK.

> Now that I think about that some more I guess you also want to update
> compaction_retries inside should_compact_retry as well, or at least
> update it only when we have reached the lowest priority. What do you
> think?

Makes sense, especially since after your suggestion
should_compact_retry() is not reached as long as should_reclaim_retry()
returns true. So I will do that.
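
One way to read that suggestion, sketched purely hypothetically below (the
constant names follow the series' style, but the exact shape in the next
revision may well differ): escalate the priority first, and only start
consuming the retry budget once the priority scale is exhausted.

/*
 * Hypothetical sketch only: lower enum values mean higher priority, so
 * decrementing escalates compaction.  The surrounding checks of the real
 * should_compact_retry() are omitted.
 */
static inline bool
should_compact_retry_sketch(enum compact_priority *compact_priority,
			    int *compaction_retries)
{
	if (*compact_priority > MIN_COMPACT_PRIORITY) {
		/* Not at the highest priority yet: bump it and retry. */
		(*compact_priority)--;
		*compaction_retries = 0;
		return true;
	}

	/* At the highest priority: retry only a bounded number of times. */
	if (*compaction_retries < MAX_COMPACT_RETRIES) {
		(*compaction_retries)++;
		return true;
	}

	return false;
}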

>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> Other than that this makes sense
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

^ permalink raw reply	[flat|nested] 64+ messages in thread

Thread overview: 64+ messages
2016-05-31 13:08 [PATCH v2 00/18] make direct compaction more deterministic Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 01/18] mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 02/18] mm, page_alloc: set alloc_flags only once in slowpath Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 03/18] mm, page_alloc: don't retry initial attempt " Vlastimil Babka
2016-06-01 13:26   ` Michal Hocko
2016-06-01 14:58     ` Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 04/18] mm, page_alloc: restructure direct compaction handling " Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 05/18] mm, page_alloc: make THP-specific decisions more generic Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 06/18] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations Vlastimil Babka
2016-06-01 13:33   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 07/18] mm, compaction: introduce direct compaction priority Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 08/18] mm, compaction: simplify contended compaction handling Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 09/18] mm, compaction: make whole_zone flag ignore cached scanner positions Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 10/18] mm, compaction: cleanup unused functions Vlastimil Babka
2016-06-01 13:45   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 11/18] mm, compaction: add the ultimate direct compaction priority Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 12/18] mm, compaction: more reliably increase " Vlastimil Babka
2016-06-01 13:51   ` Michal Hocko
2016-06-23 14:41     ` Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 13/18] mm, compaction: use correct watermark when checking allocation success Vlastimil Babka
2016-06-01 13:59   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 14/18] mm, compaction: create compact_gap wrapper Vlastimil Babka
2016-06-01 14:02   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 15/18] mm, compaction: use proper alloc_flags in __compaction_suitable() Vlastimil Babka
2016-05-31 13:08 ` [PATCH v2 16/18] mm, compaction: require only min watermarks for non-costly orders Vlastimil Babka
2016-06-01 14:08   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 17/18] mm, vmscan: make compaction_ready() more accurate and readable Vlastimil Babka
2016-06-01 14:14   ` Michal Hocko
2016-05-31 13:08 ` [PATCH v2 18/18] mm, vmscan: use proper classzone_idx in should_continue_reclaim() Vlastimil Babka
2016-06-01 14:21   ` Michal Hocko
2016-06-01 15:19     ` Vlastimil Babka
2016-06-01 15:45       ` Michal Hocko
