* [PATCH v6 00/11] make direct compaction more deterministic
@ 2016-08-10  9:12 Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 01/11] mm, compaction: make whole_zone flag ignore cached scanner positions Vlastimil Babka
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

Changes since v3
* The first part with cleanups (v4, v5) went separately to 4.8-rc1
* Rebased to 4.8-rc1
* Patch 1 - don't touch cached pfns in whole-zone compaction (Joonsoo)
* New patches 2 and 3 in response to Joonsoo pointing out missing adjustments
  to watermark checks in patch 7 - it turns out we can remove those watermark
  checks altogether.
* Patch 6 made less aggressive to avoid premature OOM (Joonsoo)

This is mostly a followup to Michal's OOM detection rework, which highlighted
the need for direct compaction to provide better feedback in the
reclaim/compaction loop, so that it can reliably recognize when compaction
cannot make further progress and the allocation should invoke the OOM killer or
fail. We discussed this at LSF/MM [1], where I proposed expanding the
async/sync migration mode used in compaction into more general "priorities".
This patchset adds one new priority that simply overrides all the heuristics
and makes compaction fully scan all zones. I don't currently think we need more
fine-grained priorities, but we'll see. Other than that, there are some smaller
fixes and cleanups, mainly related to the THP-specific hacks.
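
For reference, the direct compaction priorities after this series look as
follows (excerpt from patch 5; lower value means higher priority):

enum compact_priority {
	COMPACT_PRIO_SYNC_FULL,		/* new: ignores all heuristics, scans whole zones */
	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
	COMPACT_PRIO_SYNC_LIGHT,
	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
	COMPACT_PRIO_ASYNC,
	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};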

I've tested this with stress-highalloc in GFP_KERNEL order-4 and THP-like
order-9 scenarios. There's some improvement in the compaction stats for the
order-4 case, which is likely due to the better watermark handling. In the
previous version I reported mostly noise wrt compaction stats, and decreased
direct reclaim - now reclaim shows no difference. I believe this is due to the
less aggressive compaction priority increase in patch 6.

"before" is a mmotm tree prior to 4.7 release plus the first part of the
series that was sent and merged separately

                                    before        after
order-4:

Compaction stalls                    27216       30759
Compaction success                   19598       25475
Compaction failures                   7617        5283
Page migrate success                370510      464919
Page migrate failure                 25712       27987
Compaction pages isolated           849601     1041581
Compaction migrate scanned       143146541   101084990
Compaction free scanned          208355124   144863510
Compaction cost                       1403        1210

order-9:

Compaction stalls                     7311        7401
Compaction success                    1634        1683
Compaction failures                   5677        5718
Page migrate success                194657      183988
Page migrate failure                  4753        4170
Compaction pages isolated           498790      456130
Compaction migrate scanned          565371      524174
Compaction free scanned            4230296     4250744
Compaction cost                        215         203

[1] https://lwn.net/Articles/684611/

Vlastimil Babka (11):
  mm, compaction: make whole_zone flag ignore cached scanner positions
  mm, compaction: cleanup unused functions
  mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS
  mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  mm, compaction: add the ultimate direct compaction priority
  mm, compaction: more reliably increase direct compaction priority
  mm, compaction: use correct watermark when checking compaction success
  mm, compaction: create compact_gap wrapper
  mm, compaction: use proper alloc_flags in __compaction_suitable()
  mm, compaction: require only min watermarks for non-costly orders
  mm, vmscan: make compaction_ready() more accurate and readable

 include/linux/compaction.h        |  32 +++++---
 include/trace/events/compaction.h |   2 +-
 mm/compaction.c                   | 154 +++++++++++++++++---------------------
 mm/internal.h                     |   2 +-
 mm/page_alloc.c                   |  22 +++---
 mm/vmscan.c                       |  49 ++++++------
 6 files changed, 128 insertions(+), 133 deletions(-)

-- 
2.9.2


* [PATCH v6 01/11] mm, compaction: make whole_zone flag ignore cached scanner positions
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 02/11] mm, compaction: cleanup unused functions Vlastimil Babka
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

A recent patch has added a whole_zone flag that compaction sets when scanning
starts from the zone boundary, in order to report that the zone has been fully
scanned in one attempt. For allocations that want to try really hard or cannot
fail, we will want to introduce a mode where scanning the whole zone is
guaranteed regardless of the cached positions.

This patch reuses the whole_zone flag so that if it is already passed to
compaction as true, the cached scanner positions are ignored. Employing this
flag in the reclaim/compaction loop will be done in the next patch. This patch
however converts compaction invoked from userspace via procfs to use the flag.
Before this patch, the cached positions were first reset to zone boundaries and
then read back from struct zone, so there was a window where a parallel
compaction could replace the reset values, making the manual compaction less
effective. Using the flag instead of performing the reset is more robust.
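
The scanner setup in compact_zone() then becomes roughly the following
(condensed from the diff below):

	if (cc->whole_zone) {
		/* explicit request - always start at the zone boundaries */
		cc->migrate_pfn = start_pfn;
		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
	} else {
		/* otherwise use the cached positions... */
		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
		cc->free_pfn = zone->compact_cached_free_pfn;
		/* (validation against zone boundaries omitted here) */

		/* ...and report a full scan if we happen to start at the boundary */
		if (cc->migrate_pfn == start_pfn)
			cc->whole_zone = true;
	}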

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/compaction.c | 43 +++++++++++++++++++++----------------------
 mm/internal.h   |  2 +-
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9affb2908304..5b0483ce6cb1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1492,23 +1492,29 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	/*
 	 * Setup to move all movable pages to the end of the zone. Used cached
-	 * information on where the scanners should start but check that it
-	 * is initialised by ensuring the values are within zone boundaries.
+	 * information on where the scanners should start (unless we explicitly
+	 * want to compact the whole zone), but check that it is initialised
+	 * by ensuring the values are within zone boundaries.
 	 */
-	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
-	cc->free_pfn = zone->compact_cached_free_pfn;
-	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
-		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
-		zone->compact_cached_free_pfn = cc->free_pfn;
-	}
-	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
+	if (cc->whole_zone) {
 		cc->migrate_pfn = start_pfn;
-		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
-		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
-	}
+		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
+	} else {
+		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
+		cc->free_pfn = zone->compact_cached_free_pfn;
+		if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+			cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
+			zone->compact_cached_free_pfn = cc->free_pfn;
+		}
+		if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
+			cc->migrate_pfn = start_pfn;
+			zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
+			zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
+		}
 
-	if (cc->migrate_pfn == start_pfn)
-		cc->whole_zone = true;
+		if (cc->migrate_pfn == start_pfn)
+			cc->whole_zone = true;
+	}
 
 	cc->last_migrated_pfn = 0;
 
@@ -1747,14 +1753,6 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		INIT_LIST_HEAD(&cc->freepages);
 		INIT_LIST_HEAD(&cc->migratepages);
 
-		/*
-		 * When called via /proc/sys/vm/compact_memory
-		 * this makes sure we compact the whole zone regardless of
-		 * cached scanner positions.
-		 */
-		if (is_via_compact_memory(cc->order))
-			__reset_isolation_suitable(zone);
-
 		if (is_via_compact_memory(cc->order) ||
 				!compaction_deferred(zone, cc->order))
 			compact_zone(zone, cc);
@@ -1790,6 +1788,7 @@ static void compact_node(int nid)
 		.order = -1,
 		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
+		.whole_zone = true,
 	};
 
 	__compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 1501304f87a4..5214bf8e3171 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -179,7 +179,7 @@ struct compact_control {
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
-	bool whole_zone;		/* Whole zone has been scanned */
+	bool whole_zone;		/* Whole zone should/has been scanned */
 	int order;			/* order a direct compactor needs */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-- 
2.9.2


* [PATCH v6 02/11] mm, compaction: cleanup unused functions
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 01/11] mm, compaction: make whole_zone flag ignore cached scanner positions Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS Vlastimil Babka
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

Since kswapd compaction moved to kcompactd, compact_pgdat() is not called
anymore, so we remove it. The only caller of __compact_pgdat() is
compact_node(), so we merge them and remove code that was only reachable from
kswapd.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h |  5 ----
 mm/compaction.c            | 60 +++++++++++++---------------------------------
 2 files changed, 17 insertions(+), 48 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index d4e106b5dc27..1bb58581301c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -70,7 +70,6 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio);
-extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int classzone_idx);
@@ -154,10 +153,6 @@ extern void kcompactd_stop(int nid);
 extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int classzone_idx);
 
 #else
-static inline void compact_pgdat(pg_data_t *pgdat, int order)
-{
-}
-
 static inline void reset_isolation_suitable(pg_data_t *pgdat)
 {
 }
diff --git a/mm/compaction.c b/mm/compaction.c
index 5b0483ce6cb1..328bdfeece2d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1736,10 +1736,18 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 
 
 /* Compact all zones within a node */
-static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
+static void compact_node(int nid)
 {
+	pg_data_t *pgdat = NODE_DATA(nid);
 	int zoneid;
 	struct zone *zone;
+	struct compact_control cc = {
+		.order = -1,
+		.mode = MIGRATE_SYNC,
+		.ignore_skip_hint = true,
+		.whole_zone = true,
+	};
+
 
 	for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
 
@@ -1747,53 +1755,19 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		if (!populated_zone(zone))
 			continue;
 
-		cc->nr_freepages = 0;
-		cc->nr_migratepages = 0;
-		cc->zone = zone;
-		INIT_LIST_HEAD(&cc->freepages);
-		INIT_LIST_HEAD(&cc->migratepages);
-
-		if (is_via_compact_memory(cc->order) ||
-				!compaction_deferred(zone, cc->order))
-			compact_zone(zone, cc);
-
-		VM_BUG_ON(!list_empty(&cc->freepages));
-		VM_BUG_ON(!list_empty(&cc->migratepages));
+		cc.nr_freepages = 0;
+		cc.nr_migratepages = 0;
+		cc.zone = zone;
+		INIT_LIST_HEAD(&cc.freepages);
+		INIT_LIST_HEAD(&cc.migratepages);
 
-		if (is_via_compact_memory(cc->order))
-			continue;
+		compact_zone(zone, &cc);
 
-		if (zone_watermark_ok(zone, cc->order,
-				low_wmark_pages(zone), 0, 0))
-			compaction_defer_reset(zone, cc->order, false);
+		VM_BUG_ON(!list_empty(&cc.freepages));
+		VM_BUG_ON(!list_empty(&cc.migratepages));
 	}
 }
 
-void compact_pgdat(pg_data_t *pgdat, int order)
-{
-	struct compact_control cc = {
-		.order = order,
-		.mode = MIGRATE_ASYNC,
-	};
-
-	if (!order)
-		return;
-
-	__compact_pgdat(pgdat, &cc);
-}
-
-static void compact_node(int nid)
-{
-	struct compact_control cc = {
-		.order = -1,
-		.mode = MIGRATE_SYNC,
-		.ignore_skip_hint = true,
-		.whole_zone = true,
-	};
-
-	__compact_pgdat(NODE_DATA(nid), &cc);
-}
-
 /* Compact all nodes in the system */
 static void compact_nodes(void)
 {
-- 
2.9.2


* [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 01/11] mm, compaction: make whole_zone flag ignore cached scanner positions Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 02/11] mm, compaction: cleanup unused functions Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-18  9:01   ` Michal Hocko
  2016-08-10  9:12 ` [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS Vlastimil Babka
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

COMPACT_PARTIAL has historically meant that compaction returned after doing
some work without fully compacting a zone. However, it didn't distinguish
whether compaction terminated because it succeeded in creating the requested
high-order page. This has changed recently and now we only return
COMPACT_PARTIAL when compaction thinks it succeeded, or when the high-order
watermark check in compaction_suitable() passes and no compaction needs to be
done.

So at this point we can make the return value clearer by renaming it to
COMPACT_SUCCESS. The next patch will remove some redundant tests for success
where compaction just returned COMPACT_SUCCESS.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h        |  8 ++++----
 include/trace/events/compaction.h |  2 +-
 mm/compaction.c                   | 12 ++++++------
 mm/vmscan.c                       |  2 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 1bb58581301c..e88c037afe47 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -49,10 +49,10 @@ enum compact_result {
 	COMPACT_CONTENDED,
 
 	/*
-	 * direct compaction partially compacted a zone and there might be
-	 * suitable pages
+	 * direct compaction terminated after concluding that the allocation
+	 * should now succeed
 	 */
-	COMPACT_PARTIAL,
+	COMPACT_SUCCESS,
 };
 
 struct alloc_context; /* in mm/internal.h */
@@ -88,7 +88,7 @@ static inline bool compaction_made_progress(enum compact_result result)
 	 * that the compaction successfully isolated and migrated some
 	 * pageblocks.
 	 */
-	if (result == COMPACT_PARTIAL)
+	if (result == COMPACT_SUCCESS)
 		return true;
 
 	return false;
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index c2ba402ab256..cbdb90b6b308 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -13,7 +13,7 @@
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
-	EM( COMPACT_PARTIAL,		"partial")		\
+	EM( COMPACT_SUCCESS,		"success")		\
 	EM( COMPACT_PARTIAL_SKIPPED,	"partial_skipped")	\
 	EM( COMPACT_COMPLETE,		"complete")		\
 	EM( COMPACT_NO_SUITABLE_PAGE,	"no_suitable_page")	\
diff --git a/mm/compaction.c b/mm/compaction.c
index 328bdfeece2d..c355bf0d8599 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1329,13 +1329,13 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 
 		/* Job done if page is free of the right migratetype */
 		if (!list_empty(&area->free_list[migratetype]))
-			return COMPACT_PARTIAL;
+			return COMPACT_SUCCESS;
 
 #ifdef CONFIG_CMA
 		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
 		if (migratetype == MIGRATE_MOVABLE &&
 			!list_empty(&area->free_list[MIGRATE_CMA]))
-			return COMPACT_PARTIAL;
+			return COMPACT_SUCCESS;
 #endif
 		/*
 		 * Job done if allocation would steal freepages from
@@ -1343,7 +1343,7 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
 						true, &can_steal) != -1)
-			return COMPACT_PARTIAL;
+			return COMPACT_SUCCESS;
 	}
 
 	return COMPACT_NO_SUITABLE_PAGE;
@@ -1367,7 +1367,7 @@ static enum compact_result compact_finished(struct zone *zone,
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
- *   COMPACT_PARTIAL  - If the allocation would succeed without compaction
+ *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
  */
 static enum compact_result __compaction_suitable(struct zone *zone, int order,
@@ -1388,7 +1388,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 */
 	if (zone_watermark_ok(zone, order, watermark, classzone_idx,
 								alloc_flags))
-		return COMPACT_PARTIAL;
+		return COMPACT_SUCCESS;
 
 	/*
 	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
@@ -1477,7 +1477,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_PARTIAL || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d04178..c84784765d3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2514,7 +2514,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			continue;
 
 		switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
-		case COMPACT_PARTIAL:
+		case COMPACT_SUCCESS:
 		case COMPACT_CONTINUE:
 			return false;
 		default:
-- 
2.9.2


* [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (2 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-16  6:12   ` Joonsoo Kim
  2016-08-18  9:03   ` Michal Hocko
  2016-08-10  9:12 ` [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority Vlastimil Babka
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

Joonsoo has reminded me that in a later patch changing watermark checks
throughout compaction I forgot to update the checks in try_to_compact_pages()
and kcompactd_do_work(). Closer inspection however shows that they are
redundant now that compact_zone() reliably reports success with
COMPACT_SUCCESS, as they just repeat (a subset of) the checks that have just
passed. So instead of checking watermarks again, just test the return value.

Also remove the stray "bool success" variable from kcompactd_do_work().

Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c355bf0d8599..a144f58f7193 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1698,9 +1698,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 					alloc_flags, ac_classzone_idx(ac));
 		rc = max(status, rc);
 
-		/* If a normal allocation would succeed, stop compacting */
-		if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
-					ac_classzone_idx(ac), alloc_flags)) {
+		/* The allocation should succeed, stop compacting */
+		if (status == COMPACT_SUCCESS) {
 			/*
 			 * We think the allocation will succeed in this zone,
 			 * but it is not certain, hence the false. The caller
@@ -1873,8 +1872,6 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		.ignore_skip_hint = true,
 
 	};
-	bool success = false;
-
 	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
 							cc.classzone_idx);
 	count_vm_event(KCOMPACTD_WAKE);
@@ -1903,9 +1900,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 			return;
 		status = compact_zone(zone, &cc);
 
-		if (zone_watermark_ok(zone, cc.order, low_wmark_pages(zone),
-						cc.classzone_idx, 0)) {
-			success = true;
+		if (status == COMPACT_SUCCESS) {
 			compaction_defer_reset(zone, cc.order, false);
 		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
 			/*
-- 
2.9.2


* [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (3 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-16  5:58   ` Joonsoo Kim
  2016-08-10  9:12 ` [PATCH v6 06/11] mm, compaction: more reliably increase " Vlastimil Babka
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

During the reclaim/compaction loop, it's desirable to get a final answer from
unsuccessful compaction, so we can either fail the allocation or invoke the OOM
killer. However, heuristics such as deferred compaction or pageblock skip bits
can cause compaction to skip parts of zones or whole zones and lead to
premature OOMs, failures or excessive reclaim/compaction retries.

To remedy this, we introduce a new direct compaction priority called
COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:

- ignore deferred compaction status for a zone
- ignore pageblock skip hints
- ignore cached scanner positions and scan the whole zone

The new priority should eventually get picked up by should_compact_retry() and
this should improve success rates for costly allocations using __GFP_REPEAT,
such as hugetlbfs allocations, and reduce some corner-case OOMs for non-costly
allocations.
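
Implementation-wise, the new priority simply translates to compact_control
flags in compact_zone_order() and bypasses the deferral check in
try_to_compact_pages() (condensed from the diff below):

	struct compact_control cc = {
		/* ... other fields as before ... */
		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
	};

	/* in try_to_compact_pages() */
	if (prio > COMPACT_PRIO_SYNC_FULL && compaction_deferred(zone, order)) {
		rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
		continue;
	}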

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h | 3 ++-
 mm/compaction.c            | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index e88c037afe47..a1fba9994728 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -6,8 +6,9 @@
  * Lower value means higher priority, analogically to reclaim priority.
  */
 enum compact_priority {
+	COMPACT_PRIO_SYNC_FULL,
+	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
 	COMPACT_PRIO_SYNC_LIGHT,
-	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
 	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
 	COMPACT_PRIO_ASYNC,
 	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
diff --git a/mm/compaction.c b/mm/compaction.c
index a144f58f7193..ae4f40afcca1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1644,6 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.alloc_flags = alloc_flags,
 		.classzone_idx = classzone_idx,
 		.direct_compaction = true,
+		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
+		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -1689,7 +1691,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 								ac->nodemask) {
 		enum compact_result status;
 
-		if (compaction_deferred(zone, order)) {
+		if (prio > COMPACT_PRIO_SYNC_FULL
+					&& compaction_deferred(zone, order)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
 			continue;
 		}
-- 
2.9.2


* [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (4 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-16  6:07   ` Joonsoo Kim
  2016-08-18  9:10   ` Michal Hocko
  2016-08-10  9:12 ` [PATCH v6 07/11] mm, compaction: use correct watermark when checking compaction success Vlastimil Babka
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

During the reclaim/compaction loop, compaction priority can be increased by
the should_compact_retry() function, but the current code is not optimal.
Priority is only increased when compaction_failed() is true, which means that
compaction has scanned the whole zone. This may not happen even after multiple
attempts with a lower priority due to parallel activity, so we might needlessly
struggle at the lower priorities and possibly run out of compaction retry
attempts in the process.

After this patch we are guaranteed at least one attempt at the highest
compaction priority even if we exhaust all retries at the lower priorities.
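
The tail of should_compact_retry() then looks roughly like this (condensed
from the diff below):

	if (compaction_failed(compact_result))
		goto check_priority;

	/* ... the deferred/early-bailout and retry-count checks are unchanged ... */

	if (compaction_retries <= max_retries)
		return true;

	/*
	 * Make sure there is at least one attempt at the highest priority
	 * if we exhausted all retries at the lower priorities
	 */
check_priority:
	if (*compact_priority > MIN_COMPACT_PRIORITY) {
		(*compact_priority)--;
		return true;
	}
	return false;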

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fb975cec3518..b28517b918b0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3155,13 +3155,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * so it doesn't really make much sense to retry except when the
 	 * failure could be caused by insufficient priority
 	 */
-	if (compaction_failed(compact_result)) {
-		if (*compact_priority > MIN_COMPACT_PRIORITY) {
-			(*compact_priority)--;
-			return true;
-		}
-		return false;
-	}
+	if (compaction_failed(compact_result))
+		goto check_priority;
 
 	/*
 	 * make sure the compaction wasn't deferred or didn't bail out early
@@ -3185,6 +3180,15 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_retries <= max_retries)
 		return true;
 
+	/*
+	 * Make sure there is at least one attempt at the highest priority
+	 * if we exhausted all retries at the lower priorities
+	 */
+check_priority:
+	if (*compact_priority > MIN_COMPACT_PRIORITY) {
+		(*compact_priority)--;
+		return true;
+	}
 	return false;
 }
 #else
-- 
2.9.2


* [PATCH v6 07/11] mm, compaction: use correct watermark when checking compaction success
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (5 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 06/11] mm, compaction: more reliably increase " Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 08/11] mm, compaction: create compact_gap wrapper Vlastimil Babka
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

The __compact_finished() function uses the low watermark in a check that has
to pass if direct compaction is to finish and the allocation should succeed.
This is too pessimistic, as the allocation will typically use the min
watermark. It may happen that during compaction we drop below the low
watermark (due to parallel activity), but still form the target high-order
page. By checking against the low watermark, we might needlessly continue
compaction.

Similarly, __compaction_suitable() uses the low watermark in a check of
whether the allocation can succeed without compaction. Again, this is
unnecessarily pessimistic.

After this patch, these checks will use the direct compactor's alloc_flags to
determine the watermark, which is effectively the min watermark.
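
Concretely, both places now derive the watermark from the compactor's
alloc_flags instead of hardcoding the low watermark (condensed from the diff
below):

	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];

For a direct compactor this is typically ALLOC_WMARK_MIN, i.e. the min
watermark.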

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ae4f40afcca1..9bd788886b1a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1316,7 +1316,7 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		return COMPACT_CONTINUE;
 
 	/* Compaction run is not finished if the watermark is not met */
-	watermark = low_wmark_pages(zone);
+	watermark = zone->watermark[cc->alloc_flags & ALLOC_WMARK_MASK];
 
 	if (!zone_watermark_ok(zone, cc->order, watermark, cc->classzone_idx,
 							cc->alloc_flags))
@@ -1381,7 +1381,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	if (is_via_compact_memory(order))
 		return COMPACT_CONTINUE;
 
-	watermark = low_wmark_pages(zone);
+	watermark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 	/*
 	 * If watermarks for high-order allocation are already met, there
 	 * should be no need for compaction at all.
@@ -1395,7 +1395,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 * This is because during migration, copies of pages need to be
 	 * allocated and for a short time, the footprint is higher
 	 */
-	watermark += (2UL << order);
+	watermark = low_wmark_pages(zone) + (2UL << order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 				 alloc_flags, wmark_target))
 		return COMPACT_SKIPPED;
-- 
2.9.2


* [PATCH v6 08/11] mm, compaction: create compact_gap wrapper
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (6 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 07/11] mm, compaction: use correct watermark when checking compaction success Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-16  6:15   ` Joonsoo Kim
  2016-08-10  9:12 ` [PATCH v6 09/11] mm, compaction: use proper alloc_flags in __compaction_suitable() Vlastimil Babka
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

Compaction uses a watermark gap of (2UL << order) pages at various places and
it's not immediately obvious why. Abstract it through a compact_gap() wrapper
to create a single place with a thorough explanation.
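
The wrapper itself is trivial and keeps the current value (condensed from the
diff below):

	static inline unsigned long compact_gap(unsigned int order)
	{
		/*
		 * Compaction may have up to 1 << order pages on its lists and
		 * then try to split an (order - 1) free page, so a gap of
		 * twice 1 << order is the safer choice.
		 */
		return 2UL << order;
	}

so that callers read e.g.:

	watermark = low_wmark_pages(zone) + compact_gap(order);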

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/compaction.h | 16 ++++++++++++++++
 mm/compaction.c            |  7 +++----
 mm/vmscan.c                |  6 +++---
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a1fba9994728..e7f0d34a90fe 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,22 @@ enum compact_result {
 
 struct alloc_context; /* in mm/internal.h */
 
+/*
+ * Number of free order-0 pages that should be available above given watermark
+ * to make sure compaction has reasonable chance of not running out of free
+ * pages that it needs to isolate as migration target during its work.
+ */
+static inline unsigned long compact_gap(unsigned int order)
+{
+	/*
+	 * Although all the isolations for migration are temporary, compaction
+	 * may have up to 1 << order pages on its list and then try to split
+	 * an (order - 1) free page. At that point, a gap of 1 << order might
+	 * not be enough, so it's safer to require twice that amount.
+	 */
+	return 2UL << order;
+}
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 9bd788886b1a..ae6ecf8f8e70 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1391,11 +1391,10 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 		return COMPACT_SUCCESS;
 
 	/*
-	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
-	 * This is because during migration, copies of pages need to be
-	 * allocated and for a short time, the footprint is higher
+	 * Watermarks for order-0 must be met for compaction to be able to
+	 * isolate free pages for migration targets.
 	 */
-	watermark = low_wmark_pages(zone) + (2UL << order);
+	watermark = low_wmark_pages(zone) + compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 				 alloc_flags, wmark_target))
 		return COMPACT_SKIPPED;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c84784765d3a..b676b4b51db0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2499,7 +2499,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	 * If we have not reclaimed enough pages for compaction and the
 	 * inactive lists are large enough, continue reclaiming
 	 */
-	pages_for_compaction = (2UL << sc->order);
+	pages_for_compaction = compact_gap(sc->order);
 	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
 	if (get_nr_swap_pages() > 0)
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
@@ -2631,7 +2631,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	 * there is a buffer of free pages available to give compaction
 	 * a reasonable chance of completing and allocating the page
 	 */
-	watermark = high_wmark_pages(zone) + (2UL << sc->order);
+	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
 	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
 
 	/*
@@ -3188,7 +3188,7 @@ static bool kswapd_shrink_node(pg_data_t *pgdat,
 	 * excessive reclaim. Assume that a process requested a high-order
 	 * can direct reclaim/compact.
 	 */
-	if (sc->order && sc->nr_reclaimed >= 2UL << sc->order)
+	if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
 		sc->order = 0;
 
 	return sc->nr_scanned >= sc->nr_to_reclaim;
-- 
2.9.2


* [PATCH v6 09/11] mm, compaction: use proper alloc_flags in __compaction_suitable()
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (7 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 08/11] mm, compaction: create compact_gap wrapper Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders Vlastimil Babka
  2016-08-10  9:12 ` [PATCH v6 11/11] mm, vmscan: make compaction_ready() more accurate and readable Vlastimil Babka
  10 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction. This check uses the direct compactor's alloc_flags, but that's
wrong, since these flags are not applicable to freepage isolation.

For example, alloc_flags may indicate access to memory reserves, making
compaction proceed and then fail the watermark check during the isolation.

A similar problem exists for ALLOC_CMA, which may be part of alloc_flags, but
not during freepage isolation. In this case however it makes sense to use
ALLOC_CMA both in __compaction_suitable() and __isolate_free_page(), since
there's actually nothing preventing the freepage scanner from isolating from
CMA pageblocks, with the assumption that a page that could be migrated once by
compaction can also be migrated later by a CMA allocation. Thus we should
count pages in CMA pageblocks when considering compaction suitability and when
isolating freepages.

To sum up, this patch should remove some false positives from
__compaction_suitable(), and allow compaction to proceed when the free pages
required for compaction reside in CMA pageblocks.
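
The resulting check in __compaction_suitable() is then (condensed from the
diff below):

	watermark = low_wmark_pages(zone) + compact_gap(order);
	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
						ALLOC_CMA, wmark_target))
		return COMPACT_SKIPPED;

with __isolate_free_page() likewise passing ALLOC_CMA to zone_watermark_ok().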

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 12 ++++++++++--
 mm/page_alloc.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ae6ecf8f8e70..80eaf9fff114 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1392,11 +1392,19 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
-	 * isolate free pages for migration targets.
+	 * isolate free pages for migration targets. This means that the
+	 * watermark and alloc_flags have to match, or be more pessimistic than
+	 * the check in __isolate_free_page(). We don't use the direct
+	 * compactor's alloc_flags, as they are not relevant for freepage
+	 * isolation. We however do use the direct compactor's classzone_idx to
+	 * skip over zones where lowmem reserves would prevent allocation even
+	 * if compaction succeeds.
+	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
+	 * suitable migration targets
 	 */
 	watermark = low_wmark_pages(zone) + compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
-				 alloc_flags, wmark_target))
+						ALLOC_CMA, wmark_target))
 		return COMPACT_SKIPPED;
 
 	/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b28517b918b0..621e4211ce16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2493,7 +2493,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 	if (!is_migrate_isolate(mt)) {
 		/* Obey watermarks as if the page was being allocated */
 		watermark = low_wmark_pages(zone) + (1 << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
 		__mod_zone_freepage_state(zone, -(1UL << order), mt);
-- 
2.9.2


* [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (8 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 09/11] mm, compaction: use proper alloc_flags in __compaction_suitable() Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  2016-08-16  6:16   ` Joonsoo Kim
  2016-08-10  9:12 ` [PATCH v6 11/11] mm, vmscan: make compaction_ready() more accurate and readable Vlastimil Babka
  10 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction. Then __isolate_free_page() uses a low watermark check to decide if
a particular free page can be isolated. In the latter case, using the low
watermark is needlessly pessimistic, as the free page isolations are only
temporary. For __compaction_suitable(), the higher watermark makes sense for
high-order allocations, where more free pages increase the chance of success
and we can typically fail with some order-0 fallback when the system is
struggling to reach that watermark. But for low-order allocations, forming the
page should not be that hard. So using the low watermark here might just
prevent compaction from even trying, and eventually lead to the OOM killer
even if we are above the min watermarks.

So after this patch, we use the min watermark for non-costly orders in
__compaction_suitable(), and for all orders in __isolate_free_page().
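
The watermark selection in __compaction_suitable() then becomes (condensed
from the diff below):

	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
				low_wmark_pages(zone) : min_wmark_pages(zone);
	watermark += compact_gap(order);

while __isolate_free_page() switches to min_wmark_pages(zone) + (1UL << order).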

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/compaction.c | 6 +++++-
 mm/page_alloc.c | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 80eaf9fff114..0bba270f97ad 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,10 +1399,14 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 * isolation. We however do use the direct compactor's classzone_idx to
 	 * skip over zones where lowmem reserves would prevent allocation even
 	 * if compaction succeeds.
+	 * For costly orders, we require low watermark instead of min for
+	 * compaction to proceed to increase its chances.
 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
 	 * suitable migration targets
 	 */
-	watermark = low_wmark_pages(zone) + compact_gap(order);
+	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
+				low_wmark_pages(zone) : min_wmark_pages(zone);
+	watermark += compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
 						ALLOC_CMA, wmark_target))
 		return COMPACT_SKIPPED;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 621e4211ce16..a5c0f914ec00 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2492,7 +2492,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 
 	if (!is_migrate_isolate(mt)) {
 		/* Obey watermarks as if the page was being allocated */
-		watermark = low_wmark_pages(zone) + (1 << order);
+		watermark = min_wmark_pages(zone) + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
 
-- 
2.9.2


* [PATCH v6 11/11] mm, vmscan: make compaction_ready() more accurate and readable
  2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
                   ` (9 preceding siblings ...)
  2016-08-10  9:12 ` [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders Vlastimil Babka
@ 2016-08-10  9:12 ` Vlastimil Babka
  10 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-10  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel, Vlastimil Babka

The compaction_ready() function is used during direct reclaim for costly
order allocations to skip reclaim for zones where compaction should be
attempted instead. It combines the standard compaction_suitable() check with
its own watermark check based on the high watermark with an extra gap, and
the result is confusing at best.

This patch attempts to better structure and document the checks involved.
First, compaction_suitable() can determine that the allocation should either
succeed already, or that compaction doesn't have enough free pages to proceed.
The third possibility is that compaction has enough free pages, but we still
decide to reclaim first - unless we are already above the high watermark with
the gap. This does not mean that the reclaim will actually reach this
watermark during a single attempt; it is rather an over-reclaim protection. So
document the code as such. The check for compaction_deferred() is removed
completely, as it in fact had no proper role here.

The result after this patch is mainly less confusing code. We also skip some
over-reclaim in cases where the allocation should already succeed.
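
After the patch, compaction_ready() reads roughly as follows (condensed from
the diff below):

	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
	if (suitable == COMPACT_SUCCESS)
		/* Allocation should succeed already. Don't reclaim. */
		return true;
	if (suitable == COMPACT_SKIPPED)
		/* Compaction cannot yet proceed. Do reclaim. */
		return false;

	/*
	 * Compaction is possible, so reclaim a buffer of free pages for it,
	 * unless we are already above the high watermark plus compact_gap().
	 */
	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
	return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);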

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmscan.c | 43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b676b4b51db0..f9b3112e963a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2617,38 +2617,35 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a high-order request, or
- * the high-order allocation would succeed without compaction.
+ * Returns true if compaction should go ahead for a costly-order request, or
+ * the allocation would already succeed without compaction. Return false if we
+ * should reclaim first.
  */
 static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
-	bool watermark_ok;
+	enum compact_result suitable;
 
-	/*
-	 * Compaction takes time to run and there are potentially other
-	 * callers using the pages just freed. Continue reclaiming until
-	 * there is a buffer of free pages available to give compaction
-	 * a reasonable chance of completing and allocating the page
-	 */
-	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
-	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
-
-	/*
-	 * If compaction is deferred, reclaim up to a point where
-	 * compaction will have a chance of success when re-enabled
-	 */
-	if (compaction_deferred(zone, sc->order))
-		return watermark_ok;
+	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
+	if (suitable == COMPACT_SUCCESS)
+		/* Allocation should succeed already. Don't reclaim. */
+		return true;
+	if (suitable == COMPACT_SKIPPED)
+		/* Compaction cannot yet proceed. Do reclaim. */
+		return false;
 
 	/*
-	 * If compaction is not ready to start and allocation is not likely
-	 * to succeed without it, then keep reclaiming.
+	 * Compaction is already possible, but it takes time to run and there
+	 * are potentially other callers using the pages just freed. So proceed
+	 * with reclaim to make a buffer of free pages available to give
+	 * compaction a reasonable chance of completing and allocating the page.
+	 * Note that we won't actually reclaim the whole buffer in one attempt
+	 * as the target watermark in should_continue_reclaim() is lower. But if
+	 * we are already above the high+gap watermark, don't reclaim at all.
 	 */
-	if (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx) == COMPACT_SKIPPED)
-		return false;
+	watermark = high_wmark_pages(zone) + compact_gap(sc->order);
 
-	return watermark_ok;
+	return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
 }
 
 /*
-- 
2.9.2


* Re: [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority
  2016-08-10  9:12 ` [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority Vlastimil Babka
@ 2016-08-16  5:58   ` Joonsoo Kim
  2016-08-18 12:23     ` Vlastimil Babka
  0 siblings, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  5:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed, Aug 10, 2016 at 11:12:20AM +0200, Vlastimil Babka wrote:
> During reclaim/compaction loop, it's desirable to get a final answer from
> unsuccessful compaction so we can either fail the allocation or invoke the OOM
> killer. However, heuristics such as deferred compaction or pageblock skip bits
> can cause compaction to skip parts or whole zones and lead to premature OOM's,
> failures or excessive reclaim/compaction retries.
> 
> To remedy this, we introduce a new direct compaction priority called
> COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:
> 
> - ignore deferred compaction status for a zone
> - ignore pageblock skip hints
> - ignore cached scanner positions and scan the whole zone
> 
> The new priority should get eventually picked up by should_compact_retry() and
> this should improve success rates for costly allocations using __GFP_REPEAT,
> such as hugetlbfs allocations, and reduce some corner-case OOM's for non-costly
> allocations.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/compaction.h | 3 ++-
>  mm/compaction.c            | 5 ++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index e88c037afe47..a1fba9994728 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -6,8 +6,9 @@
>   * Lower value means higher priority, analogically to reclaim priority.
>   */
>  enum compact_priority {
> +	COMPACT_PRIO_SYNC_FULL,
> +	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
>  	COMPACT_PRIO_SYNC_LIGHT,
> -	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
>  	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
>  	COMPACT_PRIO_ASYNC,
>  	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a144f58f7193..ae4f40afcca1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1644,6 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.alloc_flags = alloc_flags,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
> +		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> @@ -1689,7 +1691,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  								ac->nodemask) {
>  		enum compact_result status;
>  
> -		if (compaction_deferred(zone, order)) {
> +		if (prio > COMPACT_PRIO_SYNC_FULL
> +					&& compaction_deferred(zone, order)) {
>  			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>  			continue;

Could we provide prio to compaction_deferred() and make the decision in
that function?
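
i.e. something like this (just a sketch of the idea; the prio parameter for
compaction_deferred() is hypothetical at this point):

	/* in try_to_compact_pages() */
	if (compaction_deferred(zone, order, prio)) {	/* hypothetical 3rd arg */
		rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
		continue;
	}

with compaction_deferred() itself returning false right away for
COMPACT_PRIO_SYNC_FULL.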

BTW, in kcompactd, compaction_deferred() is checked but
.ignore_skip_hint=true. Is there any reason? If we can remove
compaction_deferred() for kcompactd, we can check .ignore_skip_hint
to determine if defer is needed or not.

Thanks.


* Re: [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-10  9:12 ` [PATCH v6 06/11] mm, compaction: more reliably increase " Vlastimil Babka
@ 2016-08-16  6:07   ` Joonsoo Kim
  2016-08-16  6:31     ` Vlastimil Babka
  2016-08-18  9:10   ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed, Aug 10, 2016 at 11:12:21AM +0200, Vlastimil Babka wrote:
> During reclaim/compaction loop, compaction priority can be increased by the
> should_compact_retry() function, but the current code is not optimal. Priority
> is only increased when compaction_failed() is true, which means that compaction
> has scanned the whole zone. This may not happen even after multiple attempts
> with a lower priority due to parallel activity, so we might needlessly
> struggle on the lower priorities and possibly run out of compaction retry
> attempts in the process.
> 
> After this patch we are guaranteed at least one attempt at the highest
> compaction priority even if we exhaust all retries at the lower priorities.

> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb975cec3518..b28517b918b0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3155,13 +3155,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	 * so it doesn't really make much sense to retry except when the
>  	 * failure could be caused by insufficient priority
>  	 */
> -	if (compaction_failed(compact_result)) {
> -		if (*compact_priority > MIN_COMPACT_PRIORITY) {
> -			(*compact_priority)--;
> -			return true;
> -		}
> -		return false;
> -	}
> +	if (compaction_failed(compact_result))
> +		goto check_priority;
>  
>  	/*
>  	 * make sure the compaction wasn't deferred or didn't bail out early
> @@ -3185,6 +3180,15 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	if (compaction_retries <= max_retries)
>  		return true;
>  
> +	/*
> +	 * Make sure there is at least one attempt at the highest priority
> +	 * if we exhausted all retries at the lower priorities
> +	 */
> +check_priority:
> +	if (*compact_priority > MIN_COMPACT_PRIORITY) {
> +		(*compact_priority)--;
> +		return true;
> +	}
>  	return false;

The only difference that this patch makes is increasing priority when
COMPACT_PARTIAL (i.e. COMPACT_SUCCESS) is returned. In that case, we can
usually allocate high-order freepage so we would not enter here. Am I
missing something? Is this behaviour change really needed?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  2016-08-16  6:12   ` Joonsoo Kim
@ 2016-08-16  6:11     ` Vlastimil Babka
  2016-08-18 11:59     ` Vlastimil Babka
  1 sibling, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-16  6:11 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:12 AM, Joonsoo Kim wrote:
> On Wed, Aug 10, 2016 at 11:12:19AM +0200, Vlastimil Babka wrote:
>> Joonsoo has reminded me that in a later patch changing watermark checks
>> throughout compaction I forgot to update checks in try_to_compact_pages() and
>> compactd_do_work(). Closer inspection however shows that they are redundant now
>> that compact_zone() reliably reports success with COMPACT_SUCCESS, as they just
>> repeat (a subset) of checks that have just passed. So instead of checking
>> watermarks again, just test the return value.
>
> In fact, it's not redundant. Even if try_to_compact_pages() returns
> !COMPACT_SUCCESS, the watermark check could still return true.

Right, I meant they are redundant in the SUCCESS case.

> __compact_finished() calls find_suitable_fallback(), which is slightly
> different from the watermark check. Anyway, I don't think it is a big
> problem.

I agree. It might be even better for long-term fragmentation that we
e.g. try another zone instead of taking a page from the "unsuitable
fallback". If that's not successful, and the allocation is important
enough, there will eventually be another watermark check permitting
the unsuitable fallback.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  2016-08-10  9:12 ` [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS Vlastimil Babka
@ 2016-08-16  6:12   ` Joonsoo Kim
  2016-08-16  6:11     ` Vlastimil Babka
  2016-08-18 11:59     ` Vlastimil Babka
  2016-08-18  9:03   ` Michal Hocko
  1 sibling, 2 replies; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed, Aug 10, 2016 at 11:12:19AM +0200, Vlastimil Babka wrote:
> Joonsoo has reminded me that in a later patch changing watermark checks
> throughout compaction I forgot to update checks in try_to_compact_pages() and
> compactd_do_work(). Closer inspection however shows that they are redundant now
> that compact_zone() reliably reports success with COMPACT_SUCCESS, as they just
> repeat (a subset) of checks that have just passed. So instead of checking
> watermarks again, just test the return value.

In fact, it's not redundant. Even if try_to_compact_pages() returns
!COMPACT_SUCCESS, the watermark check could still return true.
__compact_finished() calls find_suitable_fallback(), which is slightly
different from the watermark check. Anyway, I don't think it is a big
problem.

Thanks.


> 
> Also remove the stray "bool success" variable from kcompactd_do_work().
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/compaction.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c355bf0d8599..a144f58f7193 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1698,9 +1698,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  					alloc_flags, ac_classzone_idx(ac));
>  		rc = max(status, rc);
>  
> -		/* If a normal allocation would succeed, stop compacting */
> -		if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
> -					ac_classzone_idx(ac), alloc_flags)) {
> +		/* The allocation should succeed, stop compacting */
> +		if (status == COMPACT_SUCCESS) {
>  			/*
>  			 * We think the allocation will succeed in this zone,
>  			 * but it is not certain, hence the false. The caller
> @@ -1873,8 +1872,6 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  		.ignore_skip_hint = true,
>  
>  	};
> -	bool success = false;
> -
>  	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
>  							cc.classzone_idx);
>  	count_vm_event(KCOMPACTD_WAKE);
> @@ -1903,9 +1900,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  			return;
>  		status = compact_zone(zone, &cc);
>  
> -		if (zone_watermark_ok(zone, cc.order, low_wmark_pages(zone),
> -						cc.classzone_idx, 0)) {
> -			success = true;
> +		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
>  		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
>  			/*
> -- 
> 2.9.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 08/11] mm, compaction: create compact_gap wrapper
  2016-08-10  9:12 ` [PATCH v6 08/11] mm, compaction: create compact_gap wrapper Vlastimil Babka
@ 2016-08-16  6:15   ` Joonsoo Kim
  2016-08-16  6:15     ` Vlastimil Babka
  0 siblings, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed, Aug 10, 2016 at 11:12:23AM +0200, Vlastimil Babka wrote:
> Compaction uses a watermark gap of (2UL << order) pages at various places and
> it's not immediately obvious why. Abstract it through a compact_gap() wrapper
> to create a single place with a thorough explanation.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/compaction.h | 16 ++++++++++++++++
>  mm/compaction.c            |  7 +++----
>  mm/vmscan.c                |  6 +++---
>  3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index a1fba9994728..e7f0d34a90fe 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -58,6 +58,22 @@ enum compact_result {
>  
>  struct alloc_context; /* in mm/internal.h */
>  
> +/*
> + * Number of free order-0 pages that should be available above given watermark
> + * to make sure compaction has reasonable chance of not running out of free
> + * pages that it needs to isolate as migration target during its work.
> + */
> +static inline unsigned long compact_gap(unsigned int order)
> +{
> +	/*
> +	 * Although all the isolations for migration are temporary, compaction
> +	 * may have up to 1 << order pages on its list and then try to split
> +	 * an (order - 1) free page. At that point, a gap of 1 << order might
> +	 * not be enough, so it's safer to require twice that amount.
> +	 */
> +	return 2UL << order;
> +}

I agree with this wrapper function but I have a question.

Could you elaborate more on this code comment? The free scanner could keep
COMPACT_CLUSTER_MAX freepages on the list. It's not associated with the
requested order, at least for now. Why is compact_gap() 2UL << order in
this case?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 08/11] mm, compaction: create compact_gap wrapper
  2016-08-16  6:15   ` Joonsoo Kim
@ 2016-08-16  6:15     ` Vlastimil Babka
  2016-08-16  6:41       ` Joonsoo Kim
  0 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-16  6:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:15 AM, Joonsoo Kim wrote:
> On Wed, Aug 10, 2016 at 11:12:23AM +0200, Vlastimil Babka wrote:
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -58,6 +58,22 @@ enum compact_result {
>>
>>  struct alloc_context; /* in mm/internal.h */
>>
>> +/*
>> + * Number of free order-0 pages that should be available above given watermark
>> + * to make sure compaction has reasonable chance of not running out of free
>> + * pages that it needs to isolate as migration target during its work.
>> + */
>> +static inline unsigned long compact_gap(unsigned int order)
>> +{
>> +	/*
>> +	 * Although all the isolations for migration are temporary, compaction
>> +	 * may have up to 1 << order pages on its list and then try to split
>> +	 * an (order - 1) free page. At that point, a gap of 1 << order might
>> +	 * not be enough, so it's safer to require twice that amount.
>> +	 */
>> +	return 2UL << order;
>> +}
>
> I agree with this wrapper function but I have a question.
>
> Could you elaborate more on this code comment? The free scanner could keep
> COMPACT_CLUSTER_MAX freepages on the list. It's not associated with the
> requested order, at least for now. Why is compact_gap() 2UL << order in
> this case?

It's true that for high enough order, COMPACT_CLUSTER_MAX might be more 
limiting than 1 << order. But then it also helps to have more free pages 
for probability of compaction success, so I don't think it's worth 
complicating the compact_gap() formula.
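
As a rough illustration (assuming COMPACT_CLUSTER_MAX equals
SWAP_CLUSTER_MAX, i.e. 32):

	compact_gap(3) == 2UL << 3 ==   16	/* below the 32-page isolation batch */
	compact_gap(9) == 2UL << 9 == 1024	/* here the batch limit is the smaller bound */

So the formula is only loose for the higher orders, where the extra
headroom doesn't hurt.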

> Thanks.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders
  2016-08-10  9:12 ` [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders Vlastimil Babka
@ 2016-08-16  6:16   ` Joonsoo Kim
  2016-08-16  6:36     ` Vlastimil Babka
  0 siblings, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed, Aug 10, 2016 at 11:12:25AM +0200, Vlastimil Babka wrote:
> The __compaction_suitable() function checks the low watermark plus a
> compact_gap() gap to decide if there's enough free memory to perform
> compaction. Then __isolate_free_page uses low watermark check to decide if
> particular free page can be isolated. In the latter case, using low watermark
> is needlessly pessimistic, as the free page isolations are only temporary. For
> __compaction_suitable() the higher watermark makes sense for high-order
> allocations where more freepages increase the chance of success, and we can
> typically fail with some order-0 fallback when the system is struggling to
> reach that watermark. But for low-order allocation, forming the page should not
> be that hard. So using low watermark here might just prevent compaction from
> even trying, and eventually lead to OOM killer even if we are above min
> watermarks.
> 
> So after this patch, we use min watermark for non-costly orders in
> __compaction_suitable(), and for all orders in __isolate_free_page().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/compaction.c | 6 +++++-
>  mm/page_alloc.c | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 80eaf9fff114..0bba270f97ad 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1399,10 +1399,14 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  	 * isolation. We however do use the direct compactor's classzone_idx to
>  	 * skip over zones where lowmem reserves would prevent allocation even
>  	 * if compaction succeeds.
> +	 * For costly orders, we require low watermark instead of min for
> +	 * compaction to proceed to increase its chances.
>  	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>  	 * suitable migration targets
>  	 */
> -	watermark = low_wmark_pages(zone) + compact_gap(order);
> +	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
> +				low_wmark_pages(zone) : min_wmark_pages(zone);
> +	watermark += compact_gap(order);
>  	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
>  						ALLOC_CMA, wmark_target))
>  		return COMPACT_SKIPPED;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 621e4211ce16..a5c0f914ec00 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2492,7 +2492,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  
>  	if (!is_migrate_isolate(mt)) {
>  		/* Obey watermarks as if the page was being allocated */
> -		watermark = low_wmark_pages(zone) + (1 << order);
> +		watermark = min_wmark_pages(zone) + (1UL << order);

This '1 << order' also needs some comment. Why can't we use
compact_gap() in this case?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-16  6:07   ` Joonsoo Kim
@ 2016-08-16  6:31     ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-16  6:31 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:07 AM, Joonsoo Kim wrote:
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  mm/page_alloc.c | 18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fb975cec3518..b28517b918b0 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3155,13 +3155,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>>  	 * so it doesn't really make much sense to retry except when the
>>  	 * failure could be caused by insufficient priority
>>  	 */
>> -	if (compaction_failed(compact_result)) {
>> -		if (*compact_priority > MIN_COMPACT_PRIORITY) {
>> -			(*compact_priority)--;
>> -			return true;
>> -		}
>> -		return false;
>> -	}
>> +	if (compaction_failed(compact_result))
>> +		goto check_priority;
>>
>>  	/*
>>  	 * make sure the compaction wasn't deferred or didn't bail out early
>> @@ -3185,6 +3180,15 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>>  	if (compaction_retries <= max_retries)
>>  		return true;
>>
>> +	/*
>> +	 * Make sure there is at least one attempt at the highest priority
>> +	 * if we exhausted all retries at the lower priorities
>> +	 */
>> +check_priority:
>> +	if (*compact_priority > MIN_COMPACT_PRIORITY) {
>> +		(*compact_priority)--;
>> +		return true;
>> +	}
>>  	return false;
>
> The only difference that this patch makes is increasing priority when
> COMPACT_PARTIAL (i.e. COMPACT_SUCCESS) is returned. In that case, we can

Hm it's true that I adjusted this patch from the previous version, 
before realizing that PARTIAL is now SUCCESS.

> usually allocate high-order freepage so we would not enter here. Am I
> missing something? Is this behaviour change really needed?

It will likely be rare for this to trigger, i.e. when compaction success
doesn't lead to allocation success due to parallel allocation activity.

> Thanks.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders
  2016-08-16  6:16   ` Joonsoo Kim
@ 2016-08-16  6:36     ` Vlastimil Babka
  2016-08-16  6:46       ` Joonsoo Kim
  0 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-16  6:36 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:16 AM, Joonsoo Kim wrote:
> On Wed, Aug 10, 2016 at 11:12:25AM +0200, Vlastimil Babka wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 621e4211ce16..a5c0f914ec00 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2492,7 +2492,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>
>>  	if (!is_migrate_isolate(mt)) {
>>  		/* Obey watermarks as if the page was being allocated */
>> -		watermark = low_wmark_pages(zone) + (1 << order);
>> +		watermark = min_wmark_pages(zone) + (1UL << order);
>
> This '1 << order' also needs some comment. Why can't we use
> compact_gap() in this case?

This is just short-cutting the high-order watermark check to check only 
order-0, because we already know the high-order page exists.
We can't use compact_gap() as that's too high to use for a single 
allocation watermark, since we may already be holding some free pages on
the list. So it would defeat the purpose of the gap.
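
For context, the whole check then reads roughly like this (a sketch of
the surrounding __isolate_free_page() code, not part of the quoted
hunk):

	if (!is_migrate_isolate(mt)) {
		/*
		 * Check only the order-0 watermark, plus the size of the page
		 * being isolated; the high-order page itself obviously exists,
		 * so a full high-order watermark check isn't needed.
		 */
		watermark = min_wmark_pages(zone) + (1UL << order);
		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
			return 0;
	}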

> Thanks.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 08/11] mm, compaction: create compact_gap wrapper
  2016-08-16  6:15     ` Vlastimil Babka
@ 2016-08-16  6:41       ` Joonsoo Kim
  2016-08-18 12:13         ` Vlastimil Babka
  0 siblings, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Tue, Aug 16, 2016 at 08:15:36AM +0200, Vlastimil Babka wrote:
> On 08/16/2016 08:15 AM, Joonsoo Kim wrote:
> >On Wed, Aug 10, 2016 at 11:12:23AM +0200, Vlastimil Babka wrote:
> >>--- a/include/linux/compaction.h
> >>+++ b/include/linux/compaction.h
> >>@@ -58,6 +58,22 @@ enum compact_result {
> >>
> >> struct alloc_context; /* in mm/internal.h */
> >>
> >>+/*
> >>+ * Number of free order-0 pages that should be available above given watermark
> >>+ * to make sure compaction has reasonable chance of not running out of free
> >>+ * pages that it needs to isolate as migration target during its work.
> >>+ */
> >>+static inline unsigned long compact_gap(unsigned int order)
> >>+{
> >>+	/*
> >>+	 * Although all the isolations for migration are temporary, compaction
> >>+	 * may have up to 1 << order pages on its list and then try to split
> >>+	 * an (order - 1) free page. At that point, a gap of 1 << order might
> >>+	 * not be enough, so it's safer to require twice that amount.
> >>+	 */
> >>+	return 2UL << order;
> >>+}
> >
> >I agree with this wrapper function but I have a question.
> >
> >Could you elaborate more on this code comment? The free scanner could keep
> >COMPACT_CLUSTER_MAX freepages on the list. It's not associated with the
> >requested order, at least for now. Why is compact_gap() 2UL << order in
> >this case?
> 
> It's true that for high enough order, COMPACT_CLUSTER_MAX might be
> more limiting than 1 << order. But then it also helps to have more

AFAIK, regardless of order, the migration scanner isolates
COMPACT_CLUSTER_MAX pages, and the freepage scanner isolates
more than nr_migratepages freepages.

> free pages for probability of compaction success, so I don't think
> it's worth complicating the compact_gap() formula.

I agree that it's not worth complicating the compact_gap() formula, but
wouldn't it be better to fix the comment?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders
  2016-08-16  6:36     ` Vlastimil Babka
@ 2016-08-16  6:46       ` Joonsoo Kim
  2016-08-18 12:20         ` Vlastimil Babka
  0 siblings, 1 reply; 32+ messages in thread
From: Joonsoo Kim @ 2016-08-16  6:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Tue, Aug 16, 2016 at 08:36:12AM +0200, Vlastimil Babka wrote:
> On 08/16/2016 08:16 AM, Joonsoo Kim wrote:
> >On Wed, Aug 10, 2016 at 11:12:25AM +0200, Vlastimil Babka wrote:
> >>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>index 621e4211ce16..a5c0f914ec00 100644
> >>--- a/mm/page_alloc.c
> >>+++ b/mm/page_alloc.c
> >>@@ -2492,7 +2492,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
> >>
> >> 	if (!is_migrate_isolate(mt)) {
> >> 		/* Obey watermarks as if the page was being allocated */
> >>-		watermark = low_wmark_pages(zone) + (1 << order);
> >>+		watermark = min_wmark_pages(zone) + (1UL << order);
> >
> >This '1 << order' also needs some comment. Why can't we use
> >compact_gap() in this case?
> 
> This is just short-cutting the high-order watermark check to check
> only order-0, because we already know the high-order page exists.
> We can't use compact_gap() as that's too high to use for a single
> allocation watermark, since we may already be holding some free
> pages on the list. So it would defeat the purpose of the gap.

Oops. I missed that. Thanks for clarifying it.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS
  2016-08-10  9:12 ` [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS Vlastimil Babka
@ 2016-08-18  9:01   ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2016-08-18  9:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed 10-08-16 11:12:18, Vlastimil Babka wrote:
> COMPACT_PARTIAL has historically meant that compaction returned after doing
> some work without fully compacting a zone. It however didn't distinguish if
> compaction terminated because it succeeded in creating the requested high-order
> page. This has changed recently and now we only return COMPACT_PARTIAL when
> compaction thinks it succeeded, or the high-order watermark check in
> compaction_suitable() passes and no compaction needs to be done.
> 
> So at this point we can make the return value clearer by renaming it to
> COMPACT_SUCCESS. The next patch will remove some redundant tests for success
> where compaction just returned COMPACT_SUCCESS.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/compaction.h        |  8 ++++----
>  include/trace/events/compaction.h |  2 +-
>  mm/compaction.c                   | 12 ++++++------
>  mm/vmscan.c                       |  2 +-
>  4 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 1bb58581301c..e88c037afe47 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -49,10 +49,10 @@ enum compact_result {
>  	COMPACT_CONTENDED,
>  
>  	/*
> -	 * direct compaction partially compacted a zone and there might be
> -	 * suitable pages
> +	 * direct compaction terminated after concluding that the allocation
> +	 * should now succeed
>  	 */
> -	COMPACT_PARTIAL,
> +	COMPACT_SUCCESS,
>  };
>  
>  struct alloc_context; /* in mm/internal.h */
> @@ -88,7 +88,7 @@ static inline bool compaction_made_progress(enum compact_result result)
>  	 * that the compaction successfully isolated and migrated some
>  	 * pageblocks.
>  	 */
> -	if (result == COMPACT_PARTIAL)
> +	if (result == COMPACT_SUCCESS)
>  		return true;
>  
>  	return false;
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index c2ba402ab256..cbdb90b6b308 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -13,7 +13,7 @@
>  	EM( COMPACT_SKIPPED,		"skipped")		\
>  	EM( COMPACT_DEFERRED,		"deferred")		\
>  	EM( COMPACT_CONTINUE,		"continue")		\
> -	EM( COMPACT_PARTIAL,		"partial")		\
> +	EM( COMPACT_SUCCESS,		"success")		\
>  	EM( COMPACT_PARTIAL_SKIPPED,	"partial_skipped")	\
>  	EM( COMPACT_COMPLETE,		"complete")		\
>  	EM( COMPACT_NO_SUITABLE_PAGE,	"no_suitable_page")	\
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 328bdfeece2d..c355bf0d8599 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1329,13 +1329,13 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
>  
>  		/* Job done if page is free of the right migratetype */
>  		if (!list_empty(&area->free_list[migratetype]))
> -			return COMPACT_PARTIAL;
> +			return COMPACT_SUCCESS;
>  
>  #ifdef CONFIG_CMA
>  		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
>  		if (migratetype == MIGRATE_MOVABLE &&
>  			!list_empty(&area->free_list[MIGRATE_CMA]))
> -			return COMPACT_PARTIAL;
> +			return COMPACT_SUCCESS;
>  #endif
>  		/*
>  		 * Job done if allocation would steal freepages from
> @@ -1343,7 +1343,7 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
>  		 */
>  		if (find_suitable_fallback(area, order, migratetype,
>  						true, &can_steal) != -1)
> -			return COMPACT_PARTIAL;
> +			return COMPACT_SUCCESS;
>  	}
>  
>  	return COMPACT_NO_SUITABLE_PAGE;
> @@ -1367,7 +1367,7 @@ static enum compact_result compact_finished(struct zone *zone,
>   * compaction_suitable: Is this suitable to run compaction on this zone now?
>   * Returns
>   *   COMPACT_SKIPPED  - If there are too few free pages for compaction
> - *   COMPACT_PARTIAL  - If the allocation would succeed without compaction
> + *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
>   *   COMPACT_CONTINUE - If compaction should run now
>   */
>  static enum compact_result __compaction_suitable(struct zone *zone, int order,
> @@ -1388,7 +1388,7 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  	 */
>  	if (zone_watermark_ok(zone, order, watermark, classzone_idx,
>  								alloc_flags))
> -		return COMPACT_PARTIAL;
> +		return COMPACT_SUCCESS;
>  
>  	/*
>  	 * Watermarks for order-0 must be met for compaction. Note the 2UL.
> @@ -1477,7 +1477,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
>  							cc->classzone_idx);
>  	/* Compaction is likely to fail */
> -	if (ret == COMPACT_PARTIAL || ret == COMPACT_SKIPPED)
> +	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
>  		return ret;
>  
>  	/* huh, compaction_suitable is returning something unexpected */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 374d95d04178..c84784765d3a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2514,7 +2514,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			continue;
>  
>  		switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
> -		case COMPACT_PARTIAL:
> +		case COMPACT_SUCCESS:
>  		case COMPACT_CONTINUE:
>  			return false;
>  		default:
> -- 
> 2.9.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  2016-08-10  9:12 ` [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS Vlastimil Babka
  2016-08-16  6:12   ` Joonsoo Kim
@ 2016-08-18  9:03   ` Michal Hocko
  1 sibling, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2016-08-18  9:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed 10-08-16 11:12:19, Vlastimil Babka wrote:
> Joonsoo has reminded me that in a later patch changing watermark checks
> throughout compaction I forgot to update checks in try_to_compact_pages() and
> compactd_do_work(). Closer inspection however shows that they are redundant now
> that compact_zone() reliably reports success with COMPACT_SUCCESS, as they just
> repeat (a subset) of checks that have just passed. So instead of checking
> watermarks again, just test the return value.

The fewer watermark checks we do the better, because they just increase
the probability of subtle and hard-to-explain corner cases.

> Also remove the stray "bool success" variable from kcompactd_do_work().
> 
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/compaction.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c355bf0d8599..a144f58f7193 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1698,9 +1698,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  					alloc_flags, ac_classzone_idx(ac));
>  		rc = max(status, rc);
>  
> -		/* If a normal allocation would succeed, stop compacting */
> -		if (zone_watermark_ok(zone, order, low_wmark_pages(zone),
> -					ac_classzone_idx(ac), alloc_flags)) {
> +		/* The allocation should succeed, stop compacting */
> +		if (status == COMPACT_SUCCESS) {
>  			/*
>  			 * We think the allocation will succeed in this zone,
>  			 * but it is not certain, hence the false. The caller
> @@ -1873,8 +1872,6 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  		.ignore_skip_hint = true,
>  
>  	};
> -	bool success = false;
> -
>  	trace_mm_compaction_kcompactd_wake(pgdat->node_id, cc.order,
>  							cc.classzone_idx);
>  	count_vm_event(KCOMPACTD_WAKE);
> @@ -1903,9 +1900,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  			return;
>  		status = compact_zone(zone, &cc);
>  
> -		if (zone_watermark_ok(zone, cc.order, low_wmark_pages(zone),
> -						cc.classzone_idx, 0)) {
> -			success = true;
> +		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
>  		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
>  			/*
> -- 
> 2.9.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-10  9:12 ` [PATCH v6 06/11] mm, compaction: more reliably increase " Vlastimil Babka
  2016-08-16  6:07   ` Joonsoo Kim
@ 2016-08-18  9:10   ` Michal Hocko
  2016-08-18  9:44     ` Vlastimil Babka
  1 sibling, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2016-08-18  9:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Wed 10-08-16 11:12:21, Vlastimil Babka wrote:
> During reclaim/compaction loop, compaction priority can be increased by the
> should_compact_retry() function, but the current code is not optimal. Priority
> is only increased when compaction_failed() is true, which means that compaction
> has scanned the whole zone. This may not happen even after multiple attempts
> with a lower priority due to parallel activity, so we might needlessly
> struggle on the lower priorities and possibly run out of compaction retry
> attempts in the process.
> 
> After this patch we are guaranteed at least one attempt at the highest
> compaction priority even if we exhaust all retries at the lower priorities.

I expect we will tend to do some special handling at the highest
priority so guaranteeing at least one run with that prio seems sensible to me. The only
question is whether we really want to enforce the highest priority for
costly orders as well. I think we want to reserve the highest (maybe add
one more) prio for !costly orders as those invoke the OOM killer and the
failures are quite disruptive.
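
Something along these lines in should_compact_retry(), as a purely
hypothetical sketch of what I mean (min_priority is a made-up local):

	/* don't let costly orders escalate all the way to the full-zone scan */
	min_priority = (order > PAGE_ALLOC_COSTLY_ORDER) ?
				COMPACT_PRIO_SYNC_LIGHT : MIN_COMPACT_PRIORITY;
	if (*compact_priority > min_priority) {
		(*compact_priority)--;
		return true;
	}
	return false;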

> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/page_alloc.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb975cec3518..b28517b918b0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3155,13 +3155,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	 * so it doesn't really make much sense to retry except when the
>  	 * failure could be caused by insufficient priority
>  	 */
> -	if (compaction_failed(compact_result)) {
> -		if (*compact_priority > MIN_COMPACT_PRIORITY) {
> -			(*compact_priority)--;
> -			return true;
> -		}
> -		return false;
> -	}
> +	if (compaction_failed(compact_result))
> +		goto check_priority;
>  
>  	/*
>  	 * make sure the compaction wasn't deferred or didn't bail out early
> @@ -3185,6 +3180,15 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>  	if (compaction_retries <= max_retries)
>  		return true;
>  
> +	/*
> +	 * Make sure there is at least one attempt at the highest priority
> +	 * if we exhausted all retries at the lower priorities
> +	 */
> +check_priority:
> +	if (*compact_priority > MIN_COMPACT_PRIORITY) {
> +		(*compact_priority)--;
> +		return true;
> +	}
>  	return false;
>  }
>  #else
> -- 
> 2.9.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-18  9:10   ` Michal Hocko
@ 2016-08-18  9:44     ` Vlastimil Babka
  2016-08-18  9:48       ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-18  9:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/18/2016 11:10 AM, Michal Hocko wrote:
> On Wed 10-08-16 11:12:21, Vlastimil Babka wrote:
>> During reclaim/compaction loop, compaction priority can be increased by the
>> should_compact_retry() function, but the current code is not optimal. Priority
>> is only increased when compaction_failed() is true, which means that compaction
>> has scanned the whole zone. This may not happen even after multiple attempts
>> with a lower priority due to parallel activity, so we might needlessly
>> struggle on the lower priorities and possibly run out of compaction retry
>> attempts in the process.
>>
>> After this patch we are guaranteed at least one attempt at the highest
>> compaction priority even if we exhaust all retries at the lower priorities.
>
> I expect we will tend to do some special handling at the highest
> priority so guaranteeing at least one run with that prio seems sensible to me. The only
> question is whether we really want to enforce the highest priority for
> costly orders as well. I think we want to reserve the highest (maybe add
> one more) prio for !costly orders as those invoke the OOM killer and the
> failures are quite disruptive.

Costly orders are already ruled out of reaching the highest priority 
unless they are __GFP_REPEAT, so I assumed that if they are allocations 
with __GFP_REPEAT, they really would like to succeed, so let them use 
the highest priority.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 06/11] mm, compaction: more reliably increase direct compaction priority
  2016-08-18  9:44     ` Vlastimil Babka
@ 2016-08-18  9:48       ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2016-08-18  9:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, Joonsoo Kim, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On Thu 18-08-16 11:44:00, Vlastimil Babka wrote:
> On 08/18/2016 11:10 AM, Michal Hocko wrote:
> > On Wed 10-08-16 11:12:21, Vlastimil Babka wrote:
> > > During reclaim/compaction loop, compaction priority can be increased by the
> > > should_compact_retry() function, but the current code is not optimal. Priority
> > > is only increased when compaction_failed() is true, which means that compaction
> > > has scanned the whole zone. This may not happen even after multiple attempts
> > > with a lower priority due to parallel activity, so we might needlessly
> > > struggle on the lower priorities and possibly run out of compaction retry
> > > attempts in the process.
> > > 
> > > After this patch we are guaranteed at least one attempt at the highest
> > > compaction priority even if we exhaust all retries at the lower priorities.
> > 
> > I expect we will tend to do some special handling at the highest
> > priority so guaranteeing at least one run with that prio seems sensible to me. The only
> > question is whether we really want to enforce the highest priority for
> > costly orders as well. I think we want to reserve the highest (maybe add
> > one more) prio for !costly orders as those invoke the OOM killer and the
> > failures are quite disruptive.
> 
> Costly orders are already ruled out of reaching the highest priority unless
> they are __GFP_REPEAT, so I assumed that if they are allocations with
> __GFP_REPEAT, they really would like to succeed, so let them use the highest
> priority.

But even when __GFP_REPEAT is set, we do not want to be too
aggressive. E.g. hugetlb pages are better to fail than to cause
excessive reclaim or some long term fragmentation issues which
might be a result of the skipped heuristics. Costly orders are IMHO
simply second class citizens even when they ask to try harder with
__GFP_REPEAT.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
  2016-08-16  6:12   ` Joonsoo Kim
  2016-08-16  6:11     ` Vlastimil Babka
@ 2016-08-18 11:59     ` Vlastimil Babka
  1 sibling, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-18 11:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:12 AM, Joonsoo Kim wrote:
> On Wed, Aug 10, 2016 at 11:12:19AM +0200, Vlastimil Babka wrote:
>> Joonsoo has reminded me that in a later patch changing watermark checks
>> throughout compaction I forgot to update checks in try_to_compact_pages() and
>> compactd_do_work(). Closer inspection however shows that they are redundant now
>> that compact_zone() reliably reports success with COMPACT_SUCCESS, as they just
>> repeat (a subset) of checks that have just passed. So instead of checking
>> watermarks again, just test the return value.
>
> In fact, it's not redundant. Even if try_to_compact_pages() returns
> !COMPACT_SUCCESS, the watermark check could still return true.
> __compact_finished() calls find_suitable_fallback(), which is slightly
> different from the watermark check. Anyway, I don't think it is a big
> problem.

Andrew, can you please replace the changelog to clarify this?

===
Joonsoo has reminded me that in a later patch changing watermark checks
throughout compaction I forgot to update checks in
try_to_compact_pages() and kcompactd_do_work(). Closer inspection however
shows that they are redundant now in the success case, because
compact_zone() now reliably reports this with COMPACT_SUCCESS. So
effectively the checks just repeat (a subset of) the checks that have just
passed. So instead of checking watermarks again, just test the return value.

Note it's also possible that compaction would declare failure e.g.
because its find_suitable_fallback() is more strict than a simple
watermark check, and then the watermark check we are removing would
still succeed. After this patch this is not possible and it's arguably
better, because for long-term fragmentation avoidance we should rather
try a different zone than allocate with the unsuitable fallback. If
compaction of all zones fails and the allocation is important enough, it
will retry and succeed anyway.

Also remove the stray "bool success" variable from kcompactd_do_work().
===


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 08/11] mm, compaction: create compact_gap wrapper
  2016-08-16  6:41       ` Joonsoo Kim
@ 2016-08-18 12:13         ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-18 12:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:41 AM, Joonsoo Kim wrote:
>> free pages for probability of compaction success, so I don't think
>> it's worth complicating the compact_gap() formula.
> 
> I agree that it's not worth complicating the compact_gap() formula, but
> wouldn't it be better to fix the comment?

OK, Andrew can you add this -fix?
Thanks.

----8<----

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders
  2016-08-16  6:46       ` Joonsoo Kim
@ 2016-08-18 12:20         ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-18 12:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 08:46 AM, Joonsoo Kim wrote:
> On Tue, Aug 16, 2016 at 08:36:12AM +0200, Vlastimil Babka wrote:
>> On 08/16/2016 08:16 AM, Joonsoo Kim wrote:
>>> On Wed, Aug 10, 2016 at 11:12:25AM +0200, Vlastimil Babka wrote:
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 621e4211ce16..a5c0f914ec00 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -2492,7 +2492,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>>>>
>>>> 	if (!is_migrate_isolate(mt)) {
>>>> 		/* Obey watermarks as if the page was being allocated */
>>>> -		watermark = low_wmark_pages(zone) + (1 << order);
>>>> +		watermark = min_wmark_pages(zone) + (1UL << order);
>>>
>>> This '1 << order' also needs some comment. Why can't we use
>>> compact_gap() in this case?
>>
>> This is just short-cutting the high-order watermark check to check
>> only order-0, because we already know the high-order page exists.
>> We can't use compact_gap() as that's too high to use for a single
>> allocation watermark, since we may already be holding some free
>> pages on the list. So it would defeat the purpose of the gap.
> 
> Oops. I missed that. Thanks for clarifying it.

So let's expand the comment?

----8<----

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority
  2016-08-16  5:58   ` Joonsoo Kim
@ 2016-08-18 12:23     ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-08-18 12:23 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Michal Hocko, Mel Gorman, David Rientjes,
	Rik van Riel, linux-mm, linux-kernel

On 08/16/2016 07:58 AM, Joonsoo Kim wrote:
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1644,6 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>>  		.alloc_flags = alloc_flags,
>>  		.classzone_idx = classzone_idx,
>>  		.direct_compaction = true,
>> +		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
>> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
>>  	};
>>  	INIT_LIST_HEAD(&cc.freepages);
>>  	INIT_LIST_HEAD(&cc.migratepages);
>> @@ -1689,7 +1691,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>>  								ac->nodemask) {
>>  		enum compact_result status;
>>
>> -		if (compaction_deferred(zone, order)) {
>> +		if (prio > COMPACT_PRIO_SYNC_FULL
>> +					&& compaction_deferred(zone, order)) {
>>  			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>>  			continue;
>
> Could we provide prio to compaction_deferred() and make the decision in
> that function?
>
> BTW, in kcompactd, compaction_deferred() is checked even though
> .ignore_skip_hint=true. Is there any reason for that? If we can remove
> the compaction_deferred() check from kcompactd, we can check .ignore_skip_hint
> to determine whether deferral is needed or not.

I don't want to change kcompactd right now, as the current series seems 
to help against premature OOMs. But I'll revisit it later.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread

Thread overview: 32+ messages
2016-08-10  9:12 [PATCH v6 00/11] make direct compaction more deterministic Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 01/11] mm, compaction: make whole_zone flag ignore cached scanner positions Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 02/11] mm, compaction: cleanup unused functions Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 03/11] mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS Vlastimil Babka
2016-08-18  9:01   ` Michal Hocko
2016-08-10  9:12 ` [PATCH v6 04/11] mm, compaction: don't recheck watermarks after COMPACT_SUCCESS Vlastimil Babka
2016-08-16  6:12   ` Joonsoo Kim
2016-08-16  6:11     ` Vlastimil Babka
2016-08-18 11:59     ` Vlastimil Babka
2016-08-18  9:03   ` Michal Hocko
2016-08-10  9:12 ` [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority Vlastimil Babka
2016-08-16  5:58   ` Joonsoo Kim
2016-08-18 12:23     ` Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 06/11] mm, compaction: more reliably increase " Vlastimil Babka
2016-08-16  6:07   ` Joonsoo Kim
2016-08-16  6:31     ` Vlastimil Babka
2016-08-18  9:10   ` Michal Hocko
2016-08-18  9:44     ` Vlastimil Babka
2016-08-18  9:48       ` Michal Hocko
2016-08-10  9:12 ` [PATCH v6 07/11] mm, compaction: use correct watermark when checking compaction success Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 08/11] mm, compaction: create compact_gap wrapper Vlastimil Babka
2016-08-16  6:15   ` Joonsoo Kim
2016-08-16  6:15     ` Vlastimil Babka
2016-08-16  6:41       ` Joonsoo Kim
2016-08-18 12:13         ` Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 09/11] mm, compaction: use proper alloc_flags in __compaction_suitable() Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 10/11] mm, compaction: require only min watermarks for non-costly orders Vlastimil Babka
2016-08-16  6:16   ` Joonsoo Kim
2016-08-16  6:36     ` Vlastimil Babka
2016-08-16  6:46       ` Joonsoo Kim
2016-08-18 12:20         ` Vlastimil Babka
2016-08-10  9:12 ` [PATCH v6 11/11] mm, vmscan: make compaction_ready() more accurate and readable Vlastimil Babka
